Advice/Opinion on (reviewing - ethics) an RCT analysis plan

I’ll try to be brief without leaving out key information.

Setting: I am a general biostatistician (epi/clinical research background, ~12 years experience) and am the biostatistical reviewer on our hospital’s Scientific Advisory Sub-Committee for the ethics committee (>3 years in role).
Issue: A submission is proving challenging, as I believe it to be very underbaked; it’s not a bad research question, but the protocol is sloppy and very light on detail. I am pushing back, and want to continue to push back.
Desired response: I’m not looking for a pat on the back or to rant into an echo chamber; I’m just looking for feedback from neutral, experienced parties on the quality/appropriateness of the proposed plan.
Context: We are a large tertiary paediatric hospital serving >2 million people, involved in many international multi-centre trials, etc. I think that gives needed perspective on the operational and research environment.

Thank you so much for your time. More information can be provided if requested.

Participants: Patients undergoing tonsillectomy.
Outcome variable: Pain score, measured multiple times a day on days 0 through 7 post-operation.
Intervention: Four randomised arms - Standard Care (SC, a program of basic pain relief), SC + placebo, SC + Trt1, SC + Trt2.
Blinding: Assignment to SC alone is known to clinician/patient; the other three arms are blinded to all.

Primary outcome: “The primary outcome will be a reduction in self-reported pain on swallowing at breakfast and dinner in the seven days post-tonsillectomy between the patients receiving honey vs placebo vs standard treatment alone.”

Submission 1:

Had the following blurb under “Sample size”; there was no subtitle for statistical analysis.

The data will be analysed using a linear statistical model (that is, multiple regression model) with response as the average pain score. Assuming a modest model correlation coefficient of 0.3, power 90% and significance level 0.05 gives a sample size of 400 (100 per treatment group) is sufficient.

Submission 2 (after receiving comments from myself/others):

Subtitle updated to “Sample size and analysis”

The data will be analysed using a repeated measures model with pain as response. A difference in pain score of 1 is taken as clinically significant. Note that no analytic methods exist for power calculations in such complex models, so simulations need to be conducted.

The data for the treatment groups were simulated based on mean difference of 1 in pain scores deemed clinically significant. It was also expected that for all treatment groups the average pain scores will decrease over time. Based on 500 simulations, the power is calculated as the proportion of times the null hypothesis is rejected, based on a repeated measures model for the pain scores.

The simulations were repeated with several different parameter values, and a sample size of 85 per group was found to be sufficient. Simulations give a power of 0.9 for a sample size of 85 per group. 15 patients are added per group due to loss in follow-up, cancellation of surgery, change of surgical plan and other protocol violations.
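For contrast, this is roughly what a reproducible simulation-based power calculation would need to spell out. The sketch below is deliberately simplified (compound-symmetric repeated measures collapsed to per-patient means and compared with a t-test between two arms) and every parameter value is an illustrative assumption of mine, not something stated in the protocol:

```python
import numpy as np
from scipy import stats

def simulate_power(n_per_group=85, n_times=14, effect=1.0, sd=2.5,
                   icc=0.5, alpha=0.05, n_sims=500, seed=1):
    """Simulation-based power for a two-arm comparison of repeated pain
    scores. Compound-symmetric within-patient correlation; the analysis
    collapses each patient's measurements to a mean and applies a
    two-sample t-test. All parameter values are illustrative only."""
    rng = np.random.default_rng(seed)
    # split the total SD into between- and within-patient components
    sd_b = sd * np.sqrt(icc)
    sd_w = sd * np.sqrt(1 - icc)

    def group_means(mean):
        u = rng.normal(mean, sd_b, n_per_group)              # patient effects
        y = u[:, None] + rng.normal(0, sd_w, (n_per_group, n_times))
        return y.mean(axis=1)                                # per-patient mean

    rejections = 0
    for _ in range(n_sims):
        _, p = stats.ttest_ind(group_means(0.0), group_means(effect))
        rejections += p < alpha
    return rejections / n_sims
```

The point is that every quantity the protocol leaves unstated (SD of pain scores, within-patient correlation, number of measurements, the analysis actually run on each simulated dataset) has to appear explicitly before the claimed power of 0.9 at n = 85 per group can be checked.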

My issues:

  • If I were handed this protocol and a dataset, I really wouldn’t know how to proceed:
    • be it from a handling the multiple measures per day side of things, or
    • the model specification side of things, or
    • the key comparisons to run, or
    • the statistics to report (clear ideas spring to mind*)
  • The outcome ‘pain’ is measured multiple times per day, and exactly when is described differently in different parts of the protocol (a related but distinct issue).
    • “Pain Scale Revised at rest and with swallowing twice daily” is that 3 measurements or 4?
    • “at rest and with swallowing in the morning just before breakfast and then during breakfast, and in the evening just before evening meal and then during evening meal” is that 5 measurements now? no pre evening meal rest measure?
  • One of the secondary aims refers to “no difference between arms Trt1 and Trt2”, when I think it should be framed as an equivalence or non-inferiority comparison, as appropriate

*the whole point here is that ‘springing to mind’ shouldn’t be necessary, right?

So, what do you think?


Just to touch on one small part of the study, it is highly preferred to analyze pain severity using an ordinal regression model (e.g., proportional odds model) adjusted for baseline severity (using indicator levels for all levels but one to allow for a complex relationship with baseline). And never subtract pain severities. The difference in two ordinal variables is no longer ordinal unless the variables are interval scaled. For the repeated measurements a mixed effects ordinal model may be in order.

For the proportional odds model, power is computed in terms of odds ratios, as in the R Hmisc package’s popower and posamsize functions.
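A rough Python sketch of the analytic approximation those functions are based on (Whitehead 1993, which the Hmisc documentation cites). The cell probabilities and odds ratio in the usage example are illustrative assumptions, not values from the protocol:

```python
import math
from statistics import NormalDist

def po_power(p_avg, odds_ratio, n_per_group, alpha=0.05):
    """Approximate power for a two-group proportional-odds comparison,
    1:1 allocation (Whitehead 1993 approximation).
    p_avg: marginal cell probabilities of the ordinal outcome,
    averaged over the two groups (must sum to 1)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    # correction for ties across the ordinal categories
    tie_term = 1 - sum(p ** 3 for p in p_avg)
    total_n = 2 * n_per_group
    se_factor = math.sqrt(total_n * tie_term / 12)  # 1 / SE(log OR)
    return z.cdf(abs(math.log(odds_ratio)) * se_factor - z_a)

# e.g. five equally likely pain categories, odds ratio 1.5
power = po_power([0.2] * 5, odds_ratio=1.5, n_per_group=150)
```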
