Drawing inferences from frequentist point estimates with wide confidence intervals

Dear community,

Recently, the ICU-ROX trial reported its findings. This trial was an RCT randomising 1000 patients to conservative or usual oxygen therapy in intensive care units. A post-hoc subgroup analysis of the septic patients within this cohort has now been reported.

I have some reservations about the inferential descriptions used. I will summarise the key findings below and highlight the concerns I have.

Study Design
1000 patients enrolled into ICU-ROX (21 centres in AUS / NZ).
Intention to treat (965 patients available)
251 met diagnostic criteria for sepsis
130 assigned to conservative oxygen
121 assigned to usual care
(Note, for the suspected treatment effect size, this is a tiny sample)

Separation between the two groups (in terms of oxygenation) is debatable, but a topic of conversation for another day.

Primary outcome measure:
90-day mortality (Fisher’s exact test)
conservative oxygen mortality: 47/130 patients (36.2%)
usual oxygen mortality: 35/120 patients (29.2%)
absolute difference: 7% (95% CI -4.6% to 18.6%); odds ratio 1.38 (95% CI 0.81 to 2.34); P = 0.24
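As a quick check on how wide these intervals are, the reported counts can be pushed through simple Wald (normal-approximation) formulas. This is only a sketch: the function names are made up for illustration, and the Wald method is an assumption on my part, not necessarily what the authors used. The results should land close to the figures quoted above.

```python
import math

def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Absolute risk difference (a minus b) with a Wald 95% interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

def odds_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Odds ratio (a versus b) with a Wald interval built on the log-odds scale."""
    a, b = events_a, n_a - events_a   # deaths, survivors in group a
    c, d = events_b, n_b - events_b   # deaths, survivors in group b
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Counts as reported in the post: conservative 47/130, usual 35/120
rd, rd_lo, rd_hi = risk_difference_ci(47, 130, 35, 120)
or_est, or_lo, or_hi = odds_ratio_ci(47, 130, 35, 120)

print(f"Risk difference: {rd:.1%} (95% CI {rd_lo:.1%} to {rd_hi:.1%})")
print(f"Odds ratio: {or_est:.2f} (95% CI {or_lo:.2f} to {or_hi:.2f})")
```

The interval for the risk difference spans roughly 23 percentage points, which is the whole problem with leaning on the point estimate.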

The authors additionally compared “survival times using log-rank tests and present these as Kaplan-Meier curves and used a Cox proportional hazards model to calculate hazard ratios for survival”

The authors go on to say: “No statistically significant differences by treatment group were observed in any of the specified mortality end points. However, point estimates of treatment effect on mortality rates were higher in patients allocated to conservative oxygen at each time point.” and “There were no statistically significant differences between treatment groups for other secondary end points. However, point estimates of treatment effect consistently favored usual oxygen”

My concern is with the use of frequentist point estimates in this way. The authors bring up the point estimates throughout the discussion, framing it as evidence of harm from conservative oxygen. This would be contrary to a number of other studies published in this field, including the HYPER2S study, which was stopped early at a planned interim analysis due to harm in the high oxygen group.

Could I ask what the community thinks about glossing over the confidence intervals, and using the point estimates to draw narrative?

Many thanks in advance for your insights.

Even though compatibility (aka confidence) intervals are almost impossible to interpret correctly, it is imperative in such a setting that they be given the most emphasis, if the study used a frequentist design. For mortality, the data are compatible with the odds of death being multiplied by anything from 0.81 (a 19% reduction) to 2.34 (more than a doubling).

I don’t know how to really interpret the findings without putting a skeptical prior on the odds ratio (favoring 1.0, with equal chance of benefit as of harm) and computing:

  • P(OR > 1 | data, prior)
  • P(1/1.05 < OR < 1.05 | data, prior)

The latter is the probability of similarity (if you accept a fold change of 1.05 in odds of mortality) and will undoubtedly be in the highly uncertain range, providing direct quantification of “we don’t know enough from this study.”
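To make the two probabilities concrete, here is a rough sketch using a normal approximation on the log odds ratio with a conjugate normal prior. Everything about the prior is an assumption for illustration: it is centred on OR = 1 and its SD is chosen so that P(OR > 2) = 0.05. A full Bayesian model would be preferable, and a different skeptical prior would give different numbers.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Counts as reported in the post: conservative 47/130, usual 35/120
a, b = 47, 130 - 47   # conservative: deaths, survivors
c, d = 35, 120 - 35   # usual: deaths, survivors

# Approximate likelihood for the log odds ratio (normal approximation)
log_or_hat = math.log((a * d) / (b * c))
se_hat = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Skeptical prior on log(OR): mean 0 (OR = 1), symmetric for benefit and harm.
# ASSUMPTION: SD chosen so that the prior P(OR > 2) is 0.05.
prior_mean = 0.0
prior_sd = math.log(2) / 1.645

# Conjugate normal update: precisions add, means are precision-weighted
post_precision = 1 / se_hat**2 + 1 / prior_sd**2
post_sd = math.sqrt(1 / post_precision)
post_mean = (log_or_hat / se_hat**2 + prior_mean / prior_sd**2) / post_precision

# P(OR > 1 | data, prior)
p_harm = 1.0 - normal_cdf(0.0, post_mean, post_sd)

# P(1/1.05 < OR < 1.05 | data, prior), the probability of similarity
delta = math.log(1.05)
p_similar = normal_cdf(delta, post_mean, post_sd) - normal_cdf(-delta, post_mean, post_sd)

print(f"P(OR > 1 | data, prior)         = {p_harm:.2f}")
print(f"P(1/1.05 < OR < 1.05 | data, prior) = {p_similar:.2f}")
```

Under this particular prior, the probability of harm from conservative oxygen comes out well short of anything conclusive, and the probability of similarity is small, so the study supports neither "harm" nor "no difference" with any confidence.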

Many thanks @f2harrell, this chimes with my thoughts and with my earlier reading of your course notes. Much appreciated.