Non Inferiority vs Non significant

Lawrence_Lynn · December 11, 2024, 5:14pm

I am trying to understand the conclusion of this PettyBone RCT published in JAMA

Here Medpage Today summarizes the results as a potential guideline changing breakthrough for using PCT (an expensive biomarker) to guide antibiotic duration.

“Daily assessment of procalcitonin (PCT), rather than standard care alone, led to patients spending significantly less time on antibiotics cumulatively in the first 28 days (9.8 days vs 10.7 days, P = 0.01) while meeting noninferiority criteria for all-cause mortality (20.9% vs 19.4% at 28 days).”

Ahmed_Sayed · December 11, 2024, 11:06pm

I think the issue would be much simpler to understand if we reframe it in terms of estimation rather than hypothesis testing (https://jamanetwork.com/journals/jama/fullarticle/2813846) and view the stated CIs as “compatibility intervals”. “Compatibility” here loosely means “This range of treatment effect estimates is consistent with our data”. There are more detailed discussions one could get into re: the philosophical implications of this definition but we’ll sidestep those for now.

Essentially, the study estimates 2 treatment effects: one on antibiotic use and one on mortality.

1. For antibiotic use, the trial’s results are compatible with anything from a 0.19 to a 1.58 day reduction in the duration of antibiotic use with a PCT-guided strategy. The investigators initially hoped (hypothesized) that this compatibility range would exclude 0, which it indeed does. Therefore, they conclude that PCT-guided management of antibiotic use is superior to standard management, insofar as reducing antibiotic use is concerned.

2. For 28-day mortality, the trial’s results are compatible with anything from a 2.18 percentage point decrease to a 5.32 percentage point increase in 28-day mortality with a PCT-guided strategy. The investigators initially hoped (hypothesized) that this compatibility range would exclude a 5.4 percentage point increase in 28-day mortality, which it indeed does. Therefore, they conclude that PCT-guided management is “non-inferior” to standard of care, insofar as mortality is concerned.

The only difference between the first and second points is that the value the investigators hoped to exclude was 0 in the first and a 5.4 percentage point increase in the second. Thus, non-inferiority basically boils down to saying “The value I want my compatibility interval to exclude is something other than 0”.
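To make the two checks concrete, here is a minimal Python sketch using only the interval bounds quoted above. The variable names and the sign conventions (positive reduction = fewer antibiotic days, positive mortality difference = more deaths with PCT) are my own assumptions for illustration, not the trial's analysis code.

```python
# Interval for the reduction in antibiotic days with PCT (positive = fewer days).
antibiotic_reduction_ci = (0.19, 1.58)

# Interval for the mortality difference in percentage points
# (positive = higher 28-day mortality with PCT).
mortality_diff_ci = (-2.18, 5.32)

SUPERIORITY_NULL = 0.0   # value the first interval must exclude
NI_MARGIN = 5.4          # value the second interval must exclude (percentage points)

# Superiority: the whole interval must lie above 0.
superior = antibiotic_reduction_ci[0] > SUPERIORITY_NULL

# Non-inferiority: the whole interval must lie below the margin.
non_inferior = mortality_diff_ci[1] < NI_MARGIN

print(f"superior on antibiotic duration: {superior}")
print(f"non-inferior on mortality:       {non_inferior}")
```

Both checks are the same operation; only the excluded value changes. Note how close the upper bound (5.32) sits to the margin (5.4).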

Where does the particular 5.4% threshold come from? It basically demarcates the “unacceptable” increase in risk which would be too much to tolerate from the perspective of the stakeholder (e.g., patient/physician/society). The rationale goes that:

A) As long as we can prove that our intervention improves something (e.g., lessens antibiotic use) by any amount (anything more than 0)

and

B) does not increase the risk of some unfavorable outcome by an unacceptable amount (in this case, an increase of 5.4 percentage points or more in 28-day mortality is considered unacceptable, and anything below that is acceptable)

Then statements like: “Care guided by measurement of PCT reduces antibiotic duration safely compared with standard care” can be made (where “safely” is demonstrated by the second point above).

This could all be analyzed in a Bayesian fashion to frame it in terms of posterior probabilities (which would, I think, be more intuitive to understand). In that case it would go something like: “We want to ensure a >95% probability of reducing antibiotic use by any amount” and a “<5% probability of increasing mortality by 5.4 percentage points or more” (for example).
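As a rough illustration of what those posterior statements could look like, the sketch below backs out a normal approximation from the reported intervals. It assumes (my assumptions, not the trial's) that the intervals are symmetric two-sided 95% intervals and that a flat prior is used, so the posterior is simply a normal distribution centered on the interval midpoint. A real Bayesian analysis would use the patient-level data and a stated prior.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def posterior_from_ci(low, high):
    """Normal posterior implied by a 95% interval under a flat prior (assumption)."""
    mean = (low + high) / 2
    sd = (high - low) / (2 * Z95)
    return NormalDist(mean, sd)


# Reduction in antibiotic days (positive = fewer days with PCT).
abx = posterior_from_ci(0.19, 1.58)
# Mortality difference in percentage points (positive = more deaths with PCT).
mort = posterior_from_ci(-2.18, 5.32)

p_any_reduction = 1 - abx.cdf(0.0)   # P(antibiotic use reduced by any amount)
p_excess_harm = 1 - mort.cdf(5.4)    # P(mortality increase of 5.4 points or more)

print(f"P(any reduction in antibiotic days)   = {p_any_reduction:.3f}")
print(f"P(mortality increase >= 5.4 points)   = {p_excess_harm:.3f}")
```

Under these assumptions the first probability comes out above 0.95 and the second below 0.05, which is the Bayesian analogue of the two interval checks.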

Lawrence_Lynn · December 11, 2024, 11:40pm

Thank you very much for this in depth review.

So the pre-specified combination of non-inferiority AND a positive RCT is based on accepting up to a 5.4 percentage point absolute increase in mortality (a ~20% relative increase) in exchange for any reduction in antibiotic duration above zero?

Is that the right predefined state or am I misunderstanding what you have said?

Ahmed_Sayed · December 11, 2024, 11:57pm

Yes, although the pre-specified acceptable relative increase in mortality was actually closer to 36% (as per section 6.1 of the Supplement, the investigators had originally anticipated a baseline mortality of 15% and pre-defined a non-inferiority margin of 5.4 percentage points, which would constitute a 36% relative increase).
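The arithmetic behind the 36% figure, as a small Python sketch (the function name is my own, purely illustrative):

```python
def relative_increase(absolute_increase_pp, baseline_pct):
    """Relative increase implied by an absolute increase over a baseline rate."""
    return absolute_increase_pp / baseline_pct


anticipated_mortality = 15.0  # percent, as anticipated by the investigators
ni_margin = 5.4               # percentage points

rel_margin = relative_increase(ni_margin, anticipated_mortality)
worst_tolerated = anticipated_mortality + ni_margin

print(f"relative margin: {rel_margin:.0%}")
print(f"mortality at the margin: {worst_tolerated:.1f}%")
```

In other words, at the anticipated baseline the margin corresponds to tolerating a rise from 15% to 20.4% mortality.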

f2harrell · December 12, 2024, 11:40am

As you said above, a Bayesian statement would be direct and simple.

Regarding statements like “treatment B is compatible with an effect in the range [L, U]” isn’t it more appropriate to say “the data on treatment B are compatible with …”?

Ahmed_Sayed · December 12, 2024, 3:38pm

Thanks for pointing that out! Yes, that’d be a more accurate statement based on my understanding. Edited now for clarity.

Lawrence_Lynn · December 13, 2024, 9:25am

So to be clear, it seems you are saying the authors pre-specified any non-zero (positive) reduction in antibiotic duration to be “superior” (a positive study) and up to a 36% relative increase in mortality to be “non-inferior”?

So that the conclusion of the authors, based on this pre-specification, was that because the data were consistent with about a 9% relative reduction in antibiotic duration, this is superior, while despite data consistent with about a 20% relative increase in mortality, this is non-inferior.

Is that true or am I misunderstanding something?

Ahmed_Sayed · December 13, 2024, 3:57pm

Yes (unless both of us are misunderstanding), though I think the observed relative increase in mortality was not 20% (20.9/19.4 ≈ 1.08, i.e., about an 8% relative increase).
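For reference, the observed relative differences can be recomputed from the point estimates in the MedPage summary quoted at the top of the thread. This is only a sketch; the variable names are mine.

```python
# Point estimates quoted in the summary (PCT-guided vs standard care).
pct_days, std_days = 9.8, 10.7    # days on antibiotics in the first 28 days
pct_mort, std_mort = 20.9, 19.4   # 28-day all-cause mortality, percent

rel_reduction_days = (std_days - pct_days) / std_days
rel_increase_mort = (pct_mort - std_mort) / std_mort

print(f"relative reduction in antibiotic days: {rel_reduction_days:.1%}")
print(f"relative increase in mortality:        {rel_increase_mort:.1%}")
```

Both come out at roughly 8%, one in the favorable direction and one in the unfavorable direction.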

Also, the range of values spanned by the interval (0.19 to 1.58 days reduction and 2.18 percentage points decrease to 5.32 percentage points increase for antibiotic use and mortality difference, respectively) are more important than the point estimate itself (since what is needed to declare superiority or non-inferiority is the exclusion of certain values from the interval, rather than having the point estimate be at a specific value).

Lawrence_Lynn · December 13, 2024, 5:19pm

Yes. Not sure where I calculated 20. I think I was calculating relative of relative!

I think that physicians would be confused by the use of a pre-specified non-inferiority margin here rather than a test for a statistically significant increase in mortality.

Here I believe p was .02 for mortality and p was .01 for reduced antibiotic duration.

Why does this difference render mortality non-inferior and antibiotic duration superior?

What is the non-inferior state based on, statistically?

Ahmed_Sayed · December 14, 2024, 12:10am

The non-inferiority p is calculated on almost the exact same basis as the superiority p, with 3 differences being that:

1. Instead of the null hypothesis being that the PCT-guided strategy has no effect (that is, absolute difference = 0), the value under the null hypothesis is your non-inferiority margin (that is, absolute difference = 5.4 percentage point increase in mortality). And so just as the superiority design aims to reject the null hypothesis of no effect (absolute difference = 0) if P < something, the non-inferiority design aims to reject the null hypothesis of inferiority (absolute difference = 5.4 percentage points) with a P < something.

2. P-values for superiority tests are often two-sided (we don’t know whether the treatment is beneficial or harmful) whereas those for non-inferiority are often one-sided (we want to make sure the treatment is not worse than a pre-specified level of harm). There’s a good argument to make that superiority should also use a one-sided P-value, but that’s another discussion.

3. The “something” needed to declare non-inferiority is sometimes set at 0.025 (rather than the typical 0.05 for superiority studies).
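Here is a hedged Python sketch of how the two P-values differ mechanically. It uses a normal (Wald) approximation and backs out the estimate and standard error from the intervals quoted earlier in the thread, assuming they are symmetric two-sided 95% intervals. These are my assumptions for illustration and not necessarily the test the trial actually used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96


def estimate_and_se(low, high):
    """Point estimate and standard error implied by a symmetric 95% interval."""
    return (low + high) / 2, (high - low) / (2 * Z95)


def superiority_p(estimate, se):
    """Two-sided P-value against H0: difference = 0."""
    z = estimate / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def non_inferiority_p(estimate, se, margin):
    """One-sided P-value against H0: difference >= margin (higher = worse)."""
    z = (estimate - margin) / se
    return STD_NORMAL.cdf(z)


# Reduction in antibiotic days (positive = fewer days with PCT).
abx_est, abx_se = estimate_and_se(0.19, 1.58)
# Mortality difference in percentage points (positive = more deaths with PCT).
mort_est, mort_se = estimate_and_se(-2.18, 5.32)

p_sup = superiority_p(abx_est, abx_se)
p_ni = non_inferiority_p(mort_est, mort_se, margin=5.4)

print(f"superiority P (antibiotic days, two-sided, null = 0):     {p_sup:.3f}")
print(f"non-inferiority P (mortality, one-sided, null = 5.4 pts): {p_ni:.3f}")
```

The only things that change between the two functions are the null value (0 vs the margin) and the number of sides. Under this approximation the non-inferiority P-value lands just under 0.025, which is consistent with the interval's upper bound sitting just under the margin.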