Non-Inferiority vs Non-Significant

I am trying to understand the conclusion of this PettyBone RCT published in JAMA

Here MedPage Today summarizes the results as a potential guideline-changing breakthrough for using PCT (an expensive biomarker) to guide antibiotic duration.

“Daily assessment of procalcitonin (PCT), rather than standard care alone, led to patients spending significantly less time on antibiotics cumulatively in the first 28 days (9.8 days vs 10.7 days, P = 0.01) while meeting noninferiority criteria for all-cause mortality (20.9% vs 19.4% at 28 days).”

I think the issue would be much simpler to understand if we reframe it in terms of estimation rather than hypothesis testing (https://jamanetwork.com/journals/jama/fullarticle/2813846) and view the stated CIs as “compatibility intervals”. “Compatibility” here loosely means “This range of treatment effect estimates is consistent with our data”. There are more detailed discussions one could get into regarding the philosophical implications of this definition, but we’ll sidestep those for now.

Essentially, the study estimates two treatment effects: one on antibiotic use and one on mortality.

1. For antibiotic use, the trial’s results are compatible with anything from a 0.19-day to a 1.58-day reduction in the duration of antibiotic use with a PCT-guided strategy. The investigators initially hoped (hypothesized) that this compatibility range would exclude 0, which it indeed does. Therefore, they conclude that PCT-guided management of antibiotic use is superior to standard management, insofar as reducing antibiotic use is concerned.

2. For 28-day mortality, the trial’s results are compatible with anything from a 2.18 percentage point decrease to a 5.32 percentage point increase in 28-day mortality with a PCT-guided strategy. The investigators initially hoped (hypothesized) that this compatibility range would exclude a 5.4 percentage point increase in 28-day mortality, which it indeed does. Therefore, they conclude that PCT-guided management is “non-inferior” to standard of care, insofar as mortality is concerned.

The only difference between the first and second points is that the value I had hoped to exclude was 0 in the first and 5.4% in the second. Thus, non-inferiority basically boils down to saying “The value I want my compatibility interval to exclude is something other than 0”.
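
To make the parallel concrete, here is a minimal Python sketch of the two checks. It uses only the interval bounds quoted above; the function name and the sign convention (positive means worse for the PCT arm) are my own choices for illustration, not anything from the paper.

```python
# Reported 95% intervals, PCT-guided care minus standard care.
# Sign convention (an assumption of this sketch): positive = worse for PCT.
antibiotic_days_diff = (-1.58, -0.19)  # change in antibiotic days
mortality_diff_pp = (-2.18, 5.32)      # change in 28-day mortality, percentage points


def interval_lies_below(interval, threshold):
    """True if every value in the interval is below `threshold`,
    i.e. the interval excludes the threshold on the harmful side."""
    _, upper = interval
    return upper < threshold


# Superiority: the value to exclude is 0.
superior = interval_lies_below(antibiotic_days_diff, threshold=0.0)

# Non-inferiority: the value to exclude is the margin (+5.4 points).
non_inferior = interval_lies_below(mortality_diff_pp, threshold=5.4)

print(f"Antibiotic duration: interval excludes 0    -> {superior}")
print(f"28-day mortality:    interval excludes +5.4 -> {non_inferior}")
```

The same function serves both purposes; only the threshold changes.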

Where does the particular 5.4% threshold come from? It basically demarcates the “unacceptable” increase in risk which would be too much to tolerate from the perspective of the stakeholder (e.g., patient/physician/society). The rationale goes that:

A) As long as we can prove that our intervention improves something (e.g., lessens antibiotic use) by any amount (anything more than 0)

and

B) Does not increase the risk of some unfavorable outcome by an unacceptable amount (in this case, anything exceeding a 5.4% increase in 28-day mortality is considered unacceptable, and anything below that is acceptable)

Then statements like: “Care guided by measurement of PCT reduces antibiotic duration safely compared with standard care” can be made (where “safely” is demonstrated by the second point above).

This could all be analyzed in a Bayesian fashion to frame it in terms of posterior probabilities (which would, I think, be more intuitive to understand). In which case it would go something like: “We want to ensure a >95% probability of reducing antibiotic use by any amount” and a “<5% probability of increasing mortality by 5.4% or more” (for example).
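
As a rough sketch of what those posterior probabilities might look like, one can approximate each posterior by a normal distribution reconstructed from the reported 95% interval. This assumes a flat prior and a symmetric, normal-theory interval, neither of which is confirmed by the paper, so treat the outputs as illustrative only. The helper name `posterior_from_ci` is hypothetical.

```python
from statistics import NormalDist


def posterior_from_ci(lower, upper, level=0.95):
    """Approximate normal posterior implied by a reported interval.

    Assumptions (not from the trial): flat prior, symmetric interval,
    so the posterior mean is the midpoint and the SD is the half-width
    divided by the normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return NormalDist((lower + upper) / 2, (upper - lower) / (2 * z))


# Reduction in antibiotic days with PCT (positive = fewer days).
abx_reduction = posterior_from_ci(0.19, 1.58)
# Change in 28-day mortality in percentage points (positive = more deaths).
mortality_change = posterior_from_ci(-2.18, 5.32)

p_any_reduction = 1 - abx_reduction.cdf(0.0)
p_excess_mortality = 1 - mortality_change.cdf(5.4)
p_any_mortality_increase = 1 - mortality_change.cdf(0.0)

print(f"P(any reduction in antibiotic days)  = {p_any_reduction:.3f}")
print(f"P(mortality increase >= 5.4 points)  = {p_excess_mortality:.3f}")
print(f"P(any mortality increase)            = {p_any_mortality_increase:.3f}")
```

The third probability is not part of the pre-specified criteria, but it is the kind of quantity a clinician might ask about directly, and it falls out of the same posterior at no extra cost.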

Thank you very much for this in-depth review.

So the pre-specified combination of non-inferiority AND a positive RCT is based on accepting a 5.4 percentage point absolute increase in mortality (a ~20% relative increase) in exchange for any reduction in antibiotic duration above zero?

Is that the right predefined state or am I misunderstanding what you have said?

Yes, although the pre-specified acceptable relative increase in mortality was actually closer to 36% (as per section 6.1 of the Supplement, the investigators had originally anticipated a baseline mortality of 15% and a pre-defined non-inferiority margin of 5.4%, which would constitute a 36% relative increase).
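
The arithmetic behind that figure, as a small Python sketch. The 15% anticipated baseline is the value cited above from the Supplement; the second line re-expresses the same 5.4-point margin against the observed 19.4% control mortality, which is my own addition for comparison.

```python
margin_pp = 5.4               # non-inferiority margin, percentage points
anticipated_baseline = 15.0   # anticipated control-arm mortality (%)
observed_control = 19.4       # observed control-arm mortality (%)

# Relative increase the margin represents at the planning stage.
relative_margin_planned = margin_pp / anticipated_baseline

# The same absolute margin relative to the observed control mortality.
relative_margin_at_observed = margin_pp / observed_control

print(f"Relative margin at planned baseline:  {relative_margin_planned:.0%}")
print(f"Relative margin at observed baseline: {relative_margin_at_observed:.0%}")
```

Because the margin is fixed in absolute terms, the relative increase it tolerates shrinks as the baseline mortality rises.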

As you said above, a Bayesian statement would be direct and simple.

Regarding statements like “treatment B is compatible with an effect in the range [L, U]”, isn’t it more appropriate to say “the data on treatment B are compatible with …”?

Thanks for pointing that out! Yes, that’d be a more accurate statement based on my understanding. Edited now for clarity.

So to be clear, it seems you are saying the authors pre-specified that any non-zero (positive) reduction in antibiotic duration would be “superior” (a positive study) and that up to a 36% relative increase in mortality would be “non-inferior”?

So that the conclusion of the authors, based on this pre-specification, was that because the data were consistent with about a 9% relative reduction in antibiotic duration, this is superior, while despite data consistent with about a 20% relative increase in mortality, this is non-inferior.

Is that true or am I misunderstanding something?

Yes (unless both of us are misunderstanding), though I think the observed relative increase in mortality was not 20% (20.9/19.4 ≈ 1.08, i.e., about an 8% relative increase).

Also, the range of values spanned by the interval (0.19 to 1.58 days reduction and 2.18 percentage points decrease to 5.32 percentage points increase for antibiotic use and mortality difference, respectively) are more important than the point estimate itself (since what is needed to declare superiority or non-inferiority is the exclusion of certain values from the interval, rather than having the point estimate be at a specific value).

Yes. Not sure where I calculated 20. I think I was calculating relative of relative!

I think that physicians would be confused by the use of a preselected non-inferiority margin here rather than a test for a statistically significant increase in mortality.

Here I believe p was .02 for mortality and p was .01 for reduced antibiotic duration.

Why does this difference render mortality non-inferior and antibiotic duration superior?

What is the non-inferior state based on, statistically?

The non-inferiority p is calculated on almost the exact same basis as the superiority p, with 3 differences being that:

  1. Instead of the null hypothesis being that the PCT-guided strategy has no effect (that is, absolute difference = 0), the value under the null hypothesis is instead your non-inferiority margin (that is, absolute difference = 5.4% increase in mortality).
    And so just as the superiority design aims to reject the null hypothesis of no effect (absolute difference = 0) if P < something, the non-inferiority design aims to reject the null hypothesis of inferiority (absolute difference = 5.4%) with a P < something.

  2. An additional difference is that P-values for superiority tests are often two-sided (we don’t know whether the treatment is beneficial or harmful) whereas those for non-inferiority are often one-sided (we want to make sure the treatment is not worse than a pre-specified level of harm). There’s a good argument to make that superiority should also use a one-sided P-value, but that’s another discussion.

  3. The “something” that P must fall below to declare non-inferiority is sometimes set at 0.025 (rather than the typical 0.05 for superiority studies).
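
The three differences above can be sketched in a few lines of Python. This is a normal-approximation reconstruction from the reported 95% intervals, not the trial's actual analysis: the estimates and standard errors are backed out of the interval bounds, and the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def estimate_and_se(lower, upper, level=0.95):
    """Back out a point estimate and standard error from a symmetric CI
    (an assumption; the trial may have used a different method)."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (lower + upper) / 2, (upper - lower) / (2 * z)


def one_sided_p(estimate, se, null_value):
    """One-sided P for H0: true difference >= null_value (lower is better)."""
    return STD_NORMAL.cdf((estimate - null_value) / se)


def two_sided_p(estimate, se, null_value=0.0):
    """Two-sided P for H0: true difference == null_value."""
    z = abs(estimate - null_value) / se
    return 2 * (1 - STD_NORMAL.cdf(z))


# Differences are PCT minus standard care; positive = worse for PCT.
mort_est, mort_se = estimate_and_se(-2.18, 5.32)   # percentage points
abx_est, abx_se = estimate_and_se(-1.58, -0.19)    # days

# Non-inferiority: null value is the margin, one-sided, compared with 0.025.
p_noninferiority = one_sided_p(mort_est, mort_se, null_value=5.4)
# Same mortality data tested against the usual null of no difference.
p_mortality_vs_zero = two_sided_p(mort_est, mort_se, null_value=0.0)
# Superiority for antibiotic duration: null value 0, two-sided, compared with 0.05.
p_superiority_abx = two_sided_p(abx_est, abx_se, null_value=0.0)

print(f"Mortality, H0: diff >= 5.4 (one-sided): P = {p_noninferiority:.3f}")
print(f"Mortality, H0: diff == 0 (two-sided):   P = {p_mortality_vs_zero:.3f}")
print(f"Antibiotics, H0: diff == 0 (two-sided): P = {p_superiority_abx:.3f}")
```

The point of printing both mortality P-values is that the same data give very different answers depending on which null value is being tested: a small P against the 5.4-point margin says nothing about whether mortality differs from zero.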

Thank you. IMHO such preselection creates a clinically misleading result.

Most clinicians would consider an 8% relative increase in mortality to be more important than a 9% relative decrease in antibiotic duration.

Clinicians know that certain organisms such as Staph aureus require long antibiotic treatment and may recur if antibiotics are stopped too early.

This study appears to show the opposite of their conclusion.

In the Bayesian world you state the conditions for recommending usage of the treatment, and compute the probability that the conditions are met. For example P(mortality increase < 0.02 and non-fatal outcome improved by at least 0.03).
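
A minimal simulation sketch of such a joint probability, applied to this trial's two outcomes. Everything here beyond the reported intervals is an assumption: the posteriors are normal approximations under a flat prior, the two effects are treated as independent (they almost certainly are not), and the decision thresholds (mortality increase below 2 points, at least half a day fewer antibiotics) are hypothetical values chosen only for illustration.

```python
import random
from statistics import NormalDist


def posterior_from_ci(lower, upper, level=0.95):
    """Normal posterior implied by a symmetric interval under a flat prior
    (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return NormalDist((lower + upper) / 2, (upper - lower) / (2 * z))


def joint_probability(mort_post, abx_post, max_mort_increase,
                      min_abx_reduction, n_draws=100_000, seed=1):
    """Monte Carlo estimate of
    P(mortality increase < max_mort_increase AND
      antibiotic reduction >= min_abx_reduction),
    drawing the two effects independently (a simplifying assumption)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        mort = rng.gauss(mort_post.mean, mort_post.stdev)
        abx = rng.gauss(abx_post.mean, abx_post.stdev)
        if mort < max_mort_increase and abx >= min_abx_reduction:
            hits += 1
    return hits / n_draws


mort_post = posterior_from_ci(-2.18, 5.32)  # mortality change, percentage points
abx_post = posterior_from_ci(0.19, 1.58)    # reduction in antibiotic days

# Hypothetical decision thresholds, not taken from the trial.
p_joint = joint_probability(mort_post, abx_post,
                            max_mort_increase=2.0, min_abx_reduction=0.5)
print(f"P(conditions for recommending PCT are met) = {p_joint:.3f}")
```

A real analysis would draw from the joint posterior of a fitted model, which would carry the correlation between the outcomes; the simulation loop itself would stay the same.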

Thanks. I wanted to understand the basis for saying this research was a breakthrough when I think it is not a positive study. Is my assessment incorrect here?

Antibiotics reduce death. Bacteria take time to kill and may return if not effectively eliminated. Shortening antibiotic duration is desirable but may increase mortality and require readmission and a restart of antibiotics.

Any biomarker might reduce antibiotic use, but the question is whether that increases mortality and the restarting of antibiotics.

This study appears to show that the antibiotic reduction came at the cost of higher mortality, and therefore that the biomarker is not a reliable measurement for this purpose. Is this true?

Yet, based on these statistics, here you see this study identified in the press as a breakthrough and a reversal of previous RCTs.

“Derek Angus, MD, MPH, of University of Pittsburgh Medical Center, said there were surprises in ADAPT-Sepsis: ‘I would have guessed CRP and PCT would have behaved similarly,’ he told MedPage Today. ‘Given prior [randomized controlled trials] of PCT in the ICU [intensive care unit] population, I might have guessed there would be no benefit compared to usual care.’”

This comment made me think about the fact that the PettyBone RCT is further removed from the Bayesian world than a conventional RCT because the condition (the disease) is not defined in the PettyBone world.

For example, if the RCT is investigating group A streptococcus bacteremia then the condition can be stated and the priors defined.

When it is a PettyBone bucket of different infections then the condition cannot be defined. We are too far away from any relevant pretest determination.

In the Bayesian world the condition is pivotal to the pretest state so Bayes, even more than Hill, requires a specifically definable condition under test.

With the PettyBone shortcut there is no reasonably specific definable condition or disease. Instead there is only a “synthetic syndrome”: a set of different diseases captured by a triage set of thresholds, the mix of which, and therefore the pretest state of which, changes with each RCT.

Is this a correct assessment? How would a synthetic syndrome be studied in the Bayesian world?

Here is an interesting retrospective study to consider. It shows the potential dependence of survival on antibiotic dosing, and why shortened treatment guided by PCT might be hazardous.

Lots of confounding. Need expert to provide input re kinetics.
@davidcnorrismd

https://academic.oup.com/jacamr/article/6/6/dlae201/7926198

“I would have guessed CRP and PCT would have behaved similarly”
They did! An increase of 1.57 percentage points (95% CI, −2.18 to 5.32) and one of 1.69 percentage points (95% CI, −2.07 to 5.45) are pretty close. The main issue is that one falls on one side of an arbitrary threshold and the other doesn’t.
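
A quick sketch of just how close the two results are to the 5.4-point margin. The arm labels are inferred: the first interval matches the PCT figures quoted at the top of the thread, so I have assumed the second is CRP.

```python
MARGIN_PP = 5.4  # non-inferiority margin, percentage points

# Mortality difference vs standard care (estimate, 95% CI).
# Arm labels are an inference from the thread, not stated with the numbers.
arms = {
    "PCT": {"estimate": 1.57, "ci": (-2.18, 5.32)},
    "CRP": {"estimate": 1.69, "ci": (-2.07, 5.45)},
}


def gap_to_margin(ci, margin=MARGIN_PP):
    """Margin minus the upper CI bound, rounded to 2 decimals.
    Positive: the interval excludes the margin (non-inferiority met)."""
    return round(margin - ci[1], 2)


for name, arm in arms.items():
    gap = gap_to_margin(arm["ci"])
    verdict = "non-inferior" if gap > 0 else "non-inferiority not shown"
    print(f"{name}: estimate {arm['estimate']:+.2f}, "
          f"upper bound {arm['ci'][1]:.2f}, gap to margin {gap:+.2f} -> {verdict}")

estimate_gap = round(arms["CRP"]["estimate"] - arms["PCT"]["estimate"], 2)
print(f"Difference between the two point estimates: {estimate_gap:.2f} points")
```

The two upper bounds sit within a fraction of a percentage point of the margin, on opposite sides.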

I should also declare an interest here - I was involved in the early stages of the trial at the top of the discussion (up to the grant application), but left the institution that ran it before recruitment started. I argued for a Bayesian approach but at that time (2016-2017) I don’t think so many clinicians were on board with it, and it was seen as “risky” from a funding perspective, as likely to be unfamiliar to members of grant boards. So the group didn’t want to go down that road.

How would this have been designed as a Bayesian study?

I ask as a clinician with a clinician’s level of knowledge of Bayes (i.e., I commonly use Bayesian thinking in clinical decision making but am not familiar with how a study like this would be designed by that methodology).