GD Perkins et al presented an extremely well-designed and well-conducted double-blind randomized trial of epinephrine vs. placebo, randomizing over 8,000 patients. The primary outcome was the probability of survival at 30 days. Secondary outcomes included the probability of survival until hospital discharge with a score of 3 or less on the modified Rankin scale (which ranges from 0 [no symptoms] to 6 [death]). The statistical analysis was state-of-the-art, including an ordinal analysis with the proportional odds ordinal logistic model and a Bayesian analysis. The conclusion was that epinephrine increased the chance of survival but that, among survivors, there was a tendency toward worse neurological outcomes. Reaction on Twitter (see also here) has been interesting, with some clinicians emphasizing the Rankin score outcomes among survivors. It’s always tricky to interpret conditional analyses, and as one tweet pointed out, there is a relationship between brain damage and risk of death.
In my view the original analysis has two shortcomings: (1) it is unclear exactly how the proportional odds analysis was done, and (2) it is not clear that the authors ever performed the preferred ordinal analysis, the one that does not group any outcome levels. A key analysis in the paper combined the bottom four levels of the Rankin scale.
Failing to distinguish these categories is not a good idea: grouping loses information and power. An excellent re-analysis by Matthew Shun-Shin considered full 7-level ordinal analyses with various re-arrangements of the Rankin categories to reflect the assumption that some neurocognitive outcomes are worse than death. A gold-standard analysis would elicit utilities for all the outcome states from relevant persons and test whether epinephrine increases expected utility. Short of that, ordinal analyses are better than binary analyses, as demonstrated here and here.
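Just to illustrate the utility idea (this is only a sketch; the utility values below are invented, not elicited), a utility-weighted comparison can be computed directly from the modified Rankin frequencies used in the analyses that follow.

util <- c(1, 0.9, 0.75, 0.55, 0.3, 0.1, 0)         # hypothetical utilities for mRS 0-6 (0 = death); illustration only
n.placebo     <- c(15, 10, 29, 20,  8,  8, 3904)   # mRS 0-6 frequencies, placebo arm (same counts as below)
n.epinephrine <- c(12, 17, 23, 35, 12, 27, 3881)   # mRS 0-6 frequencies, epinephrine arm
# Mean utility per randomized patient under these assumed utilities
c(placebo     = sum(util * n.placebo)     / sum(n.placebo),
  epinephrine = sum(util * n.epinephrine) / sum(n.epinephrine))

A real version would use elicited utilities (possibly rating mRS 5 below death, i.e., giving it a negative utility) and would attach uncertainty to the difference in expected utility, e.g., by the bootstrap or a Bayesian model.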
Here is an analysis that uses all 7 categories in their original order, using R.
a <- c(rep(0,15), rep(1,10), rep(2,29), rep(3,20), rep(4,8),  rep(5,8),  rep(6,3904))  # placebo: mRS 0-6 counts (6 = dead)
b <- c(rep(0,12), rep(1,17), rep(2,23), rep(3,35), rep(4,12), rep(5,27), rep(6,3881))  # epinephrine: mRS 0-6 counts
x <- c(rep('placebo', length(a)), rep('epinephrine', length(b)))   # treatment assignment
y <- c(a, b)                                                       # 7-level ordinal outcome
require(rms)
f <- lrm(y ~ x)   # proportional odds ordinal logistic model
f
Logistic Regression Model
lrm(formula = y ~ x)
Frequencies of Responses

    0    1    2    3    4    5    6
   27   27   52   55   20   35 7785

                     Model Likelihood    Discrimination    Rank Discrim.
                        Ratio Test          Indexes           Indexes
Obs          8001    LR chi2      5.88    R2      0.003     C       0.541
max |deriv| 6e-12    d.f.            1    g       0.168     Dxy     0.082
                     Pr(> chi2) 0.0153    gr      1.183     gamma   0.165
                                          gp      0.002     tau-a   0.004
                                          Brier   0.013

          Coef   S.E.   Wald Z Pr(>|Z|)
y>=1      5.5342 0.2014 27.47  <0.0001
y>=2      4.8376 0.1485 32.57  <0.0001
y>=3      4.1565 0.1139 36.48  <0.0001
y>=4      3.7315 0.0988 37.77  <0.0001
y>=5      3.6118 0.0953 37.90  <0.0001
y>=6      3.4304 0.0905 37.90  <0.0001
x=placebo 0.3368 0.1398  2.41  0.0160
summary(f, x='placebo')
             Effects              Response : y

Factor                   Low High Diff. Effect   S.E.    Lower 0.95 Upper 0.95
x - epinephrine:placebo    2    1    NA -0.33675 0.13985 -0.61085   -0.06266
 Odds Ratio                2    1    NA  0.71408      NA  0.54289    0.93926
The 2-sided (why?) p-value is 0.015 in favor of epi, and the epinephrine:placebo odds ratio is 0.71. This provides evidence that patients getting epi tended to have better outcomes on the 7-point scale than those randomized to placebo.
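To translate the odds ratio to the probability scale, the fitted proportional odds model can produce estimated probabilities of each Rankin level by treatment arm. A minimal sketch, using the fit f from above and rms's predict with type='fitted.ind' for per-category probabilities:

# Estimated probability of each mRS level (0-6) for each arm, from the fit f above
p <- predict(f, data.frame(x = c('epinephrine', 'placebo')), type = 'fitted.ind')
rownames(p) <- c('epinephrine', 'placebo')
round(p, 4)

Under the proportional odds constraint a single odds ratio shifts all of the cumulative probabilities at once, so this display shows where the estimated treatment effect is concentrated.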
Along the lines of Shun-Shin, let’s assume that modified Rankin scale level 5 is worse than death, and get a new 7-level ordinal analysis:
y2 <- ifelse(y == 5, 7, y)   # recode mRS 5 (severe disability) as 7 so it ranks worse than death (6)
f <- lrm(y2 ~ x)
f
Logistic Regression Model
lrm(formula = y2 ~ x)
Frequencies of Responses

    0    1    2    3    4    6    7
   27   27   52   55   20 7785   35

                     Model Likelihood    Discrimination    Rank Discrim.
                        Ratio Test          Indexes           Indexes
Obs          8001    LR chi2      0.02    R2      0.000     C       0.502
max |deriv| 2e-07    d.f.            1    g       0.010     Dxy     0.005
                     Pr(> chi2) 0.8843    gr      1.010     gamma   0.010
                                          gp      0.000     tau-a   0.000
                                          Brier   0.013

          Coef    S.E.   Wald Z Pr(>|Z|)
y>=1       5.6982 0.2049  27.81 <0.0001
y>=2       5.0016 0.1532  32.64 <0.0001
y>=3       4.3206 0.1200  36.01 <0.0001
y>=4       3.8957 0.1057  36.85 <0.0001
y>=6       3.7760 0.1025  36.86 <0.0001
y>=7      -5.4176 0.1827 -29.66 <0.0001
x=placebo -0.0201 0.1379  -0.15 0.8843
summary(f, x='placebo')
             Effects              Response : y2

Factor                   Low High Diff. Effect  S.E.    Lower 0.95 Upper 0.95
x - epinephrine:placebo    2    1    NA 0.02007 0.13795 -0.25030   0.29044
 Odds Ratio                2    1    NA 1.02030      NA  0.77857   1.33700
Now we don’t have evidence for a benefit of epi (p=0.88), but neither do we have evidence against a benefit, since the confidence interval for the odds ratio is wide (0.95 interval of roughly 0.78 to 1.34).
Interpretations of clinical trials with nontrivial outcomes are always nuanced!
Note that the above analyses were unadjusted for baseline covariates, because the raw data are not available. Covariate-adjusted analyses would be more appropriate.
Conclusions
With an ordinal outcome, frequentist statistical power and the limiting effective sample size are largely determined by the total frequency of the non-dominant outcome categories. Unless Rankin level 5 is counted as more favorable to patients than death, and absent a full patient-utility analysis, the study’s sample size was insufficient for drawing firm conclusions. There were not enough survivors.
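To put a rough number on this: one approximation to the effective sample size of an ordinal outcome (in the spirit of Whitehead’s work on sample sizes for ordinal responses; take the exact formula here as an assumption) is n * (1 - sum(p^3)), where p holds the outcome-category proportions.

freq <- c(27, 27, 52, 55, 20, 35, 7785)   # pooled mRS frequencies from the first analysis
n    <- sum(freq)
p    <- freq / n
n * (1 - sum(p ^ 3))   # approximate effective sample size; about 630 here

Under this approximation the 8,001 analyzed patients carry roughly the statistical information of about 630 patients with a well-spread-out outcome, which is another way of saying there were not enough survivors.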
What Would a Bayesian Design Do Differently?
Frequentist designs invite fixed sample sizes, and sample size computation requires knowledge that is not available during study planning. With a Bayesian approach, sampling can continue until a target (efficacy, harm, or futility) is reached, with no penalty for multiple looks. Studies that ended equivocally in the frequentist paradigm can readily be extended in the Bayesian paradigm, subject to resource limitations. Bayesian analysis offers other advantages as well; for example, one can compute the posterior probability that epinephrine reduces mortality by some small, but nonzero, amount.
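As a rough illustration of that last point (this is not the trial’s Bayesian analysis), one can approximate the posterior distribution of the epinephrine:placebo log odds ratio for the 7-level outcome by combining the estimate and standard error from the first model above with a normal prior. The skeptical prior below is an arbitrary assumption chosen only for the sketch.

est <- -0.33675      # log odds ratio, epinephrine : placebo, from the first fit above
se  <-  0.13985      # its standard error
prior.mean <- 0      # skeptical prior centered on no effect (assumption)
prior.sd   <- 0.5    # prior 0.95 interval for the OR of roughly 0.38 to 2.7 (assumption)
post.var  <- 1 / (1 / se ^ 2 + 1 / prior.sd ^ 2)   # normal-normal conjugate update
post.mean <- post.var * (est / se ^ 2 + prior.mean / prior.sd ^ 2)
post.sd   <- sqrt(post.var)
# Posterior probability of any benefit (OR < 1) and of at least a 5% reduction in odds (OR < 0.95)
c(any.benefit   = pnorm(0,         post.mean, post.sd),
  small.benefit = pnorm(log(0.95), post.mean, post.sd))

Under these particular assumptions the two probabilities come out around 0.99 and 0.97; a real Bayesian analysis would model the full ordinal likelihood (e.g., a Bayesian proportional odds model) rather than rely on this normal approximation.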
Other Analyses
G Howard et al provide an exact randomization Wilcoxon test for analyzing all levels of the Rankin score. This is more computationally involved than the proportional odds model and does not allow for covariate adjustment.
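A Monte Carlo approximation to a randomization test based on the Wilcoxon rank-sum statistic is easy to code. The sketch below is in that spirit, not Howard et al’s implementation, and reuses y and x from the first analysis.

set.seed(1)
r    <- rank(y)                        # midranks handle the heavy ties at mRS 6
obs  <- sum(r[x == 'epinephrine'])     # observed rank sum for the epinephrine arm
perm <- replicate(5000, {              # re-randomize treatment labels
  xs <- sample(x)
  sum(r[xs == 'epinephrine'])
})
# Two-sided Monte Carlo p-value
min(1, 2 * min(mean(perm <= obs), mean(perm >= obs)))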
Anupam Singh provides an assessment of the proportional odds assumption for this study. The proportional odds assumption is always violated to some degree, so one needs to ask whether the weighted-average odds ratio arising from the PO model is a worse overall treatment summary than the other summaries that might be computed. To quote Stephen Senn:
Clearly, the dependence of the proportional odds model on the assumption of proportionality can be overstressed. Suppose that two different statisticians would cut the same three-point scale at different cut points. It is hard to see how anybody who could accept either dichotomy could object to the compromise answer produced by the proportional odds model.
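One way to see Senn’s point with these data is to compare the odds ratio from every possible dichotomization of the scale with the single proportional odds estimate of 0.71. A quick sketch, again reusing y and x from the first analysis (the sign flip makes each estimate an epinephrine:placebo odds ratio for a worse outcome):

# Epinephrine : placebo odds ratio for each dichotomization y >= j, j = 1..6
sapply(1:6, function(j) {
  fit <- glm((y >= j) ~ x, family = binomial)
  exp(- coef(fit)['xplacebo'])
})

The cut-specific estimates wander around, especially at the sparsely populated low Rankin levels, while the proportional odds model returns one compromise summary, which is exactly the compromise Senn describes.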