Interpretation of INTERACT-2 Study in Intracerebral Hemorrhage

yetanotherpatel · June 15, 2020, 4:46pm

I wanted to elicit perspectives on the interpretation of a randomized trial that has been much discussed in the neurocritical care community: INTERACT-2.

While this trial is several years old, I was recently summarizing it for a lecture and was interested in what the datamethods community thought about a few points. Some of the points I will raise have already been discussed in previous threads, but I wanted to make sure I'm interpreting things correctly.

Clinical background: an acute hypertensive response is commonly seen after the acute onset of intracerebral hemorrhage (ICH). It has been variably associated with an increased risk of hematoma expansion, development of perihematomal edema, and death and disability.

Two major randomized trials, INTERACT-2 (published in 2013) and ATACH-2 (published in 2016), have both examined whether lowering blood pressure within several hours of symptom onset leads to less death and disability at 3 months, as measured by the ordinal modified Rankin Scale (mRS, range 0-6, lower scores = less disability, 6 = death). I will leave out discussion of ATACH-2 for now to keep this post relatively short.

INTERACT-2 randomized patients within 6 hours of symptom onset to two systolic blood pressure targets, < 180 mm Hg and < 140 mm Hg, to be maintained for 7 days post-injury. The primary outcome was a dichotomization of the mRS at 3 months into death or major disability (mRS 3-6) vs. a good outcome (mRS 0-2). For this outcome, the odds ratio was 0.87 (95% CI 0.75–1.01) in favor of the intensive lowering group (< 140 mm Hg). A secondary ordinal analysis of the entire mRS gave a similar odds ratio (0.87, 95% CI 0.77–1.00).
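As a rough illustration (a sketch, not the trial's own analysis), the odds ratios and 95% CIs above can be turned back into approximate p-values, assuming each CI is a symmetric Wald interval on the log-odds scale. The helper name `wald_from_ci` is hypothetical, and because the published figures are rounded, the results are only approximate:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wald_from_ci(odds_ratio: float, lower: float, upper: float, z_crit: float = 1.96):
    """Back-calculate SE(log OR), the z statistic and a two-sided p-value
    from a reported OR and 95% CI, assuming a symmetric Wald interval
    on the log-odds scale (an assumption, not stated in the trial report)."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_or / se
    return se, z, 2.0 * normal_cdf(-abs(z))

# Figures as quoted above; they are rounded, so results are approximate.
for label, (est, lo, hi) in [("dichotomized mRS", (0.87, 0.75, 1.01)),
                             ("ordinal mRS", (0.87, 0.77, 1.00))]:
    se, z, p = wald_from_ci(est, lo, hi)
    print(f"{label}: SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the dichotomized analysis comes out at roughly p ≈ 0.07 and the ordinal analysis at roughly p ≈ 0.04. The point estimates are identical; only the narrower CI of the ordinal analysis, reflecting the extra information it uses, moves it across the 0.05 line.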

The community’s interpretation of this was variable. In general, here were the main comments:

On one hand, some clinicians thought that, given the 95% CI, there was a strong likelihood that intensive blood pressure lowering to < 140 mm Hg was beneficial, especially since no data showed harm. Proponents also pointed to the likely benefit seen in the secondary outcomes, including the ordinal analysis and the quality-of-life data.
Others pointed to the p value being > 0.05 and to the lack of a difference in hematoma expansion to claim that there was no difference between the two blood pressure goals. They also pointed out that the lack of blinding could have biased the results.

My questions/thoughts about this study:

Overall, there is a lot of variation (and resulting controversy) in the use of the modified Rankin Scale as an outcome measure in neurocritical care. The main points of contention seem to be what to use as the cut-off for a good outcome (commonly 0-2 vs 3-6 or 0-3 vs 4-6; some studies have also used 0-4 vs 5-6) and whether to dichotomize at all. Some trials have used an ordinal regression, but mostly as a secondary outcome. While I assume that most of the datamethods community would advocate for the use of an ordinal regression, this seems to be particularly controversial on Twitter. I think much of this stems from the suspicion that trialists are trying to gain statistical power any way they can in an effort to show "positive" results. I also think that we physicians are used to analyzing dichotomous outcomes, for better or worse. Any tips on how to translate the results of an ordinal regression for the general medical community?
Given that the odds ratios of the dichotomous outcome and the ordinal regression were similar, doesn't that imply that the proportional odds assumption is valid, making ordinal regression a valid way to interpret the outcome?
Critics of ordinal regression have pointed to the poor inter-rater reliability of the mRS, arguing that it negates any power that might be gained by using this method instead of dichotomizing the results. How would you reconcile these seemingly opposing viewpoints?
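One way to translate a proportional odds result for clinicians is to apply the common OR to a control-arm mRS distribution and read off the implied absolute difference at each conventional cut-off. The sketch below does that. The control distribution is made up for illustration (it is not INTERACT-2 data), `shift_distribution` is a hypothetical helper, and the OR is taken as the odds of a *worse* (higher) mRS, so OR < 1 favors treatment:

```python
def shift_distribution(control_probs, odds_ratio):
    """Apply a common (proportional) odds ratio to a control-arm distribution.

    control_probs: probabilities for mRS 0..6 in the control arm (summing to 1,
    with control_probs[0] > 0). odds_ratio: common OR for a worse (higher) score,
    treated vs control. Returns the implied treated-arm probabilities.
    """
    k = len(control_probs)
    # P(Y > j) in the control arm for each cut j = 0..k-2
    exceed = [sum(control_probs[j + 1:]) for j in range(k - 1)]
    treated_exceed = []
    for p in exceed:
        new_odds = (p / (1.0 - p)) * odds_ratio  # same OR at every cut
        treated_exceed.append(new_odds / (1.0 + new_odds))
    # Convert exceedance probabilities back to category probabilities
    cum = [1.0] + treated_exceed + [0.0]
    return [cum[j] - cum[j + 1] for j in range(k)]

# Hypothetical control-arm distribution for mRS 0..6 (illustrative only).
control = [0.08, 0.14, 0.15, 0.18, 0.17, 0.08, 0.20]
treated = shift_distribution(control, 0.87)

for cut in (2, 3, 4):
    c = sum(control[:cut + 1])
    t = sum(treated[:cut + 1])
    print(f"mRS 0-{cut}: control {c:.1%}, treated {t:.1%}, difference {t - c:+.1%}")
```

Every dichotomy then gets its own absolute difference, all implied by the single ordinal OR, which may be easier to communicate than the OR itself.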

Hopefully my questions make sense. I’m happy to provide any clarification or further clinical background. Thank you and looking forward to discussing further.

R_cubed · June 15, 2020, 8:07pm

This is a great question, and I understand all too well the challenges of examining the stroke literature.

I will do my best to link to some useful citations and threads here that suggest an answer to some of your questions, and hopefully the others with more expertise can elaborate.

As for your question:

While I assume that most of the datamethods community would advocate for the use of an ordinal regression, this seems to be particularly controversial on Twitter.

Aside from the authority of Dr. Harrell (our host), who recommends ordinal logistic regression as a widely applicable technique, the following papers also describe it as one of the preferred ways of analyzing stroke outcome data.

Can We Improve the Statistical Analysis of Stroke Trials? Statistical Reanalysis of Functional Outcomes in Stroke Trials: The Optimising Analysis of Stroke Trials (OAST) Collaboration
https://www.ahajournals.org/doi/10.1161/strokeaha.106.474080

From the abstract:

Conclusions— When analyzing functional outcome from stroke trials, statistical tests which use the original ordered data are more efficient and more likely to yield reliable results. Suitable approaches included ordinal logistic regression, t test, and robust ranks test.

Statistical Analysis of the Primary Outcome in Acute Stroke Trials
https://www.ahajournals.org/doi/full/10.1161/STROKEAHA.111.641456

Common outcome scales in acute stroke trials are ordered categorical or pseudocontinuous in structure but most have been analyzed as binary measures. The use of fixed dichotomous analysis of ordered categorical outcomes after stroke (such as the modified Rankin Scale) is rarely the most statistically efficient approach and usually requires a larger sample size to demonstrate efficacy than other approaches.

We can eliminate the other methods of inference (t-tests, etc.) because the data from these scales are inherently ordinal. For example:

These ordinal clinical scales require more sophisticated analysis techniques. The problems Dr. Harrell describes in the depression literature that uses the Ham-D are also problems in the neurological outcome literature (as cited in the Missing Medians article). In this thread, he describes the use of the ordinal proportional odds logistic model, along with nonlinear smoothing methods, to extract information that other methods would miss.

Using ordinal regression eliminates the meaningless disputes about "cut-offs"; any particular cut-off is arbitrary. See the discussion in Chapter 18 of Biostatistics for Biomedical Research (BBR), which is a gold mine of wisdom regarding applied statistics. Pay particular attention to the discussion of the information lost by dichotomizing continuous variables. Information loss implies a loss of power, which is equivalent to a reduction in the effective sample size.
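To make the power argument concrete, here is a small Monte Carlo sketch comparing the Wilcoxon-Mann-Whitney test (closely related to the proportional odds model; it is essentially that model's score test for a treatment effect) with a two-proportion z-test after dichotomizing at mRS 0-2. The control distribution, the OR of 0.7, the sample size, and all helper names are hypothetical choices for illustration, and outcomes are generated assuming proportional odds holds exactly:

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def shift(probs, odds_ratio):
    """Treated-arm probabilities implied by a common OR for a worse (higher) score."""
    k = len(probs)
    exceed = [sum(probs[j + 1:]) for j in range(k - 1)]
    treated = [p * odds_ratio / (1 - p + p * odds_ratio) for p in exceed]
    cum = [1.0] + treated + [0.0]
    return [cum[j] - cum[j + 1] for j in range(k)]

def wilcoxon_p(x, y):
    """Two-sided Wilcoxon-Mann-Whitney p-value (normal approximation,
    midranks for ties, tie-corrected variance)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(x + y)
    midrank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        midrank[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1..j
        tie_term += t ** 3 - t
        i = j
    r1 = sum(midrank[v] for v in x)
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    return 2.0 * normal_cdf(-abs((r1 - mean) / math.sqrt(var)))

def two_prop_p(x, y, cut):
    """Two-sided pooled z-test comparing P(score <= cut) between arms."""
    n1, n2 = len(x), len(y)
    a = sum(v <= cut for v in x)
    b = sum(v <= cut for v in y)
    pooled = (a + b) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    return 2.0 * normal_cdf(-abs((a / n1 - b / n2) / se))

def simulated_power(control, odds_ratio, n_per_arm, cut=2, reps=400, alpha=0.05, seed=1):
    """Monte Carlo power of the ordinal (Wilcoxon) vs the dichotomized analysis."""
    rng = random.Random(seed)
    treated = shift(control, odds_ratio)
    cats = range(len(control))
    hits_ordinal = hits_binary = 0
    for _ in range(reps):
        x = rng.choices(cats, weights=control, k=n_per_arm)
        y = rng.choices(cats, weights=treated, k=n_per_arm)
        hits_ordinal += wilcoxon_p(x, y) < alpha
        hits_binary += two_prop_p(x, y, cut) < alpha
    return hits_ordinal / reps, hits_binary / reps

# Hypothetical control-arm mRS 0..6 distribution (illustrative, not trial data).
control = [0.08, 0.14, 0.15, 0.18, 0.17, 0.08, 0.20]
ordinal, binary = simulated_power(control, odds_ratio=0.7, n_per_arm=300)
print(f"power, ordinal (Wilcoxon): {ordinal:.2f}; dichotomized at mRS 0-2: {binary:.2f}")
```

Changing `cut` or `odds_ratio` shows how much power each dichotomization gives up under these assumptions; the same loss applies whether the true effect is benefit or harm.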

I forgot to add: mixed-effects ordinal regression is also recommended as a widely applicable technique for meta-analysis of individual patient data or of aggregate data summaries.

I doubt I would have ever found this technique if I had not first learned of the proportional odds model from BBR and Regression Modeling Strategies (RMS).

I'd ask the critics of ordinal regression how they would aggregate the results of studies that use different cut points into a meta-analysis without access to individual patient data.

There is no coherent method to do so. The different cut points lead to heterogeneous data sets, and the best you could do, being extremely charitable, is a vote count, taking for granted that the direction of the outcome was accurately reported.

If we are to do useful, economical, and ethical research for conditions like ICH, there should be no doubt that the proportional odds ordinal logistic model, within a broader, Bayesian decision theoretic framework is the perspective to take. If we place RCTs into a broader, formal decision making context, more informative yet economical experiments can be designed that answer the questions front line clinicians really want answered.

f2harrell · June 15, 2020, 8:08pm

I'm really glad you brought up this topic. Yes, the similarity in ORs gives evidence in the direction of proportional odds being OK. But even if PO were not very OK, the ordinal model can still be a better way to analyze the outcome.
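A quick descriptive way to look at the proportional odds assumption is to compute the odds ratio at every possible cut-off of the scale and see whether they are roughly similar, which is the same idea as comparing the dichotomized and ordinal ORs. This is a sketch with hypothetical counts (not trial data) and a hypothetical helper name; it is an informal check rather than a formal test, and it fails if any cumulative count is zero:

```python
def cumulative_odds_ratios(control_counts, treated_counts):
    """Cut-specific odds ratios of a worse outcome (score > j), treated vs control,
    for every cut j = 0..K-2. Under proportional odds these all estimate the same
    common OR, up to sampling noise."""
    n_c, n_t = sum(control_counts), sum(treated_counts)
    ors = []
    for j in range(len(control_counts) - 1):
        worse_c = sum(control_counts[j + 1:])
        worse_t = sum(treated_counts[j + 1:])
        odds_c = worse_c / (n_c - worse_c)
        odds_t = worse_t / (n_t - worse_t)
        ors.append(odds_t / odds_c)
    return ors

# Hypothetical counts for mRS 0..6 in each arm (illustrative only).
control = [40, 70, 75, 90, 85, 40, 100]
treated = [50, 80, 82, 90, 78, 35, 85]
for j, r in enumerate(cumulative_odds_ratios(control, treated)):
    print(f"mRS {j + 1}-6 vs 0-{j}: OR = {r:.2f}")
```

Cut-specific ORs scattered around a common value (with wide uncertainty at sparse cuts) are consistent with PO; a clear trend across cuts would suggest it is violated.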

I don’t understand why the use of the ordinal scale is controversial in the least. It is the use of dichotomization of the scale that completely disrespects the way the Rankin scale was developed. Many neurologists act as if they want power to decrease. Help me out here …

It would be good to see Bayesian re-analysis of the studies. I think they provide fairly good evidence of benefit.
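For a crude version of such a re-analysis, the reported OR and 95% CI for the INTERACT-2 primary outcome can be combined with a prior using a normal approximation on the log-OR scale. This is only a sketch. It assumes a symmetric Wald CI, uses rounded published figures, and the skeptical prior (centered on no effect, with a 5% chance that OR < 0.5) is an arbitrary illustrative choice; the function names are hypothetical. A proper re-analysis would fit a Bayesian ordinal model to patient-level data:

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_log_or(or_hat, lower, upper, prior_mean=0.0, prior_sd=None):
    """Conjugate normal update on the log-OR scale.
    Likelihood: normal approximation with SE back-calculated from the 95% CI.
    prior_sd=None means a flat prior, so the posterior equals the likelihood."""
    est = math.log(or_hat)
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    if prior_sd is None:
        return est, se
    w_data, w_prior = 1 / se ** 2, 1 / prior_sd ** 2  # precisions
    mean = (w_data * est + w_prior * prior_mean) / (w_data + w_prior)
    return mean, math.sqrt(1 / (w_data + w_prior))

def prob_or_below(threshold, mean, sd):
    """Posterior probability that the OR is below a threshold."""
    return normal_cdf((math.log(threshold) - mean) / sd)

# Hypothetical skeptical prior: centered on OR = 1, with P(OR < 0.5) = 0.05.
skeptical_sd = -math.log(0.5) / 1.645
for label, prior_sd in [("flat", None), ("skeptical", skeptical_sd)]:
    m, s = posterior_log_or(0.87, 0.75, 1.01, prior_sd=prior_sd)
    print(f"{label}: P(OR < 1) = {prob_or_below(1.0, m, s):.3f}, "
          f"P(OR < 0.9) = {prob_or_below(0.9, m, s):.3f}")
```

Under these assumptions the posterior probability of any benefit (OR < 1) is above 0.95 with either prior, while the probability of a larger effect such as OR < 0.9 is considerably lower, which is the kind of nuance a p-value of 0.06 hides.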

yetanotherpatel · June 15, 2020, 8:33pm

This is only my personal interpretation, but it seems that some of the distrust of ordinal regression originates from IST-3, which had originally listed its primary outcome as a dichotomization of the mRS (I forget the actual cut-off). However, the investigators also used an ordinal regression in a secondary analysis and relied on it to state in the conclusions that the trial was "positive."

After this trial, which was published in 2012, many skeptics have dismissed any use of ordinal regression in the stroke literature as "statistical nonsense." This has likely been exacerbated by the fact that few stroke/neurocritical care trials pre-specify ordinal regression for the primary outcome. I think the PATCH trial pre-specified an ordinal regression of the mRS as its primary outcome, but it did not receive as much criticism, likely because it found that the main intervention (platelet transfusion) was harmful. My understanding is that ordinal regression increases the power to detect any difference in outcome, whether benefit or harm, no? So it is interesting that this trial did not receive the same statistical criticism that others like IST-3 and INTERACT-2 did.

Another factor that seems to fuel this skepticism is that most critical care trials use a dichotomous mortality outcome, so when another outcome is used, it is hard for some to accept. I have heard/read several times that mortality is considered more "objective" than a "subjective" disability scale such as the mRS, but as a neurointensivist I would counter that in patients with neurological injury, death is not the only clinically meaningful outcome; the degree of disability certainly matters because it affects quality of life.

I can certainly see how varying uses of the mRS, whether different dichotomous cut-offs or an ordinal regression, would make some people suspicious of attempts to "change the goalposts" to fit a preconceived notion. But I also understand how frustrating this can be for stroke researchers, who are likely trying to find the best way to look for benefit.