Interpretation of the REGEN-COV trial

In the report of the trial comparing two doses of REGEN-COV vs. placebo, in which the composite of Covid-19-related hospitalization or death through day 29 was the primary outcome, the following statement summarised the primary outcome results:
“Covid-19–related hospitalization or death from any cause occurred in 18 of 1355 patients in the REGEN-COV 2400-mg group (1.3%) and in 62 of 1341 patients in the placebo group who underwent randomization concurrently (4.6%) (relative risk reduction [1 minus the relative risk], 71.3%; 95% confidence interval [CI], 51.7 to 82.9; P<0.001); these outcomes occurred in 7 of 736 patients in the REGEN-COV 1200-mg group (1.0%) and in 24 of 748 patients in the placebo group who underwent randomization concurrently (3.2%) (relative risk reduction, 70.4%; 95% CI, 31.6 to 87.1; P=0.002)”

And the authors conclude:
“The data from this phase 3 trial involving outpatients with Covid-19 showed that the 1200-mg dose of REGEN-COV, like the 2400-mg dose, reduced the risk of Covid-19–related hospitalization or death and sped the time to recovery.”

I find the results and conclusions misleading, since five patients died during the trial (1 in the 2400-mg group, 1 in the 1200-mg group, and 3 in the placebo group), so the observed primary outcome was almost entirely Covid-related hospitalization and not death.

I would like to know what others think of these results and the authors’ interpretation.


Agree completely. It appears that the “positive” trial result is driven virtually entirely by the hospitalization component of the primary endpoint, since almost no deaths were recorded. So to imply that this drug is reducing hospitalization or death seems more than a little questionable. I’ll be interested to hear opinions from statisticians on the framing of these results.
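
For anyone who wants to check the headline figures, the relative risk reductions can be recomputed from the counts quoted in the first post. This is only a minimal sketch: the unstratified Wald interval on the log relative risk is an assumption made here for illustration and is not necessarily the method the trial used, and the function name is hypothetical.

```python
from math import exp, log, sqrt

def relative_risk_reduction(events_trt, n_trt, events_ctl, n_ctl, z=1.959964):
    """Return (RRR, lower, upper) using a Wald interval on log(RR).

    Assumption: a simple unstratified analysis, for illustration only.
    """
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log_rr = sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    rr_low = exp(log(rr) - z * se_log_rr)
    rr_high = exp(log(rr) + z * se_log_rr)
    # RRR = 1 - RR, so the interval limits swap places.
    return 1 - rr, 1 - rr_high, 1 - rr_low

# Composite event counts quoted from the report of the primary outcome.
rrr_2400 = relative_risk_reduction(18, 1355, 62, 1341)
rrr_1200 = relative_risk_reduction(7, 736, 24, 748)

for label, (rrr, low, high) in (("2400 mg", rrr_2400), ("1200 mg", rrr_1200)):
    print(f"{label}: RRR {rrr:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Note that the calculation uses only the composite event counts, so by itself it says nothing about how those events split between hospitalization and death.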


happy to see a solid example illustrating the problem with composites, ie the likely lack of persuasive power they carry. there is no guideline dictating that the results must specify what proportion ‘failed’ on each component of the composite, to glean what is driving the result, although guidelines say components should be analysed separately as secondary analyses. composites are so prominent, their problems should be emphasised

edit: imagine if we suggested adding ED visits to the composite, is it ok? i could walk into the ED right now. What about other subjective endpoints? what about biomarkers? what about composites of 10 components, who can make sense of that? but there’s little discussion about this and little adverse effect on the prominence of composites as primary outcomes


I don’t understand why composite outcomes lack persuasiveness. Staying out of the hospital is extremely important to patients. The primary analysis should have, however, counted death as worse than hospitalization in an ordinal response analysis. With the observed slightly favorable mortality result (though based on small numbers as stated above) the result might have been slightly more “significant” while still being driven by hospitalization.

Even after breaking the ties in hospitalization/death, the three-level ordinal variable is a low-information, low-power outcome. More ties need to be broken by bringing in severity and duration of symptoms and need for organ support.
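
To see how many ties remain with a three-level outcome, here is a rough sketch that compares every treated patient with every placebo patient on the scale OK < hospitalized < dead. The split of deaths between the concurrent placebo groups is not given in this thread, so the counts below (1 death among the 18 events in the 2400-mg group, 3 among the 62 in its placebo group) are an assumption for illustration only, and the function name is hypothetical.

```python
from itertools import product

# Counts per ordinal level: 0 = OK, 1 = hospitalized, 2 = dead.
# ASSUMPTION: deaths are split as 1 (treated) and 3 (placebo); totals and
# composite event counts are those quoted for the 2400-mg comparison.
treated = {0: 1337, 1: 17, 2: 1}
placebo = {0: 1279, 1: 59, 2: 3}

def pairwise_probabilities(a, b):
    """Return (P[a better], P[tie], P[b better]) over all a-vs-b patient pairs.

    A lower level is a better outcome.
    """
    n_pairs = sum(a.values()) * sum(b.values())
    a_better = tie = b_better = 0
    for (ya, na), (yb, nb) in product(a.items(), b.items()):
        pairs = na * nb
        if ya < yb:
            a_better += pairs
        elif ya == yb:
            tie += pairs
        else:
            b_better += pairs
    return a_better / n_pairs, tie / n_pairs, b_better / n_pairs

win, tie, loss = pairwise_probabilities(treated, placebo)
# Concordance probability: ties count as half a win.
concordance = win + 0.5 * tie
print(f"treated better: {win:.3f}, tied: {tie:.3f}, placebo better: {loss:.3f}")
print(f"concordance probability: {concordance:.3f}")
```

Under these assumed counts the great majority of pairs are tied, which is the sense in which the outcome carries little information until more levels are added.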


Reducing the risk of hospitalization is certainly important to patients. But given that there are multiple companies coming out with various monoclonal antibody cocktails, all vying for a piece of this (unfortunately) very lucrative pie, it seems important to remain clear-eyed about which products do what.

Here’s another recent NEJM publication from July, 2021, looking at a different monoclonal antibody cocktail (from a different manufacturer).

By day 29, none of the patients who had received this antibody cocktail had died, whereas 10 of the patients in the placebo arm had died.

On initial review, it looks like the primary endpoint for this July/21 trial was similar to the REGEN-COV trial linked in the initial post. Here’s the conclusion in the July study’s abstract:

“Among high-risk ambulatory patients, bamlanivimab plus etesevimab led to a lower incidence of Covid-19–related hospitalization and death than did placebo and accelerated the decline in the SARS-CoV-2 viral load.”

In contrast, here’s the conclusion in the abstract of the REGEN-COV trial (where it looks like there were 2 deaths among REGEN-COV-treated patients and 4 deaths among placebo-treated patients, from Supplement Table 7):

“REGEN-COV reduced the risk of Covid-19–related hospitalization or death from any cause, and it resolved symptoms and reduced the SARS-CoV-2 viral load more rapidly than placebo.”

Note the use of “AND” versus “OR” when referring to the effect on death in the framing of the results of these two trials.

If I were about to prescribe a very expensive therapy to decrease the likelihood that a high risk patient with COVID would die, and had a choice between several of these products, should I be confident that any, some, or all of these cocktails would decrease my patient’s risk of death?

Worded inexpertly:

When death is included as part of a composite primary endpoint and the trial’s “win” seems to be driven by non-fatal components of the endpoint, how “lopsided” does a small number of deaths have to be before we can make reliable inferences about the drug’s effect on mortality? Does the answer depend on some type of sensitivity analysis to gauge the extent to which deaths are driving the overall result?


When death is included as part of a composite endpoint, is there a minimum number of deaths that should be recorded before a company can justifiably claim that its product reduces the risk of death?


i think anything that’s perceived to have ‘fine print’ is unpersuasive. i see the clinicians and pharmacols trying to mentally take the composite apart immediately. if in this case the death count was in the supp doc, it’s encouraging cynicism, warranted or not. my personal feeling is that it loses cogency when endpoints are blended, simply because it’s opaque: “there’s some effect somewhere on this amalgamation of things”

I thought of that also, that it is not enough to have hospitalization as an outcome. Was oxygenation needed? Was he/she put on mechanical ventilation? ECMO?


As detailed here, we need to move to a formal analysis of this. Using the longitudinal ordinal models discussed in that link in a Bayesian framework you can compute the probability that the treatment affects mortality to the same extent that it affects other endpoints. I did that calculation in the VIOLET Vitamin-D study reanalysis.

It depends on what you mean by blended. We need to treat the components differently (by ranking them or using a patient utility assessment) and not just union the occurrence of various outcomes.


I deleted the part of my initial post where I said I couldn’t find the number of deaths in the main body of the REGEN-COV study- it is indeed in there (my oversight). Also, in the discussion (but not the abstract), the authors note the limitations posed by the small number of deaths:

“The small number of deaths limited the ability to assess the effects of REGEN-COV on mortality.”

But if they say this in the discussion, why does the wording in the abstract state that their product decreases the risk of death? Using the word “OR” versus “AND” feels a bit slippery.


Though only solving 7/10ths of that problem, a proper break-the-ties ordinal analysis result could be stated as this: The estimated probability of hospitalization or death for treatment A is 0.xx and for treatment B is 0.yy. (This comes from $\Pr(Y \geq 1)$ when $Y=0,1,2$ for OK, hospitalized, dead.) Then add the limitation that this probability was driven by the relatively large number of hospitalizations.
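
To make that quantity concrete, here is how it could be written under a proportional odds model. The choice of model is an assumption for illustration (the post above does not name one), and the symbols $x$, $\alpha_j$ and $\beta$ are introduced here: $x$ is 1 for treatment A and 0 for treatment B, $\alpha_1 > \alpha_2$ are intercepts, and $\beta$ is the common log odds ratio.

```latex
\Pr(Y \geq j \mid x) = \frac{1}{1 + \exp\left[-(\alpha_j + \beta x)\right]}, \qquad j = 1, 2

\Pr(\text{hospitalization or death} \mid x) = \Pr(Y \geq 1 \mid x)
  = \frac{1}{1 + \exp\left[-(\alpha_1 + \beta x)\right]}

\Pr(\text{death} \mid x) = \Pr(Y = 2 \mid x)
  = \frac{1}{1 + \exp\left[-(\alpha_2 + \beta x)\right]}
```

Under this assumed model the single parameter $\beta$ is informed mostly by the hospitalizations when deaths are few, which is why the stated limitation would still need to accompany the result.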

ok but the makeup is susceptible to the timepoint chosen for the analysis, with the distribution across the utility assessment changing over time, to such an extent that you could question whether an interim analysis is testing the same hypothesis as that tested at study completion. that’s just my sense of it, ie an interim analysis might produce a positive result but someone will ask what events make up the composite; few of them will be at the upper end of the scale because it’s preliminary, and then they will ask if the analysis was done too soon to truly impact opinion

edit: on this point, Janet Wittes spoke at the pharm stat conference this year and i recall her saying “beware of 7 point scales where only 5 points are used”, i never heard her elaborate on this, but maybe it’s an allusion to the same point i’m trying to make

Perhaps it should be made explicit that a “composite endpoint” is not the same as an ordinal outcome measure. From this link, a “composite endpoint” is:

A “composite endpoint” is when researchers in a clinical trial decide to combine several measurable outcomes into a single result.

An ordinal endpoint places greater weight on certain outcomes, so the “moving the goalposts” critique seems misplaced.

In a rational world, readers of the study could decide if the outcomes were large enough in a particular case, based on a contextual assessment of patient utility along with data from prognostic factors.

Link to VIOLET-2 analysis and an intro to Markov ordinal models.


For some studies, rapid decisions (possibly leading to dropping a treatment arm or some more exotic adaptation) are needed and there is no hope of making a decent decision on the basis of a low-information endpoint. So a maximum-information endpoint with lots of ordered categories may be needed for interim decision making, whereas at the end of the study you might emphasize both that composite endpoint and the one most important element of it.

It would be helpful to nail down the nomenclature. I often use the general term multiple endpoints to cover ordinal and polytomous outcomes and sometimes to cover the binary union of a bunch of endpoints. To “combine into a single unit” is not quite clear to me.

Paul can you give an example of that to make sure I understand what you were referring to?

i don’t like using the following reference because i think their result is trivial and easily discerned by anyone who has coded eg a win-ratio, but they concluded: “the weights of the new effect measure considerably depend on the censoring distribution and not on the priority of the individual endpoints. … For this reason, the combined univariate measure proposed by Buyse [4] and Pocock [5] can be very difficult to interpret.” - and the censoring of course depends on the analysis timepoint. in my notes i have given the reference as stats in medicine 2014 rauch et al., perhaps it’s this:

in any case, i tried to illustrate it in figure 2 here: you shift the time window for mortality and readmission and you gather more of these events and the makeup of the composite shifts and along with it the power. Not sure to what extent this applies to your ordinal severity scale, it seems less vulnerable to the argument i guess
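
A toy version of that point, for readers who have not coded a win-ratio: the sketch below does a prioritised pairwise comparison (death first, then hospitalization) and shows how the component that decides each pair changes with the follow-up window. All patient data are hypothetical, the fixed common horizon is a simplifying assumption, and this is not the method of the cited paper.

```python
from collections import Counter
from math import inf

# HYPOTHETICAL patients: day of death and day of first hospitalization
# (inf = the event never happened).
treated = [
    {"death": inf, "hosp": 5},
    {"death": 40, "hosp": 8},
    {"death": inf, "hosp": inf},
]
control = [
    {"death": 30, "hosp": 4},
    {"death": inf, "hosp": 6},
    {"death": 50, "hosp": inf},
]

def compare(a, b, horizon):
    """Compare a treated patient a with a control b, death first, then hosp.

    Events after `horizon` are treated as not yet observed. Returns the
    winner and the component that decided the pair.
    """
    for component in ("death", "hosp"):
        ta = a[component] if a[component] <= horizon else inf
        tb = b[component] if b[component] <= horizon else inf
        if ta > tb:  # control had the event earlier (or treated never did)
            return "treated", component
        if tb > ta:
            return "control", component
    return "tie", None

def tally(horizon):
    """Count (winner, deciding component) over all treated-control pairs."""
    return Counter(compare(t, c, horizon) for t in treated for c in control)

for horizon in (10, 60):
    counts = tally(horizon)
    wins = sum(n for (winner, _), n in counts.items() if winner == "treated")
    losses = sum(n for (winner, _), n in counts.items() if winner == "control")
    by_death = sum(n for (_, comp), n in counts.items() if comp == "death")
    print(f"day {horizon}: wins={wins}, losses={losses}, "
          f"pairs decided by death={by_death}")
```

With the short window every decided pair is settled on hospitalization; with the longer one most are settled on death, so the same statistic is summarising a different mix of events.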


The dependence on censoring is completely a consequence of not respecting the data generating process when formulating an estimator, e.g., not using the rawest form of the outcome data. Measures derived from longitudinal outcomes will not have that problem.

but are we having to cross thresholds when differentiating events on severity? eg if the severity of symptoms is mild or moderate then compare duration, and if duration is tied (within some margin) then move to the next differentiating factor. if so, then it’s susceptible to these thresholds, analogous to censoring, ie some piece of data is exploited to the extent that these thresholds are permeable, if that makes sense (it depends a lot on the particular context). there are additional practical problems that are not emphasised, eg the code that is teasing apart ties in the way you describe must be lengthy and intricate. i would worry about coding errors, i’d want to know that it’s very carefully validated. And i still see the temptation by nonstatisticians to take the outcome apart in the results meeting, and once it’s disassembled and naked it’s less persuasive. if mortality isn’t driving the result then introducing severity of symptoms as a distinguishing factor will seem dubious, don’t you think? not to us, but to others, they are waiting for an excuse to be sceptical

I can see how this skepticism is justified. Using the same logic, someone could have a composite endpoint for side effects of SARS-CoV-2 vaccines that is composed of muscle pain OR death, conclude that vaccines were found to increase muscle pain and death, and put that in the abstract and press release.


The beauty of modeling the raw data instead of modeling derived quantities such as time to specific events is that you don’t have to make time vs. severity tradeoffs. Ordinal longitudinal analysis asks this question: What is the severity of the worst thing that happened to the patient on a given day, as a function of time and treatment? This also takes into account another facet. If events include clinical events and hospitalization, a clinical event that requires a prolonged hospitalization would get more weight than an event that resulted in a short hospitalization. (Unless it is a permanent event).
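
As a rough illustration of the data structure this implies, the sketch below turns a patient’s event records into one ordinal state per day, taking the worst thing that happened on each day. The four-level scale, the record layout and the function name are all hypothetical choices made for this example, not a description of any particular trial or analysis.

```python
# HYPOTHETICAL ordinal scale; higher is worse and death is absorbing.
SEVERITY = {"home": 0, "hospitalized": 1, "ventilated": 2, "dead": 3}

def daily_worst_state(episodes, death_day, n_days):
    """Return the worst severity level on each of days 1..n_days.

    episodes: list of (state, first_day, last_day), days inclusive.
    death_day: day of death, or None if the patient survived follow-up.
    """
    states = [SEVERITY["home"]] * n_days  # index 0 is day 1
    for state, first_day, last_day in episodes:
        for day in range(first_day, min(last_day, n_days) + 1):
            states[day - 1] = max(states[day - 1], SEVERITY[state])
    if death_day is not None:
        # Once dead, the patient stays in the worst state.
        for day in range(death_day, n_days + 1):
            states[day - 1] = SEVERITY["dead"]
    return states

# Patient A: hospitalized days 3-9, ventilated on days 5-6, survived.
patient_a = daily_worst_state(
    [("hospitalized", 3, 9), ("ventilated", 5, 6)], None, 10)
# Patient B: hospitalized days 2-4, died on day 4.
patient_b = daily_worst_state([("hospitalized", 2, 4)], 4, 6)

print(patient_a)
print(patient_b)
# Days spent in any state worse than "home": longer stays contribute more.
print(sum(s >= 1 for s in patient_a), sum(s >= 1 for s in patient_b))
```

A longitudinal ordinal model would then be fitted to these daily states as a function of time and treatment, so a long hospitalization contributes more patient-days at level 1 or above than a short one without any separate time-versus-severity rule.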

The sample size often does not allow components of an overall outcome measure to be picked apart. If a sponsor or regulator really wants to see this (e.g., showing that treatment benefit can stand on mortality alone) they need to put their money where their mouth is.


didn’t you have some figure displays for the ordinal longitudinal outcome to illustrate how the results could be presented? maybe i’m misremembering, because I checked both and cannot find it. i think that’s a really valuable piece, to persuade the nonstatisticians. Maybe severity was colour coded … i can’t remember