Guidelines for Covariate Adjustment in RCTs

This practice makes me want to pull my hair out. This statistical analysis strategy makes no sense. Instead, pre-specify the most powerful analysis as the primary analysis, i.e., the covariate-adjusted Cox model. Ironically, this also helps with the proportional hazards assumption: unadjusted survival curves commonly deviate from proportional hazards because each treatment's marginal curve is a mixture of subgroup-specific survival curves.
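To see why mixing breaks proportionality, here is a minimal numerical sketch (hypothetical rates and weights, Python standard library only). Each arm is a 50/50 mixture of good- and poor-prognosis exponential subgroups, and treatment multiplies every subgroup hazard by 0.6, so proportional hazards hold exactly within each stratum; the marginal hazard ratio nonetheless drifts over time.

```python
import math

def surv_mix(t, rates, weights):
    """Survival function of a mixture of exponential subgroups."""
    return sum(w * math.exp(-r * t) for r, w in zip(rates, weights))

def hazard_mix(t, rates, weights, eps=1e-6):
    """Numerical hazard: -d/dt log S(t), by finite difference."""
    return (math.log(surv_mix(t, rates, weights))
            - math.log(surv_mix(t + eps, rates, weights))) / eps

# Hypothetical control arm: 50/50 mix of good (rate 0.1) and poor (rate 0.5)
# prognosis patients; treatment multiplies every subgroup hazard by 0.6,
# i.e. proportional hazards hold exactly WITHIN each stratum.
control = ([0.10, 0.50], [0.5, 0.5])
treated = ([0.06, 0.30], [0.5, 0.5])

# The MARGINAL (unadjusted) hazard ratio drifts away from 0.6 over time,
# because the poor-prognosis patients die out faster in the control arm:
for t in (0.5, 2, 5, 10):
    print(t, round(hazard_mix(t, *treated) / hazard_mix(t, *control), 3))
```

A covariate-adjusted model conditions on the prognostic factor and so works on the within-stratum scale, where the hazard ratio really is constant in this construction.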

3 Likes

I know of several other examples in the literature.
The RADIANT-2 RCT, comparing everolimus with placebo in neuroendocrine tumors, was published in 2011.
The study was declared “negative” even though the curves diverged. Neuroendocrine tumors are by definition very heterogeneous (biologically and clinically), yet the investigators used an unstratified log-rank test (P=0.052):

The following year, the team presented a post hoc multivariable analysis at the ASCO meeting, with a completely different result (P=0.003).

The authors seemed to attribute the discrepancy to imbalances during randomization.

Although mTOR inhibition is effective, it took four more years for the trial to be fully replicated. At the end of 2015, RADIANT-4 was published: a positive RCT that led to the approval of everolimus five years later (they applied a stratified method this time).

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)00817-X/fulltext

So we are not talking about an abstract mathematical issue, but a real problem that affects drug development and has affected thousands of patients who could not benefit.

7 Likes

I can see that the way it was written was slightly unclear, but to clarify: the main end point in Bilcap was an adjusted hazard ratio, the covariates being the three minimisation variables: resection status, ECOG performance status, and site of disease (the fourth minimisation variable, surgical centre, was not included because there were a lot of them…). The extended model that added the prognostic factors sex, grade and nodal status was prespecified in the protocol. I think you could argue that if those extra variables are thought to be relevant and you can collect reliable data on them, then they should be included in the adjusted Cox regression that gives you your primary outcome, but that is a decision that has to be made up front, of course. Is it helpful, given that the primary outcome is the more limited model, to include the larger model in the reporting? I’d say it is, even if in hindsight you wish you’d specified it as the primary all along.

1 Like

But this was not a co-primary end point. What does it mean to pre-specify a multivariable sensitivity analysis?

I imagine the point was that the choice of covariates in the adjusted model is not obvious: you need to pick some, and they become the model for your primary analysis. The so-called sensitivity analysis is there to see whether the estimate of the primary outcome is sensitive to the choice of covariates; it turns out that it is, a bit. Prespecifying at least means you’re not cherry-picking the prognostic factors post hoc to give you the lowest adjusted HR you can find. If I were designing that trial from scratch, I would argue that the primary should probably be based on the full model, because that is most likely to give you the best estimate of the treatment effect, whether that be in favour of the treatment or otherwise.

1 Like

This is what the protocol from the bilcap study says:

"The primary analysis will be comparison of overall survival across treatment groups, calculated
from the time of randomisation to the date of death (or censor date). All analysis will be on an
intention to treat basis. Comparison of survival estimates will be by log-rank analysis. […]. Cox proportional hazards models will be undertaken to determine prognostic factors and their influence on survival and provide an adjusted treatment effect by important prognostic and stratification factors". That’s all:

https://www.thelancet.com/cms/10.1016/S1470-2045(18)30915-X/attachment/72450884-700d-4196-928d-545455bcb7ce/mmc1.pdf

Whether cherry-picking occurred is unclear, since the model to be followed was not specified and more covariates were available in Table 1. It seems more or less reasonable, but this doesn’t sound to me like a truly pre-specified analysis, and there is also possible inflation of the type I error here.

Yes, that is what the protocol says; you are correct. The covariates for the Cox model are specified in the statistical analysis plan: the minimisation variables in one set, and the other prognostic variables in addition. The SAP states that the primary outcome measure will be summarised with a hazard ratio adjusted by the minimisation factors. (I should say I have some involvement with the Bilcap trial, but only recently, and nothing to do with its set-up or design.)

Moving away from the specific example of Bilcap and on to the more general point: it is well established statistically that if you use minimisation in the randomisation, then it is good practice to adjust for those factors in the analysis. In my opinion (and I don’t think I’m alone among statisticians) the log-rank test is a fairly crude measure of survival, and something like a Cox model with suitable covariates is the much better way of analysing these outcomes. And as Frank said above, it is preferable to include relevant prognostic factors if you have them.

A minor point, which @Stephen would no doubt know more about than me, is that minimization may require a slight change to the method of analysis. I remember seeing a paper years ago stating that minimization induces some sort of correlation among observations. I hope someone can add to that.

1 Like

Great… Can you explain, then, how to interpret the study from the frequentist point of view? It is not clear to me whether it is a positive or negative RCT as reported in the article.

Do we need “positive” and “negative” in our clinical trial vocabulary? I think not. But this topic should be discussed elsewhere on datamethods. With Bayes I’d say something like “treatment B probably (0.91) improves outcomes over treatment A”.
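For what it's worth, a statement like "probably (0.91)" can be read straight off a posterior on the hazard-ratio scale. A minimal sketch, assuming a flat prior and a normal approximation on the log-HR scale, with purely illustrative numbers (not any trial's actual estimates):

```python
import math

def p_hr_below_1(hr_hat, ci_lo, ci_hi):
    """P(true HR < 1) under a normal approximation on the log scale.
    With a flat prior, the posterior is ~ N(log hr_hat, se^2), where
    se is backed out of the reported 95% confidence interval."""
    log_hr = math.log(hr_hat)
    se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)
    z = -log_hr / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Illustrative numbers only: HR 0.81, 95% CI 0.63 to 1.04
print(round(p_hr_below_1(0.81, 0.63, 1.04), 2))
```

A "non-significant" interval that barely crosses 1 can still correspond to a high posterior probability of benefit, which is exactly why the binary positive/negative framing loses information.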

2 Likes

Yes, I agree. Sorry !

Well, it didn’t meet the end point of a statistically significant treatment effect based on the hazard ratio adjusted for minimisation variables. So we might say that while the patients in the trial on average did better on treatment than on observation, there is some doubt as to whether this was anything other than a chance result. However, if you look at the per-protocol population, the result was significant; but there are perfectly valid reasons why this was not the primary outcome, so while it might be of some clinical interest to note this, we should be cautious because it is more subject to bias. Then there is the analysis that includes the larger set of prognostic variables (I think it’s in the supplementary material). This also shows a treatment effect, and the upper bound of the 95% CI is below 1, so it is “statistically significant” in this context. Again this might be of some clinical interest, and from a statistical point of view I would argue that a model including as many relevant clinical factors as possible (subject to the constraints mentioned elsewhere in this discussion) is likely to give the best estimate of the treatment effect, whatever that may be. But you are right to imply that there’s a reason we state the primary outcome at the outset, and other analyses should be treated with some caution, even if we assume (as in this case) that everything was conducted in good faith.

My (non-frequentist) interpretation is similar to that expressed above that given the evidence we have, it is probable that the experimental treatment leads to improved outcomes for this patient population, when compared to the control. I personally favour analysis that runs along the lines of “what is the probability that this is true and what is the uncertainty…” rather than a binary win-lose exercise.

1 Like

Thank you. From a Bayesian point of view I would personally place a slightly sceptical prior: an HR of 0.71, for example, seems exaggerated to me. I would rather aim for an absolute OS benefit of at most 5% at 5 years, and an HR around 0.85, as my prior Bayesian bet. In fact, I have colleagues who, as clinicians, have the hunch after treating many patients in this situation that if this study were repeated, the favourable results would probably not be replicated. This creates problems for them in making decisions, because the drug is at best modestly effective, but it does have some serious toxicity.
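A sceptical prior like that is easy to make concrete with a conjugate normal update on the log-HR scale. This is only a sketch with assumed numbers (an observed HR of 0.71, an assumed SE of the log HR, and an assumed prior SD), not the actual trial data:

```python
import math

def posterior_log_hr(hr_hat, se, prior_mean_hr=1.0, prior_sd=0.2):
    """Conjugate normal update on the log-HR scale: a sceptical prior
    centred at HR = 1 shrinks the observed estimate toward no effect."""
    mu0, tau = math.log(prior_mean_hr), prior_sd
    y = math.log(hr_hat)
    w = (1 / tau**2) / (1 / tau**2 + 1 / se**2)   # weight given to the prior
    post_mean = w * mu0 + (1 - w) * y
    post_sd = math.sqrt(1 / (1 / tau**2 + 1 / se**2))
    return post_mean, post_sd

# Assumed inputs: observed HR 0.71 with SE(log HR) = 0.13, prior SD 0.2
m, s = posterior_log_hr(0.71, 0.13)
print(round(math.exp(m), 2))   # posterior HR, pulled toward 1
p_benefit = 0.5 * (1 + math.erf((-m / s) / math.sqrt(2)))
print(round(p_benefit, 2))     # posterior P(HR < 1)
```

With these assumed values the posterior HR lands noticeably closer to 1 than the raw estimate, which is the formal version of the clinical intuition that a 0.71 is probably too good to replicate.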

That all makes sense but I’m not seeing the logic of bringing replication into this particular discussion.

new draft guidance from fda re Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products

2 Likes

I am glad to see the new draft guidance. The best thing about it to me is that it opens the door to all kinds of modern robust modeling techniques while being careful to not allow sponsors to “play with” models. Mentioning the use of Y-transformation-invariant semiparametric models for continuous and ordinal Y is a nice step in that direction.

I have two concerns with the draft:

  • The text appropriately mentions that analysis of Y can replace analysis of change from baseline for Y, but it should have gone further and stated that change from baseline is highly discouraged (it only works for linear models, assumes linearity, and assumes the regression slope on baseline is 1.0).
  • The draft devotes too much emphasis to alternate causal estimands. Statistical estimands that respect intention-to-treat already have a causal interpretation (e.g., “the reduction in blood pressure that was caused by treatment B over treatment A”) and it is not necessary to restrict causation to just operate on a linear scale. RCTs are wonderful in not requiring a random sample of subjects, since RCTs are there to estimate differences and not absolutes. RCTs do not require representative patients; they require representative treatment effects. When using a causal estimand that does averaging over all trial patients, the resulting estimate applies only to the covariate distributions that were actually realized by the trial enrollment, volunteerism of subjects, and the inclusion-exclusion criteria. The estimate does not apply to the clinical population at large unless weighted averaging is used and the sampling weights are available (they never are), and doesn’t apply to any one type of patient in the trial itself. @stephen and I are writing a blog article that goes into more detail. :new: To have an average causal treatment effect that is meaningful, i.e., that applies outside of the sample of patients in the trial, one must have sampling weights available, e.g., to adjust the sample age distribution to a population age distribution. Sampling weights provide the relative representation of people in the sample to people in the population. I’ve never seen an RCT that has access to sampling weights.
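On the first bullet, a small simulation (standard library only, hypothetical parameter values) shows the cost of the implicit slope-of-1 assumption: when the true regression of follow-up on baseline has slope less than 1, the change score carries larger residual variance than a baseline-adjusted (ANCOVA-style) analysis, which translates directly into lost power.

```python
import random

random.seed(1)
n = 20000
beta, sigma = 0.4, 0.5   # assumed true baseline slope (< 1) and residual SD
x = [random.gauss(0, 1) for _ in range(n)]            # baseline values
y = [beta * xi + random.gauss(0, sigma) for xi in x]  # follow-up values

def variance(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

# Change from baseline silently forces the baseline slope to equal 1:
var_change = variance([yi - xi for yi, xi in zip(y, x)])

# Covariate adjustment estimates the slope from the data (OLS):
mx, my = sum(x) / n, sum(y) / n
b_hat = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
var_ancova = variance([yi - b_hat * xi for yi, xi in zip(y, x)])

# Residual variance is inflated when the true slope is not 1:
print(round(var_change, 2), round(var_ancova, 2))
```

Analytically, the change score's residual variance is (1 - beta)^2 + sigma^2 versus sigma^2 for the adjusted analysis, so the two approaches agree only when the true slope happens to be 1.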

Also see Jonathan Bartlett’s comments.

3 Likes

Yes, I noticed the reference to that Statistics in Medicine paper. I very much like that paper, and another coauthored with Hothorn, who is active in transformation models (he has a good general paper in the Scandinavian Journal of Statistics). I’m seeing change from baseline analysed for the WHO COVID scale, unfortunately, usually as a supplementary analysis (when the statistician lets go of control).

look forward to that

Our group was lucky to collaborate with Hothorn to compare our methods here.

2 Likes

The FDA draft guidance rightfully states that post-time-zero measurements should not appear as covariates. This does not preclude the outcome variable and other outcome variables from being used in multiple imputation (this is not optional for MI; it is a necessity to avoid bias caused by under-imputation). This needs to be clarified in the guidance.

3 Likes

I guess this is the bit you refer to: “The ICH E9 guidance also cautions against adjusting for “covariates measured after randomization because they could be affected by the treatments.””
I had an impossible time persuading a large pharma not to adjust for post-baseline covariates; the clinical folk wanted it, the best statisticians placated them, and the habit became ossified.
Edit: re imputation, I guess it’s rare in industry trials.

2 Likes