Should we ignore covariate imbalance and stop presenting a stratified 'table one' for randomized trials?

To me, the goal of a parallel-group randomized clinical trial is to answer this question: do two patients starting out at the same point (same age, severity of disease, etc.), one on treatment A and one on treatment B, end up with the same expected outcomes? This is fundamentally a fully conditional model.


Then you are interested in E(Y | T=1, COV=covariates) - E(Y | T=0, COV=covariates), or maybe, for categorical Y, E(Y | T=1, COV=covariates)/E(Y | T=0, COV=covariates)? It seems like you’ll eventually want to take the expectation over the covariates, because you will be applying the treatment in a population whose covariates you cannot control, and the outcome may not be independent of the covariates.


Here are my thoughts on the marginal vs. conditional issue :slight_smile:. These are two reasonable examples of how I think the goal of a randomized trial could be articulated:

  • To evaluate the extent to which treatment A, when given to the population, tends to result in more favorable outcomes on average as compared to treatment B.
  • To evaluate the extent to which a patient subgroup of covariate level(s) X = x would tend to have a more favorable outcome on average when given treatment A, as compared to a patient subgroup of the same covariate level(s), but given treatment B.

If one believes the first to be the goal, she would use a marginal model. If another believes the second to be the goal, he would use a conditional model. If one did not think of the question in advance and/or looked at a Table 1 to find imbalances that are necessarily purely random, they may be inclined to try both, and I think this really highlights my fundamental concern: lack of pre-specification may lead a researcher to make different modeling decisions at each hypothetical study replicate, thereby rendering associated standard errors invalid due to an ill-defined sampling distribution.

But back to the specific issue raised here: In the case that a linear model is used to analyze the trial, the advantage of a conditional model is the increased power due to improved precision. It turns out, as you have noted, that the value of the target parameter doesn’t change even if you condition on X. In the case of non-collapsible link functions, the marginal and conditional parameters are not equal (they usually don’t drastically differ in my experience). However, the power of the conditional model is typically higher when adjusting for prognostic variables (because of increased precision, and because the value of the conditional parameter often, but not always, tends to be further from the null).
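To make the non-collapsibility point concrete, here is a minimal sketch (the coefficients and covariate distribution are invented purely for illustration) in which the conditional odds ratio is identical in every stratum of a balanced binary covariate, yet the marginal odds ratio is attenuated toward the null:

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical conditional logistic model: logit P(Y=1) = b0 + bt*T + bx*X
b0, bt, bx = -1.0, 1.0, 2.0   # conditional OR for treatment is e^1 in both strata
p_x = 0.5                     # binary covariate X, balanced across arms

def marginal_risk(t):
    # average the conditional risks over the covariate distribution
    return (1 - p_x) * expit(b0 + bt * t) + p_x * expit(b0 + bt * t + bx)

def odds(p):
    return p / (1 - p)

conditional_or = math.exp(bt)
marginal_or = odds(marginal_risk(1)) / odds(marginal_risk(0))
print(f"conditional OR = {conditional_or:.3f}, marginal OR = {marginal_or:.3f}")
```

Here the marginal odds ratio (about 2.23) sits between 1 and the conditional odds ratio (e ≈ 2.72), even though there is no confounding at all: the gap is purely a property of the non-collapsible link.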

To me, both the marginal and conditional parameters are defensible targets, and my reason for preferring a conditional model is usually more pragmatic than philosophical (increased power, better use of resources).

You raised an important issue, though: that you may be applying the treatment to a population whose covariates you cannot control. I think this is an argument in favor of having a representative sample, which necessarily means thinking very hard about whether the inclusion/exclusion criteria are too restrictive. It also means, in many cases, post-marketing surveillance and Phase IV trials.


Nicely put, Andrew. For reasons given by Mitch Gail in a quote I put in the ANCOVA chapter of BBR, I think the conditional estimate is what is always needed:


The premise of this argument is very well taken (i.e., that there is only one patient in the room and, at his or her time of treatment, he or she is the only one of interest to the clinician). From this premise (and ignoring other potential challenges for the time being), I derive a slightly different conclusion, though, than Gail. Specifically, this premise strikes me as more of an argument in favor of allowing for effect modification between covariates and treatment than simply an argument for a model conditional on covariates.

Suppose we have evidence that a treatment is effective. Were I asked to explain my decision to administer this treatment to an individual patient from the population, I would use exactly the same response irrespective of whether the estimate of the treatment effect were derived from a marginal or conditional model. I may say, for instance: “I’m deciding to administer this drug or therapy to you because I have sufficient evidence that doing so will, on average, tend to result in more favorable outcomes for this overall patient population.” That is to say that, even with the model conditional on covariates, you’re still stuck with a single estimate of the treatment’s benefit, regardless of what you know about the patient in the room.

That there is one patient at a time on whom to decide, to me, suggests that a model allowing for effect modification gets further to the heart of the matter, such that I can update my justification for treating (or not treating) depending upon the covariate profile of the patient in the room, and truly make patient-specific decisions. With a model that allows for effect modification, a clinician can then modify the above statement as: “I’m deciding to administer this drug or therapy to you because I have sufficient evidence that doing so will, on average, tend to result in more favorable outcomes for the patient subpopulation having the same set of covariates as you.”

Naturally, deciding on potential effect modifiers and weighing evidence in favor of or against effect modification is another challenge (perhaps for another post!). :slight_smile:


It seems that the result of a broad-inclusion clinical trial analyzed in a conditional manner could be reproduced (perhaps less efficiently, but with fewer modeling assumptions) by a series of narrow-inclusion clinical trials, each analyzed in a marginal fashion. In fact, as the last sentence of the quote @f2harrell shared suggests, both marginal and conditional analyses are almost always conditional and marginal with respect to at least some covariates. Isn’t it a spectrum? At the marginal end, aren’t there applications, e.g., in public health policy?


Nice points. I do interpret Mitch Gail’s point as applying to main effects, without the need to appeal to interactions. But if interactions exist, covariate adjustment is not just an option, it is mandatory IMHO.

I keep going back to the basic question: How would this patient fare were she put on treatment B instead of treatment A?


“I do interpret Mitch Gail’s point as applying to main effects, without the need to appeal to interactions”

Frank, it reads like you’re saying that estimating an average treatment effect adjusting for covariates allows you to make more personalized treatment decisions than an unadjusted average treatment effect estimate. (I suspect you didn’t mean to say this, but it’s the only way I can interpret what you said.) Whether or not you adjust for covariates, if you’re just estimating an average effect you’ll make the same decision for every patient. I second SpiekerStats’ point that to make “personalized” treatment decisions you need to estimate conditional effects, which require interaction terms.

On a side note, I dislike the term “personalized treatment decisions” when really the decisions are just based on some vector of covariates x, not the full person. The decision for a given person could be different depending on what covariates were measured in the trial. “Covariatized treatment decisions” might be a better term even though it includes a made up word.

I like covariatized–I tend to use the phrase “subgroup-specific” myself :slight_smile:.

The presence of important interaction terms makes the case more compelling, but what I described also works when there are no interactions. That’s because of risk magnification. High risk patients get more absolute benefit, and “higher risk” is estimated using the covariates.
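A quick numerical sketch of risk magnification (the odds ratio of 0.5 and the baseline risks below are purely hypothetical): applying one and the same conditional odds ratio to patients at different covariate-based baseline risks yields very different absolute benefits.

```python
import math

expit = lambda z: 1.0 / (1.0 + math.exp(-z))
logit = lambda p: math.log(p / (1 - p))

log_or = math.log(0.5)  # one hypothetical conditional OR: treatment halves the odds

benefits = {}
for baseline in (0.05, 0.20, 0.50):
    treated = expit(logit(baseline) + log_or)   # risk under treatment
    benefits[baseline] = baseline - treated     # absolute risk reduction
    print(f"baseline risk {baseline:.2f} -> absolute benefit {benefits[baseline]:.3f}")
```

The relative effect is constant, yet the absolute benefit grows with the covariate-based baseline risk, which is exactly the point: covariates matter for individual decisions even without interactions.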

Personalization is relative, so I take ‘personalized treatment decisions’ to mean decisions that are competently informed by the individual characteristics of the patient at hand. By contrast, depersonalized medicine would be using an overall average effect or using group estimates (age > 65) instead of this patient’s age of 66. To your point though, covariate-specific estimates and covariate-specific decisions would be a good way to describe all this.

I would avoid ‘subgroup’ because that means stratification, and you can’t meaningfully stratify on continuous variables.


I suggest further comments on conditional vs. average treatment effects be moved to a new topic.

The non-collapsibility of non-linear model parameters does not apply to predictions. See Lee and Nelder (2004) and the discussion of that paper.
As regards Table 1, I like the idea of replacing it with some helpful graphics illustrating the covariate distribution of the trial as a whole. Here is my take on baseline balance:

  1. Valid inference is a matter of valid conditioning. If you ignore something that is prognostic, whether or not it is imbalanced, and bet using the marginal probabilities rather than the conditional ones, you choose a losing strategy against a more experienced gambler. This should be obvious to any Bayesian and is the idea behind my two-dice game to illustrate randomisation

  2. Baseline balance can be validly examined as a strategy for auditing a trial. In that case, it seems pretty obvious that we would want smaller P-values to call foul than we currently use. This is perhaps one occasion where Bayesians would be justified in putting a lump of probability on the null hypothesis being literally true. It also seems obvious that the analysis should condition on exactly the randomisation scheme the protocol claims was used. See Fishing for Fakes with Fisher

  3. Whether a measured covariate should be in the model or not depends on whether it is believed to be predictive. Failure to understand this has elsewhere led to the inappropriate espousal of the propensity score.

  4. Balance is valuable as a contribution to efficiency. It has nothing to do with validity. So many amateur commentators on the RCT get this spectacularly wrong.

  5. Randomisation guarantees marginal probability statements that are valid. This is not an excuse for not conditioning. However, marginal probabilities are calibrating, and that is useful; randomisation also entitles us to ignore covariates we have not measured.

  6. The whole point about statistical probability statements is just that: they are probability statements. Thus a point estimate offered as a certain statement of the true value is false, but offered as an acknowledged, potentially fallible statement of a possible truth with a correct probability attached, it is true. A gambler who says that a fair die has a 1/6 chance of showing a 6 if rolled is telling the truth, even if the die is rolled and shows a 5. Again, this is a blindingly obvious fact that seems to get forgotten by all the amateurs. We statisticians need to rub their noses in it.


Sorry. Seemed to have linked to the wrong piece in Applied Clinical Trials. The correct link is Baseline Balance and Valid Statistical Inference (I hope!)


There is nothing that prevents one from doing both:

  1. Analyze the data conditionally (because in many cases this is the right thing to do to avoid “attenuation bias”)

  2. Once the conditional estimates have been derived, marginalize over the distribution of covariates e.g. by simulation to get a population effect

The former approach allows one to make better use of scarce resources, i.e., the data from the clinical trial. Conditional inference can support individual decision making, e.g., when there is a significant treatment-by-covariate interaction or when an absolute risk estimate is required. The second approach could support policy making and does not have to be done in the context of the original publication.
In fact, one way to drastically change trial reporting so that one can have their cake and eat it too is to provide the entire covariance matrix of the fitted model and allow people to have Monte Carlo fun with it.
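A minimal sketch of step 2, with invented coefficients and covariate values standing in for a fitted conditional model and a real sample:

```python
import math

expit = lambda z: 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted conditional model: logit P(Y=1) = b0 + bt*T + bx*age
b0, bt, bx = -2.0, 0.8, 0.05

# Covariate values of the sample over which we marginalize (here, invented
# ages from the trial itself; a policy question would use the target cohort)
ages = [45, 52, 60, 67, 71, 58, 49, 63]

risk = lambda t, x: expit(b0 + bt * t + bx * x)

# Counterfactual prediction for every individual under each arm, then average
avg_treated = sum(risk(1, x) for x in ages) / len(ages)
avg_control = sum(risk(0, x) for x in ages) / len(ages)
print(f"sample-averaged risk difference: {avg_treated - avg_control:.3f}")
```

The individual-level predictions come from the conditional model; the averaging step at the end determines whose covariate distribution the resulting summary describes.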

I like model-based approaches because they are explicit and statistically optimal. But the language that is commonly used for what you recommend is subverting what is going on. The result of your procedure is not at all a population effect; it is a sample effect. To get population estimates you’d have to plug the parameter estimates into the covariate distribution of the population.

This raises the question of why we would seek population averages in the first place. We’re supposed to be interested in precision medicine nowadays. Being precise here means plugging in the parameters to get the estimate of the absolute risk improvement for an individual patient. This strategy is detailed here and here.

Populations are always finite in the real world. If I am running the drug program for an insurance formulary with 100k clients, I’d want to know what will happen if I accept drug X vs. Y for my formulary.

Since “finite population” simulations are critically important to decision makers, it makes sense to get individual predictions (from an adjusted model) and average them over the characteristics of the cohort one is interested in.

Yes I think it’s good to distinguish the uses to which estimates will be put, i.e., to be specific to the decision that is required to be made. A decision to put a drug on a formulary seems to call for a group-level decision, but that assumes the formulary has no clauses for how prescriptions are to be made (do they ever?).

But can’t group-level average absolute treatment benefits still be problematic when a Simpson’s paradox-like effect exists? Can’t group averages lull decision makers into favoring a drug when it is not very good for males or for females but on average is OK?


Simpson’s paradox is a concern in the formulary example only if one does a poor job modeling the conditional effects. This is how it should play out if one wanted to guard against the Simpson paradox:

  1. Get a dataset, e.g., from a trial
  2. Fit a covariate-adjusted model to get conditional effects. This model should be as good as possible, i.e., include all potentially important covariates. One would do better to fit this under a Bayesian paradigm to “shrink away” spurious covariate-by-treatment interactions.
  3. Get a new dataset (e.g., the “insurance plan”) and derive individual counterfactual predictions (what-if scenarios for each individual). To account for inferential uncertainty, one typically would have to sample from the posterior distribution of the analysis in step 2
  4. In economic evaluations, the marginal effect (averaged over all individuals present in the dataset of step 3) is typically of interest, and the calculations, although numerically intense, are straightforward

One can supply the original dataset of step 1 in step 3 if one wanted to explore differences between marginal and conditional effects. We used this strategy to explore certain discrepancies between marginal and conditional effects in a device trial sponsored by the NIH a few years ago. Simon Wood’s mgcv and Dimitris Rizopoulos’ jm packages provided the computational backbone for these analyses.
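Steps 2 through 4 can be sketched as follows; the independent normal “posterior draws” below are a stand-in for a real joint posterior (or draws based on a reported covariance matrix), and all numbers are invented:

```python
import math
import random

expit = lambda z: 1.0 / (1.0 + math.exp(-z))
random.seed(0)

# Stand-in for step 2's output: posterior draws for (b0, bt, bx) of a fitted
# logistic model logit P(Y=1) = b0 + bt*T + bx*age. Independent normals are a
# simplification; a real analysis would draw from the joint posterior.
posterior_draws = [(random.gauss(-2.0, 0.1),
                    random.gauss(0.8, 0.1),
                    random.gauss(0.05, 0.01)) for _ in range(2000)]

# Step 3: the new cohort of interest (e.g. an insurance plan's members)
cohort_ages = [44, 55, 61, 70, 66, 51]

def marginal_rd(b0, bt, bx):
    # counterfactual risk difference per individual, averaged over the cohort
    rds = [expit(b0 + bt + bx * x) - expit(b0 + bx * x) for x in cohort_ages]
    return sum(rds) / len(rds)

# Step 4: marginalize within each posterior draw, then summarize uncertainty
draws = sorted(marginal_rd(*d) for d in posterior_draws)
mean_rd = sum(draws) / len(draws)
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"marginal risk difference: {mean_rd:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

Note that the marginalization happens inside each posterior draw, so the reported interval reflects inferential uncertainty about the cohort-level summary rather than uncertainty about any single patient.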


Thanks. Let’s close this particular discussion now since it is not related to the original question.
