How much patient variability is best for a clinical trial?

For a continuous outcome variable Y in, say, a two-treatment parallel-group randomized trial, we know that one wants the variation in Y across patients to be as small as possible: a smaller standard deviation of Y means more power from a purely statistical standpoint. For binary Y, one instead wants to maximize the variance, because the variance is greatest at an event incidence of 0.5 and we want lots of events in studies. What about baseline covariates X? We know that for estimating slopes, power and precision are optimized when X has the largest possible dispersion. When X is an adjustment variable rather than the main variable (treatment), things are not as clear.
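A minimal sketch of the first two claims, assuming a two-sided two-sample z-test with known, equal SDs in both arms (a simplification chosen for illustration; the function names and the numbers in the usage lines are hypothetical):

```python
from statistics import NormalDist


def power_two_sample(delta, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with known, equal SDs."""
    z = NormalDist()
    se = sd * (2.0 / n_per_arm) ** 0.5       # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se                  # true difference in standard-error units
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def bernoulli_variance(p):
    """Variance of a binary outcome with event probability p."""
    return p * (1 - p)


# Same true difference and sample size, two different SDs of Y (illustrative values).
for sd in (10.0, 15.0):
    print(f"SD = {sd}: power = {power_two_sample(5.0, sd, 50):.3f}")

# The variance of binary Y peaks at an incidence of 0.5.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"incidence = {p}: variance = {bernoulli_variance(p):.2f}")
```

With the difference in means and the sample size held fixed, the smaller SD gives the higher power, while the variance of a binary outcome is largest at an incidence of 0.5.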

FDA has a guidance on enrichment strategies, which includes the following statements:

Sponsors of investigational drug products use a variety of strategies to enrich the study population by selecting a subset of patients in which the potential effect of a drug can more readily be demonstrated. Three broad categories of enrichment strategies as listed below are addressed in this guidance: (1) Strategies to decrease variability — These include choosing patients with baseline measurements of a disease or a biomarker characterizing the disease in a narrow range (decreased interpatient variability) and excluding patients whose disease or symptoms improve spontaneously or whose measurements are highly variable (decreased intrapatient variability). The decreased variability provided by these strategies would increase study power (see section III., Decreasing Variability). …

Is this guidance accurate in general? In the no-interaction case, the only way it makes sense to me is the strange situation in which you have strong predictors X with high variance but do not want to adjust for them in ANCOVA. If X has large dispersion and you adjust for X as a covariate (with its effect modeled adequately), the residual variance in Y, which is the variance that matters, will be as small as the variance of Y in a “pure” sample in which all X values equal some constant.
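This point can be checked with a small simulation. The sketch below assumes a linear model with no X × treatment interaction, a normally distributed X, and made-up coefficient values; `simulate_trial` and all of its parameters are hypothetical names chosen for illustration.

```python
import numpy as np


def simulate_trial(n, x_sd, adjust, rng, beta_x=2.0, effect=1.0, sigma=1.0):
    """Simulate one trial; return (residual variance, SE of the treatment effect)."""
    treat = rng.permutation(np.arange(n) % 2)      # 1:1 randomized allocation
    x = rng.normal(0.0, x_sd, n)                   # baseline covariate
    y = effect * treat + beta_x * x + rng.normal(0.0, sigma, n)
    cols = [np.ones(n), treat]
    if adjust:
        cols.append(x)                             # ANCOVA: adjust for X
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    resid_var = resid @ resid / (n - design.shape[1])
    se_treat = np.sqrt(resid_var * np.linalg.inv(design.T @ design)[1, 1])
    return resid_var, se_treat


rng = np.random.default_rng(2024)
n = 2000
wide_adjusted = simulate_trial(n, x_sd=3.0, adjust=True, rng=rng)
wide_unadjusted = simulate_trial(n, x_sd=3.0, adjust=False, rng=rng)
narrow_unadjusted = simulate_trial(n, x_sd=0.0, adjust=False, rng=rng)  # all X equal

for label, (resid_var, se) in [
    ("wide X, adjusted", wide_adjusted),
    ("wide X, unadjusted", wide_unadjusted),
    ("constant X, unadjusted", narrow_unadjusted),
]:
    print(f"{label}: residual variance = {resid_var:.2f}, SE(treatment) = {se:.4f}")
```

Under these assumed values the unadjusted wide-X analysis has a residual variance near 1 + 2² × 3² = 37, while the adjusted wide-X analysis and the constant-X analysis should both land near the error variance of 1, with similar standard errors for the treatment effect.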

The guidance seems to pertain only to the case where there is an X × treatment interaction that is known to exist in advance (these are fairly rare in practice). In that case you might enrich trial enrollment to an interval of X where the treatment effect is larger. (Even in this case one might learn more from doing probability sampling where patients in the X space where treatment effect is strongly suspected to be larger are enrolled with probability 1, and the probability of selection decreases as X moves farther from that region.)
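One way such a probability-sampling scheme might look is sketched below. The exponential decay, the interval bounds, and the `scale` parameter are all hypothetical choices made for illustration; nothing above specifies how the selection probability should fall off.

```python
import math
import random


def enrollment_probability(x, lo, hi, scale):
    """Selection probability: 1 inside [lo, hi], decaying with distance outside it.

    The exponential form and `scale` are illustrative assumptions.
    """
    if lo <= x <= hi:
        return 1.0
    distance = lo - x if x < lo else x - hi
    return math.exp(-distance / scale)


def enroll(candidates, lo, hi, scale, rng):
    """Keep each candidate X value with its selection probability."""
    return [x for x in candidates
            if rng.random() < enrollment_probability(x, lo, hi, scale)]


rng = random.Random(1)
candidates = [rng.gauss(5.0, 3.0) for _ in range(1000)]   # made-up screening pool
enrolled = enroll(candidates, lo=4.0, hi=6.0, scale=1.0, rng=rng)
print(f"{len(enrolled)} of {len(candidates)} candidates enrolled")
```

Unlike a hard inclusion cutoff, this keeps some patients from outside the target region, so the trial retains information about how the treatment effect varies with X.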


Great post. Oncology is one of the fields where we truly have very strong X × treatment interactions in some cases. Examples include the presence of EGFR driver mutations in non-small cell lung cancer and BCR-ABL1 fusions in chronic myeloid leukemia. It is a good idea to enrich for those in trials.

Even in oncology, however, most such relationships are weaker and more complicated. One example is the expression of PD-L1 by immunohistochemistry in tumor cells in trials of immune checkpoint inhibitors. It is debatable whether and how one should enrich for PD-L1 expression.

Question for @f2harrell: would you consider time-to-event outcomes to be similar to continuous variables Y, where smaller variation increases power and precision?


A good question. Only when the fraction of patients suffering an event is very large (say > 0.5) does a time-to-event outcome act like a continuous variable. In the usual situation with lots of censoring, it behaves more like a binary Y. There, power increases with the number of events, and studies are frequently enriched with high-risk patients.
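The dependence of power on the number of events can be illustrated with Schoenfeld's approximation for a two-arm comparison under proportional hazards. That formula is brought in here as an assumption, not something stated above, and the hazard ratio and event probabilities below are made-up values.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.8, alloc=0.5):
    """Approximate number of events needed (Schoenfeld's formula).

    alloc is the fraction of patients randomized to one arm (0.5 = 1:1).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum ** 2 / (alloc * (1 - alloc) * math.log(hazard_ratio) ** 2)


def patients_needed(events, event_probability):
    """Patients to enroll so that the expected number of events reaches `events`."""
    return math.ceil(events / event_probability)


events = required_events(hazard_ratio=0.75, power=0.9)
print(f"events needed: {math.ceil(events)}")

# The same event target requires fewer patients in a higher-risk population.
for event_probability in (0.1, 0.2, 0.4):
    n = patients_needed(events, event_probability)
    print(f"event probability {event_probability}: about {n} patients")
```

In this approximation the sample size enters only through the expected number of events, which is why enrolling higher-risk patients lowers the number of patients needed for the same power.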


If you know that treatment efficacy and safety do not interact with patient characteristics, then it doesn’t make much sense to ensure that participants are diverse. The problem is that this remains only a belief unless you actually test it.

As a physician, I prefer to prescribe medication to children, pregnant women, etc. when the medication was actually tested in those groups. Unfortunately, we frequently don’t have such evidence, and hence we depend on extrapolation, observational data (with regard to pregnancy safety), and some luck.

I thought ethics was the argument for enrichment. Otherwise the public may misconstrue the above justification as an attempt to “rig clinical trials”, especially if they have read, e.g., Marcia Angell, who I think spoke about run-in periods etc.

Great post. Educational as always.

What if X is not perfectly measured?