We know that for a continuous outcome variable Y, one wants the variation of Y across patients to be as small as possible in, say, a two-treatment parallel-group randomized trial. A smaller standard deviation of Y means, from a purely statistical standpoint, more power. For binary Y, one wants to maximize the variance, because the variance p(1 - p) is largest at an event incidence of p = 0.5, and we want lots of events in studies. What about baseline covariates X? We know that when estimating slopes, power and precision are optimized when X has the largest possible dispersion. When X is an adjustment variable rather than the main variable (treatment), things are not as clear.
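To make the two textbook facts above concrete, here is a minimal numerical sketch. It assumes a simple linear model with a known residual SD; the function names, the two X ranges, and the value of `sigma` are made up for illustration.

```python
import numpy as np

def bernoulli_variance(p):
    """Variance of a binary outcome with event probability p."""
    return p * (1.0 - p)

def slope_se(x, sigma):
    """Standard error of the OLS slope, assuming a known residual SD sigma."""
    x = np.asarray(x, dtype=float)
    sxx = np.sum((x - x.mean()) ** 2)
    return sigma / np.sqrt(sxx)

# Binary Y: the variance is largest at an incidence of 0.5.
ps = np.linspace(0.05, 0.95, 19)
best_p = ps[np.argmax(bernoulli_variance(ps))]

# Covariate X: same n and same residual SD, different dispersion of X.
narrow = np.linspace(45, 55, 100)   # hypothetical narrow enrollment range
wide = np.linspace(20, 80, 100)     # hypothetical wide enrollment range
se_narrow = slope_se(narrow, sigma=10.0)
se_wide = slope_se(wide, sigma=10.0)

print(f"incidence maximizing variance: {best_p:.2f}")
print(f"slope SE, narrow X: {se_narrow:.4f}; wide X: {se_wide:.4f}")
```

Because the slope SE is inversely proportional to the spread of X, widening the range of X by a factor of six shrinks the SE by the same factor in this setup.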
The FDA has a guidance document on enrichment strategies here. It includes the following statements:
Sponsors of investigational drug products use a variety of strategies to enrich the study population by selecting a subset of patients in which the potential effect of a drug can more readily be demonstrated. Three broad categories of enrichment strategies as listed below are addressed in this guidance: (1) Strategies to decrease variability — These include choosing patients with baseline measurements of a disease or a biomarker characterizing the disease in a narrow range (decreased interpatient variability) and excluding patients whose disease or symptoms improve spontaneously or whose measurements are highly variable (decreased intrapatient variability). The decreased variability provided by these strategies would increase study power (see section III., Decreasing Variability). …
Is this guidance accurate in general? In the no-interaction case, the only way it makes sense to me is in the strange situation where you have strong predictors X with high variance but do not want to adjust for them in ANCOVA. If X has large dispersion and you adjust for X as a covariate, the residual variance in Y, which is the variance that matters, will be as small as the variance of Y in a “pure” sample in which all X values equal some constant.
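The argument can be checked with a small simulation. This is only a sketch under assumptions of my choosing: Y is linear in X with no X by treatment interaction, errors are homoscedastic and normal, and the effect sizes, SDs, and sample size are arbitrary illustrative values.

```python
import numpy as np

def ols(design, y):
    """Return OLS coefficients, their standard errors, and the residual SD."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = len(y) - design.shape[1]
    s2 = resid @ resid / df
    cov = s2 * np.linalg.inv(design.T @ design)
    return beta, np.sqrt(np.diag(cov)), np.sqrt(s2)

def simulate(x_sd, n=2000, seed=1):
    """Two-arm trial with a strong covariate X whose SD is x_sd."""
    rng = np.random.default_rng(seed)
    treat = np.repeat([0.0, 1.0], n // 2)
    x = rng.normal(50.0, x_sd, n)
    # Assumed truth: slope 1 for X, treatment effect 2, residual SD 5.
    y = 1.0 * x + 2.0 * treat + rng.normal(0.0, 5.0, n)
    ones = np.ones(n)
    _, se_u, sd_u = ols(np.column_stack([ones, treat]), y)
    _, se_a, sd_a = ols(np.column_stack([ones, treat, x]), y)
    return {"se_unadj": se_u[1], "se_adj": se_a[1],
            "sd_unadj": sd_u, "sd_adj": sd_a}

wide = simulate(x_sd=15.0, seed=1)    # heterogeneous enrollment
narrow = simulate(x_sd=1.0, seed=2)   # "enriched" narrow-range enrollment

for name, res in [("wide X", wide), ("narrow X", narrow)]:
    print(name, {k: round(float(v), 3) for k, v in res.items()})
```

In this setup the unadjusted residual SD is much larger with wide X, but after covariate adjustment the residual SD and the standard error of the treatment effect are essentially the same in the wide and narrow samples, which is the point made above.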
The guidance seems to pertain only to the case where an X × treatment interaction is known in advance to exist (such known interactions are fairly rare in practice). In that case you might enrich trial enrollment to an interval of X where the treatment effect is larger. (Even in this case one might learn more from probability sampling, in which patients in the region of X space where the treatment effect is strongly suspected to be larger are enrolled with probability 1, and the probability of selection decreases as X moves farther from that region.)
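One way the probability-sampling idea could look is sketched below. Everything specific here is hypothetical: the target interval, the exponential decay, the `scale` parameter, and the function name `enrollment_probability` are illustrative choices, not anything from the guidance.

```python
import numpy as np

def enrollment_probability(x, lo, hi, scale):
    """Selection probability: 1 inside [lo, hi], decaying with distance outside.

    The exponential decay and `scale` are hypothetical choices for illustration.
    """
    x = np.asarray(x, dtype=float)
    dist = np.maximum(0.0, np.maximum(lo - x, x - hi))
    return np.exp(-dist / scale)

rng = np.random.default_rng(0)
x = rng.uniform(20, 80, 5000)          # screened patients' covariate values
p = enrollment_probability(x, lo=60, hi=70, scale=10)
enrolled = rng.random(x.size) < p      # enroll each patient with probability p

inside = (x >= 60) & (x <= 70)
print("enrolled inside target region:", enrolled[inside].mean())
print("enrolled outside target region:", enrolled[~inside].mean())
```

Compared with a hard cutoff, this keeps some patients from outside the suspected high-effect region, so the trial can still estimate how the treatment effect varies with X.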