There has been a lot of discussion in the past few months about how covariates should be taken into account in RCTs and which statistical model should be used. Some people are advocating the use of additive linear models because they like “causal estimands” such as average treatment effects on a probability scale (when the outcome is binary). There needs to be a clear message about why and how covariate adjustment should be undertaken.

As shown in references discussed in the analysis of covariance chapter in BBR, the purpose of covariate adjustment when the outcome is categorical or represents time-to-event is to **prevent a loss of statistical power in the face of outcome heterogeneity**. If there are strong prognostic factors such as age and there was a range of ages of study subjects, there will be outcome heterogeneity due to the subjects’ age distribution. Conditioning on the ages will prevent power loss.

Risk differences and risk ratios are useful concepts for homogeneous samples, but not so much for heterogeneous ones. In the classic two-homogeneous-samples problem, the choice of treatment effect metric matters less, and covariate adjustment is not needed because in a homogeneous sample either the covariates all have a single value or the covariate effects on outcome are all zero.

On the question of model choice, the logistic model has been found on average to provide a better fit to patient outcomes than other models. That is primarily because it places no constraints on any of the regression coefficients. A useful metric for choosing a model is the likelihood ratio $\chi^2$ statistic for the combined effect of all two-way interactions involving treatment. One will typically see that this measure of outcome variability due to interactions is smaller for the no-constraint logistic model than it is for models stated in terms of risk differences or log risk ratios. I have more faith in the model needing the fewest departures from additivity (lowest amount of improvement in log-likelihood by adding all the treatment interactions).
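As a sketch of that statistic, in notation assumed here for illustration rather than taken from the references: let $\ell_{\text{add}}$ be the maximized log-likelihood of the additive model (treatment plus covariates) and $\ell_{\text{int}}$ the maximized log-likelihood after adding all $k$ treatment-by-covariate interaction parameters. Then

$$
\text{LR } \chi^2 = 2\left(\ell_{\text{int}} - \ell_{\text{add}}\right)
$$

which is referred to a $\chi^2$ distribution with $k$ degrees of freedom when there are truly no interactions. Computing this on each candidate scale with the same set of interaction terms gives the comparison described above: the scale with the smaller value needs fewer departures from additivity.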

A well-fitting model that allows for there to possibly be **no** heterogeneity of treatment effect provides the best basis for assessing whether there is such heterogeneity. It is possible for a treatment odds ratio to be constant, whereas it is not possible for an absolute risk reduction or risk ratio to be constant across a wide variety of patient types. Sicker patients have larger absolute risk reduction from effective treatments, and a risk ratio cannot exceed 2 when a patient has a baseline risk of 0.5.
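The arithmetic behind these constraints can be checked with a small sketch. The baseline risks, the odds ratio of 0.5, the absolute risk reduction of 0.10, and the risk ratio of 3 are all hypothetical values chosen for illustration, and the function names are made up for this example.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def risk_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1 + o)


def treated_risk_constant_or(baseline_risk, odds_ratio):
    """Risk under treatment when the treatment odds ratio is constant."""
    return risk_from_odds(odds(baseline_risk) * odds_ratio)


# Hypothetical values, for illustration only
BASELINE_RISKS = [0.05, 0.2, 0.5, 0.8]
ODDS_RATIO = 0.5

rows = []
for p0 in BASELINE_RISKS:
    p1 = treated_risk_constant_or(p0, ODDS_RATIO)
    rows.append({
        "baseline": p0,
        "treated": p1,
        "arr": p0 - p1,       # absolute risk reduction implied by the constant OR
        "rr": p1 / p0,        # risk ratio implied by the constant OR
        "max_rr": 1 / p0,     # largest risk ratio possible at this baseline risk
    })

for r in rows:
    print(f"baseline={r['baseline']:.2f} treated={r['treated']:.3f} "
          f"ARR={r['arr']:.3f} RR={r['rr']:.3f} max RR={r['max_rr']:.2f}")

# A constant ARR or RR can leave the [0, 1] range for some patients:
risk_with_constant_arr = 0.05 - 0.10   # ARR of 0.10 applied to baseline risk 0.05
risk_with_constant_rr = 3.0 * 0.5      # RR of 3 applied to baseline risk 0.5
print(risk_with_constant_arr, risk_with_constant_rr)
```

With a constant odds ratio every treated risk stays strictly between 0 and 1, while the implied ARR and RR vary with baseline risk. The last two lines show a fixed ARR and a fixed RR each producing an impossible probability for some patient type.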

For a simple example showing why conditional treatment effects are more appropriate than unconditional ones, see this. This example is one where conditional (adjusted) treatment effects are much more interpretable and relevant than the marginal treatment effect ignoring the covariate.

Some are advocating the use of population-averaged treatment effects in RCTs. Not only are such measures non-transportable, as demonstrated in the simple example above, but they are not even calculable. To understand why I say this, you need to understand how advocates are mislabeling what are really sample-averaged treatment effects with the term *population-averaged treatment effect*. Sample-averaged effects depend on the study’s inclusion criteria and seem to be used only because they are easy to compute. But to compute the desired population-averaged treatment effect when the RCT did not randomly sample patients from the population, one must know the sampling probabilities in order to do the proper weighted estimation. For example, if patient volunteers tended to be younger and more often white than the population at large, we would need to know the probability of being sampled as a function of age and race. These sampling probabilities would then be used to compute population-averaged effects.
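To make the distinction concrete, here is a minimal sketch of the weighting that would be required. Every number in it is hypothetical, including the sampling probabilities, which are exactly the quantities an RCT does not supply. The strata, function names, and stratum-specific absolute risk reductions are all invented for illustration.

```python
# Hypothetical strata: (label, number enrolled, probability of being sampled
# from the population, stratum-specific absolute risk reduction).
# In a real RCT the sampling probabilities are unknown.
STRATA = [
    ("age < 50", 600, 0.030, 0.02),
    ("age 50-69", 300, 0.010, 0.06),
    ("age >= 70", 100, 0.002, 0.12),
]


def sample_averaged_effect(strata):
    """Average the effect over the patients who happened to be enrolled."""
    n_total = sum(n for _, n, _, _ in strata)
    return sum(n * arr for _, n, _, arr in strata) / n_total


def population_averaged_effect(strata):
    """Weight each enrolled patient by the inverse of the sampling probability."""
    weights_and_effects = [(n / prob, arr) for _, n, prob, arr in strata]
    total_weight = sum(w for w, _ in weights_and_effects)
    return sum(w * arr for w, arr in weights_and_effects) / total_weight


sample_avg = sample_averaged_effect(STRATA)
population_avg = population_averaged_effect(STRATA)
print(f"sample-averaged ARR:     {sample_avg:.3f}")
print(f"population-averaged ARR: {population_avg:.3f}")
```

In this made-up setting, older patients are under-sampled and benefit more, so the sample-averaged effect understates the population-averaged one. Changing the inclusion criteria (the enrolled counts) changes the sample average even though no patient's treatment effect has changed.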

RCTs use convenience samples, which is fine because the role of RCTs is to estimate relative efficacy and safety, and these quantities can be estimated from highly non-random samples as discussed here. Advocates of average treatment effects are thus telling us that

- conditional relative efficacy measures that are patient-specific, transportable, and form the proper basis for examining heterogeneity of treatment effect while being easier to compute and interpret are not good enough
- population-averaged treatment effects should be used instead, and the advocates are not able to tell us how to compute them so they pretend that sample-averaged effects are the target of inference

Another way to understand the choice of effect scale (absolute risk reduction, ARR; risk ratio, RR; odds ratio, OR; hazard ratio, HR, for example) and the role of covariate adjustment is to consider the statistical model that was envisioned when statistical tests and estimators were first derived for the two-sample problem. In all the statistical texts describing these statistical methods, there is an *iid* (independent and identically distributed outcomes) assumption. *Identically distributed* pertains, within a given treatment group, to each patient having the same outcome tendency, e.g. the same probability of a binary outcome. So within a treatment group, patients are assumed to be homogeneous and covariates are either constant or have no effect. In either case these covariates may be ignored. The traditional estimators for the two-sample problem are differences in means and, for binary Y, differences in proportions (ARR) or risk ratios (RR). These estimators assume homogeneity of patients within treatment group. When there is an impactful dispersion of baseline covariates, *iid* no longer holds and quantities such as ARR and RR that ignore covariate adjustment no longer apply. ORs and HRs, on the other hand, exist for multivariable heterogeneous situations, readily allowing adjustment for covariates without placing restrictions on the effects of covariates and treatment. This lack of restriction is due to the unlimited range of logistic and Cox model coefficients, which also explains why such relative treatment effect models do not require interactions to be present just to keep probabilities inside [0,1].

To say this another way, differences in proportions (ARR) are nice summaries of treatment effects in the homogeneous case, but when there is outcome heterogeneity they fail to recognize that higher risk patients usually get higher absolute treatment benefit. Standard covariate adjustment on the unrestricted relative treatment effect basis satisfactorily addresses all these issues, and the relative treatment effect can easily be translated into a patient-specific absolute benefit by simply subtracting two logistic model predictions, one with and one without treatment.
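A minimal sketch of that subtraction follows. It assumes a fitted logistic model with a single covariate (age) and no treatment interaction. The intercept, age coefficient, and treatment odds ratio of 0.6 are hypothetical values, not estimates from any study.

```python
import math

# Hypothetical coefficients of an additive logistic model, for illustration only
INTERCEPT = -5.0
BETA_AGE = 0.06                   # log odds increase per year of age
BETA_TREATMENT = math.log(0.6)    # constant treatment odds ratio of 0.6


def predicted_risk(age, treated):
    """Predicted probability of the outcome; treated is 0 or 1."""
    linear_predictor = INTERCEPT + BETA_AGE * age + BETA_TREATMENT * treated
    return 1 / (1 + math.exp(-linear_predictor))


def absolute_risk_reduction(age):
    """Patient-specific absolute benefit: risk untreated minus risk treated."""
    return predicted_risk(age, 0) - predicted_risk(age, 1)


for age in (40, 60, 80):
    print(f"age {age}: risk untreated={predicted_risk(age, 0):.3f}, "
          f"risk treated={predicted_risk(age, 1):.3f}, "
          f"ARR={absolute_risk_reduction(age):.3f}")
```

The relative effect is the same for every patient, yet over these ages the absolute benefit grows with baseline risk, which is the pattern a single unadjusted ARR cannot represent.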