When do we adjust for baseline characteristics?

Hi all,

If we are fitting a descriptive regression model to assess the association of multiple factors of interest with the outcome, do we need to adjust for baseline characteristics (regardless of significance)?

I understand that we should adjust if there is a potential confounder. But if we are looking at a regression model with multiple covariates, then must the potential confounder we’ve identified confound the relationship between every covariate and the outcome? Or is there no need to adjust for confounders etc. when it comes to descriptive models?

On the other hand, if we want to assess the causal relationship between a covariate (treatment vs control arms) and an outcome, then I would think that we should adjust for all potential confounders, e.g. age, when one group has more older patients and older patients have worse outcomes.



Great question. You highlight two important things to consider before running an analysis.

The first consideration is whether you want to estimate descriptive differences or causal effects. Say you are tasked with allocating drugs to treat a disease between England and Scotland. You decide to compare the prevalence of the disease in England versus Scotland using a regression model. Adjustment for confounding variables would generally be a mistake (unless you later post-stratify), as you are interested in descriptive differences between nations, not the causal effect of living in one nation versus the other for any given individual. For example, imagine you found that the population of Scotland is older than that of England, and that this difference in age fully explains why disease prevalence is higher there. Adjustment for age would lead to a prevalence ratio of 1, implying no difference. This might mistakenly lead one to allocate the same amount of drugs per person to each nation, when Scotland actually needs more. Thus, whether or not to adjust for confounding variables depends on whether you have descriptive or causal goals.
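To make the England versus Scotland example concrete, here is a minimal sketch in Python. All counts are hypothetical and chosen only so that age fully explains the difference in prevalence; they are not real data.

```python
# Hypothetical counts, invented purely for illustration: (population, cases).
# Within each age band the prevalence is identical in both nations
# (5% among the young, 20% among the old); only the age mix differs.
counts = {
    "England": {"young": (800, 40), "old": (200, 40)},
    "Scotland": {"young": (500, 25), "old": (500, 100)},
}


def prevalence(population, cases):
    """Proportion of the population that has the disease."""
    return cases / population


def crude_prevalence(nation):
    """Overall prevalence in a nation, ignoring age."""
    total_population = sum(pop for pop, _ in counts[nation].values())
    total_cases = sum(cases for _, cases in counts[nation].values())
    return total_cases / total_population


# Descriptive question: how much more disease is there per person in Scotland?
crude_ratio = crude_prevalence("Scotland") / crude_prevalence("England")

# Age-specific comparison: what an age-adjusted analysis would report.
stratum_ratios = {
    age: prevalence(*counts["Scotland"][age]) / prevalence(*counts["England"][age])
    for age in ("young", "old")
}

print(f"Crude prevalence ratio: {crude_ratio:.2f}")
for age, ratio in stratum_ratios.items():
    print(f"Prevalence ratio among the {age}: {ratio:.2f}")
```

With these made-up numbers the crude ratio is about 1.56, while the ratio within each age band is 1.00. The crude figure is the one that matters for deciding how many drugs each nation needs per person.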

The second consideration is that, even when you are interested in causation, the set of variables you adjust for will depend on the specific exposure and outcome pair (i.e., the causal relationship of interest). The common practice of chucking all covariates into one big multivariable model (“causal salad”) is problematic because variables may act as confounders for some exposure and outcome pairs, but colliders or mediators for others. This issue is wonderfully explained in Westreich and Greenland’s paper on “The Table 2 Fallacy”.
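A small simulation can illustrate why coefficients from one multivariable model cannot all be read as total causal effects. This is only a sketch under an assumed data-generating process that I made up: Z confounds the effect of X on Y, so X is also a mediator on the path from Z to Y. The variable names and effect sizes (0.8, 1.0, 0.5) are hypothetical.

```python
import random

random.seed(0)
n = 20000

# Assumed causal structure: Z -> X, X -> Y, Z -> Y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 1) for zi in z]
y = [1.0 * xi + 0.5 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]


def centered_cross_product(a, b):
    """Sum of products of deviations from the mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


sxx = centered_cross_product(x, x)
szz = centered_cross_product(z, z)
sxz = centered_cross_product(x, z)
sxy = centered_cross_product(x, y)
szy = centered_cross_product(z, y)

# Least-squares coefficients for Y ~ X + Z, from the 2x2 normal equations.
det = sxx * szz - sxz ** 2
coef_x = (sxy * szz - szy * sxz) / det
coef_z = (szy * sxx - sxy * sxz) / det

# Least-squares slope for Y ~ Z alone.
total_effect_z = szy / szz

print(f"Y ~ X + Z: coefficient on X = {coef_x:.2f}, coefficient on Z = {coef_z:.2f}")
print(f"Y ~ Z:     coefficient on Z = {total_effect_z:.2f}")
```

In the joint model, the coefficient on X estimates the total effect of X (about 1.0), because Z is the confounder that needs adjusting for. The coefficient on Z in that same model is only the direct effect of Z (about 0.5), since X sits on the path from Z to Y. The total effect of Z (0.5 + 0.8 × 1.0 = 1.3) comes from the model without X.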


Is the question in the context of observational studies or RCTs?

For RCTs, randomization removes confounding by design (at least in expectation). Including covariates in the model that are known to be strong predictors of the outcome will still improve the precision of the estimated treatment effect.
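The precision gain can be seen in a quick simulation. This is a rough sketch with made-up parameters (100 patients per arm, a true treatment effect of 1.0, and one strongly prognostic baseline covariate); it compares the unadjusted difference in means with the covariate-adjusted estimate over many simulated trials.

```python
import random
import statistics

random.seed(1)


def simulate_trial(n_per_arm=100, true_effect=1.0):
    """Simulate one two-arm trial; return (unadjusted, adjusted) estimates."""
    n = 2 * n_per_arm
    treat = [1] * n_per_arm + [0] * n_per_arm
    random.shuffle(treat)  # randomized allocation
    x = [random.gauss(0, 1) for _ in range(n)]  # prognostic baseline covariate
    y = [true_effect * t + 2.0 * xi + random.gauss(0, 1) for t, xi in zip(treat, x)]

    def arm(values, a):
        return [v for v, t in zip(values, treat) if t == a]

    x1, x0, y1, y0 = arm(x, 1), arm(x, 0), arm(y, 1), arm(y, 0)
    mx1, mx0 = statistics.fmean(x1), statistics.fmean(x0)
    my1, my0 = statistics.fmean(y1), statistics.fmean(y0)

    # Unadjusted estimate: simple difference in mean outcomes.
    unadjusted = my1 - my0

    # Adjusted estimate: the treatment coefficient from least squares on
    # Y ~ treatment + X, written via the pooled within-arm slope of Y on X.
    sxy = sum((a - mx1) * (b - my1) for a, b in zip(x1, y1))
    sxy += sum((a - mx0) * (b - my0) for a, b in zip(x0, y0))
    sxx = sum((a - mx1) ** 2 for a in x1) + sum((a - mx0) ** 2 for a in x0)
    slope = sxy / sxx
    adjusted = unadjusted - slope * (mx1 - mx0)
    return unadjusted, adjusted


results = [simulate_trial() for _ in range(500)]
unadjusted_estimates = [u for u, _ in results]
adjusted_estimates = [a for _, a in results]

mean_unadjusted = statistics.fmean(unadjusted_estimates)
mean_adjusted = statistics.fmean(adjusted_estimates)
sd_unadjusted = statistics.stdev(unadjusted_estimates)
sd_adjusted = statistics.stdev(adjusted_estimates)

print(f"Unadjusted: mean = {mean_unadjusted:.2f}, SD = {sd_unadjusted:.2f}")
print(f"Adjusted:   mean = {mean_adjusted:.2f}, SD = {sd_adjusted:.2f}")
```

Both estimators centre on the true effect, but the adjusted one varies much less from trial to trial, which is the precision gain. How large the gain is depends on how strongly the covariate predicts the outcome; the value 2.0 here is just an assumption.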

Hi feizuo, this is in the context of observational studies.