Use of adjusted response variables vs. adjusting them for covariates

I’ve been told that it is unwise to build a model in which the outcome variable is adjusted (e.g. for age) while the independent variable is not. I imagine this would be even more of a problem when the model itself adjusts for a covariate on which the outcome measure is already scaled. For example, if I am looking for the effects of treatment X on global cognitive function, adjusting for age and education, I suspect that I should measure global cognitive function by a metric that is NOT age-adjusted and NOT education-adjusted. Yet some outcome measures may only be available as scaled or age-adjusted scores. Can anyone guide me on when it might be OK to break the rule, and what caveats to keep in mind when interpreting my data if this rule is broken?


Welcome to datamethods Shawniqua!

I can think of three settings for this problem:

  1. A new measure is being developed and one knows that other variables should be accounted for
  2. An existing unadjusted measure is to be used and one must account for another variable in its interpretation
  3. An outcome variable you want to use is already adjusted for another variable and you don’t have access to the individual subject-level data to unadjust it. Example: you are given BMI, realize that weight should have been the target variable, and height is not in the dataset.

Setting 3 is not fixable and is similar to your problem: you are stuck with using the adjusted variable.

When the pertinent data are available, direct, flexible adjustment through covariate adjustment is usually the way to go. That’s because there are many issues surrounding adjustment, including:

  • The adjustment may create a bias such as the way race and sex enter the eGFR equation
  • Most adjustments are not validated. For example if one does a linear age adjustment and the effect of age on the raw response variable is not linear, the adjustment will be misleading.
  • Adjustments can take away the meaning of the variable. For example, the CDC invites researchers to consider quantiles of BMI and not raw BMI. Moving to a relative (to other subjects) measure from an absolute measure violates how physics and pathophysiology work.
  • Adjustments can nullify real age-related effects. Similar to your cognitive function example, there are age-specific thresholds for deciding whether to worry about a man’s PSA level in periodic prostate cancer screening. The dominant predictor of prostate cancer risk is age. Requiring a higher level of PSA before considering a man to be at higher risk as he gets older has the effect of nullifying the effect of age and pretending that risk is lower. Putting age-related thresholds on PSA is completely inconsistent with decision making.

Instead of assuming that height and weight are to be combined in the way that BMI combines them, a better analysis for obesity research is to have absolute weight as the dependent variable and baseline weight as an adjustment covariate. If one thinks that the change in weight from baseline is height-related, a better analysis would be to add height as another adjustment variable. To account for the possibility that proportionate changes in weight may be more likely than incremental changes, height and both weights could be logged.
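
A minimal sketch of that analysis, on simulated data, might look like the following. Everything here is an assumption made for illustration: the data-generating values, the 5% treatment effect, the variable names, and the small `ols` helper, which solves the normal equations directly so that the example needs only the standard library. In practice one would use a regression package and consider nonlinear terms as well.

```python
import math
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

random.seed(7)
rows, y = [], []
for _ in range(1000):
    height = random.gauss(1.70, 0.10)                          # metres
    base = 25 * height ** 2 * math.exp(random.gauss(0, 0.15))  # baseline kg
    treat = random.randint(0, 1)
    # Hypothetical truth: treatment lowers follow-up weight by about 5%.
    log_follow = (0.2 + 0.95 * math.log(base) + 0.10 * math.log(height)
                  - 0.05 * treat + random.gauss(0, 0.03))
    # Design row: intercept, treatment, log baseline weight, log height.
    rows.append([1.0, treat, math.log(base), math.log(height)])
    y.append(log_follow)

names = ["intercept", "treat", "log_baseline_weight", "log_height"]
coef = dict(zip(names, ols(rows, y)))

# On the log scale the treatment coefficient is a log ratio, so exp() gives
# treated / control follow-up weight with baseline weight and height fixed.
ratio = math.exp(coef["treat"])
print(f"treatment coefficient: {coef['treat']:.4f}, weight ratio: {ratio:.3f}")
```

Note that height and weight enter as separate covariates, so the data decide how they combine, rather than fixing the weight / height² form that BMI imposes.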

An example where adjustment is not linear so that simple adjustments (including ANCOVA adjusting linearly for baseline) are not appropriate is in depression studies where the Hamilton-D depression scale is the outcome. Baseline Ham-D is nonlinearly related to follow-up Ham-D due to the fact that there are patients with severe depression who can achieve a much greater drop in Ham-D than patients with mild-moderate baseline Ham-D. This leads to the idea of a more general analysis that is powerful while assuming less: Use a semiparametric ordinal response model for follow-up Ham-D and adjust for baseline Ham-D using a flexible nonlinear function such as a restricted cubic spline function.
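
To make the "flexible nonlinear function" part concrete, here is a sketch of how a restricted cubic spline basis for baseline Ham-D could be built. It covers only the covariate side; it does not fit the ordinal model. The truncated-power form with linear tails is the standard construction, but the knot locations, the scaling by the squared distance between the outer knots, and the function name `rcs_basis` are illustrative assumptions.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1, ..., s_(k-2)] for k knots. The fitted curve is cubic
    between knots and constrained to be linear beyond the outer knots.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("need at least 3 knots")

    def cube(u):
        return max(u, 0.0) ** 3          # truncated cubic (u)+^3

    d = t[-1] - t[-2]                    # spacing of the last two knots
    scale = (t[-1] - t[0]) ** 2          # keeps terms on the scale of x
    basis = [float(x)]
    for tj in t[:-2]:
        term = (cube(x - tj)
                - cube(x - t[-2]) * (t[-1] - tj) / d
                + cube(x - t[-1]) * (t[-2] - tj) / d)
        basis.append(term / scale)
    return basis

# Hypothetical knot locations for baseline Ham-D.
knots = [8, 15, 20, 28]
design = {h: rcs_basis(h, knots) for h in (5, 12, 18, 25, 35)}
for h, row in design.items():
    print(h, [round(v, 3) for v in row])
```

The columns returned for each patient would replace the single linear baseline term in the model, letting the relationship between baseline and follow-up Ham-D bend where the data call for it.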

I hope that others will add to this important topic.


The question truly does open up a can of worms! I rather suspect that in the quantitative social sciences (incl. medicine) we sometimes imagine that ‘adjustment’ has a status like that of certain transformations in the hard sciences. In fluid mechanics, we have the kinematic viscosity which corrects for fluid density. Likewise in pharmacology, we might divide absolute doses by weight to obtain a biochemically more relevant quantity, concentration. No doubt many of our data transformations in medicine (especially in areas like ICU, where chemistry is ever-present) have similar aims and justifications. (@Drew_Levy recently brought to my attention a new book by @Andrew_Gelman, Hill & @avehtari, with a chapter titled “Only fools work on the raw scale.” I suppose they must have some thoughts on this also.)

Unless you’re appealing to ideas from the hard sciences, however, these sorts of transformations probably have at best the status of seasonal adjustment in economics. Thinking about when and why seasonal adjustment may help/hurt an econ analysis may temper our enthusiasm for adjustment of cognitive outcomes.

Frank, since you brought up Ham-D, I would like to suggest that the summing-up of separate items in this instrument itself amounts to an information-losing projection of a higher-dimensional factor space into 1 dimension. One reason I am a methodologic Bayesian is that Bayesian methods free us from the frequentists’ ‘need’ to do such things.


David, that’s a good goal. But additive summary scores are often a needed form of data/dimensionality reduction and can increase power by not diffusing a treatment effect over multiple parameters. Summary scores also make it possible to assess interactions with treatment. Some sort of parsimony is needed to be able to do that. I guess an alternative is a prior that heavily shrinks the depression items’ importances towards what the summary score thinks they should be, while allowing some departures as n gets large.
