Mathematical coupling in epidemiology

I am trying to estimate the effect of exposure to a certain substance, measured in urine, on a certain endogenous molecule, also measured in urine. The issue with substances measured in urine is that their levels can be affected by the hydration status of the subject [1]. For this reason, we usually adjust for creatinine by dividing each variable by the subject's creatinine level:

\text{var}_{new} = \text{var}_{old} / \text{creatinine}.

The problem now is that if both the dependent and independent variables are to be adjusted for creatinine values, we will create what is known as mathematical coupling and the effect estimates will be biased. There are more sophisticated methods to adjust for hydration status, but this is still the most common. In my case I am using the following approach:

  1. Fit a regression model of the type
\text{creatinine} \sim \text{age} + \text{sex} + ...
  2. Predict the levels of creatinine using the model above (C_{\text{pred}})
  3. Compute the ratio
C_{\text{ratio}} = \frac{\text{creatinine}}{C_\text{pred}}
  4. Divide the levels of the exposure of interest by C_{\text{ratio}}. This variable will then be used in the outcome model, while including creatinine levels as covariates.
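
For concreteness, the four steps above can be sketched in Python as follows. This is only an illustration on simulated data: the function name `covariate_adjusted_standardize`, the variable names, and all simulated values are hypothetical, and plain least squares stands in for whatever regression model is actually used.

```python
import numpy as np

def covariate_adjusted_standardize(biomarker, creatinine, covariates):
    """Divide a urinary biomarker by the observed/predicted creatinine ratio."""
    # Design matrix with an intercept column
    X = np.column_stack([np.ones(len(creatinine)), covariates])
    # Step 1: least-squares fit of creatinine on the covariates
    beta, *_ = np.linalg.lstsq(X, creatinine, rcond=None)
    # Step 2: predicted creatinine (C_pred)
    c_pred = X @ beta
    # Step 3: C_ratio = creatinine / C_pred
    c_ratio = creatinine / c_pred
    # Step 4: standardized biomarker
    return biomarker / c_ratio, c_ratio

# Simulated data (all values are made up for illustration)
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 70, n)
sex = rng.integers(0, 2, n)
creatinine = np.exp(0.2 - 0.005 * age + 0.3 * sex + rng.normal(0, 0.3, n))
exposure_raw = np.exp(rng.normal(0, 0.5, n)) * creatinine

exposure_std, c_ratio = covariate_adjusted_standardize(
    exposure_raw, creatinine, np.column_stack([age, sex])
)
```

The same function would be called a second time with the outcome in place of `exposure_raw`, which is exactly where the shared `c_ratio` enters both sides of the outcome model.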

Again, I would need to do this for both the exposure and the outcome (i.e., the endogenous molecule). I am thus afraid that the estimates I will get will be biased too. Any suggestions?

[1] Put simply, hydration status affects both the exposure and the outcome, thus it is a confounder. Previous studies have shown that simply adjusting for creatinine levels is not sufficient.


This may work, but I would always first try a more direct adjustment, e.g., using a spline of creatinine. Why did previous studies find covariate adjustment not sufficient? I could see a time-circularity problem here, but that would also be present with the indirect adjustment you propose, wouldn’t it?
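
To show what a direct spline adjustment could look like mechanically, here is a minimal sketch using only NumPy. Everything in it is an assumption made for illustration: the data are simulated on the log scale with a true exposure effect of 0.5, the helper `cubic_spline_basis` is a hypothetical truncated-power basis (not a restricted cubic spline from any particular package), and the knots are simply placed at the quartiles of creatinine. It does not address the concerns raised in the literature about covariate adjustment; it only illustrates the model form.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated-power cubic spline basis: x, x^2, x^3, and (x - k)^3_+ per knot."""
    cols = [x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Simulated data (hypothetical): creatinine affects both exposure and outcome
rng = np.random.default_rng(1)
n = 2000
log_creat = rng.normal(0, 0.4, n)
log_exposure = 0.8 * log_creat + rng.normal(0, 0.5, n)
log_outcome = (0.5 * log_exposure          # true exposure effect = 0.5
               + np.sin(2 * log_creat)     # nonlinear creatinine effect
               + rng.normal(0, 0.5, n))

# Outcome model: outcome ~ exposure + spline(creatinine)
knots = np.quantile(log_creat, [0.25, 0.5, 0.75])
X = np.column_stack([np.ones(n), log_exposure,
                     cubic_spline_basis(log_creat, knots)])
beta, *_ = np.linalg.lstsq(X, log_outcome, rcond=None)
effect_estimate = beta[1]  # coefficient on the exposure
print(f"estimated exposure effect: {effect_estimate:.3f}")
```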

One paper describing this issue is this:

Although commonly employed, the standardization and covariate adjustment approaches may be problematic in many scenarios. Creatinine and serum lipid levels can be affected by individual factors, and inclusion of these factors in statistical models can induce biased associations between standardized biomarkers and health [3•]. These relationships are further complicated by the fact that urinary or serum biomarker concentrations are usually proxies for concentrations in the more relevant target tissue. Given these considerations, it becomes clear that the choice of correction method should be based on the specific causal relationships under study.

Regarding the circularity problem, I am not sure what you are referring to.

I think that normalizing in the fashion you propose may have the same problem the paper discussed. Not sure.

Perhaps I’m missing something big here… but… if you were talking about two different exposures and their relationship to some outcome, I see how the issues you are concerned about arise. Let’s suppose you were simply dividing by creatinine; you’d have something like this:

y \sim \frac{exp1}{creat.} + \frac{exp2}{creat.} + ...

But, in your problem you talk about an exposure and an outcome adjusted for hydration. So your situation is something more like:

\frac{y}{creat.} \sim \frac{exp1}{creat.} + ...

and since you have the same denominator on both sides (or whatever adjustment method you prefer), can’t you just ignore hydration¹? I.e.:

y \sim exp1 + ...

¹ …assuming that hydration itself doesn’t affect the relationship between the exposure and outcome; if it does, you probably already need a more sophisticated approach.
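
A small simulation can help check reasoning like this against the original concern about mathematical coupling. The sketch below uses invented, mutually independent lognormal variables (the names `y`, `exp1`, and `creat` mirror the formulas above, and the distribution parameters are arbitrary assumptions) and compares the correlation of the raw variables with the correlation of the two ratios that share a denominator.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Three mutually independent variables (arbitrary lognormal parameters)
y = rng.lognormal(mean=0.0, sigma=0.3, size=n)
exp1 = rng.lognormal(mean=0.0, sigma=0.3, size=n)
creat = rng.lognormal(mean=0.0, sigma=0.3, size=n)

# Correlation of the raw variables: close to zero by construction
r_raw = np.corrcoef(y, exp1)[0, 1]

# Correlation after dividing both by the same denominator
r_ratio = np.corrcoef(y / creat, exp1 / creat)[0, 1]

print(f"raw correlation:   {r_raw:.3f}")
print(f"ratio correlation: {r_ratio:.3f}")
```

In this setup the raw correlation is near zero while the ratio correlation is clearly positive, purely because of the shared denominator. Whether the raw model or the ratio model is closer to the truth in real data depends on how hydration actually enters the measurements, which this toy example does not model.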