Are hazard ratios still hazardous in hierarchical models?

Miguel Hernán admonishes against the use of hazard ratios for causal inference because selection bias is built into them:

Yet if I build a hierarchical model that allows for individual-level parameters, under what conditions would this render hazard ratios less hazardous? This seems to be what folks like Fiona Steele suggest:


When proportional hazards holds and the study design supports causal inference, the hazard ratio can be interpreted causally. That is because the log hazard ratio equals the difference in cumulative incidence after proper transformation (log(-log S(t)), where S(t) = 1 - cumulative incidence). Cumulative incidence is causal and respects intention to treat (ITT) because it is an unconditional estimate. When non-proportional hazards are modeled through time-dependent covariates, time-specific hazard ratios can lose their causal (and ITT) interpretation after the first time at which the hazard ratio is allowed to change. But if one integrates the time-varying hazard function to obtain the cumulative hazard function, and uses that to estimate, for example, the cumulative incidence at 5 years, these unconditional estimates can be interpreted causally. So hazard ratios are still proper building blocks for efficient, well-fitting models that can provide causal interpretations. In my view this is more interpretable than restricted mean survival time (RMST). The key is to be aware of which estimands are unconditional and which are time-dependent, the latter involving changing risk sets that lose the ITT interpretation, as described so well by Hernán.
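Both claims above can be checked numerically. The following is a minimal Python sketch, not anyone's published method. It assumes a hypothetical Weibull-type baseline hazard h0(t) = 0.15 * sqrt(t) (so H0(t) = 0.1 * t^1.5), an assumed constant HR of 0.7 for the proportional-hazards case, and an assumed HR that is 0.5 before t = 2 and 1.0 afterwards for the non-proportional case. It shows that under proportional hazards log(-log S1(t)) - log(-log S0(t)) equals log(HR) at every t, and that a time-varying hazard can be integrated (trapezoidal rule) into a cumulative hazard and then into an unconditional 5-year cumulative incidence. All function names are hypothetical.

```python
import math

def cumulative_hazard(hazard, t, steps=20_000):
    """Approximate H(t) = integral of hazard(u) du from 0 to t (trapezoidal rule)."""
    dt = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * dt)
    return total * dt

def survival(hazard, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t))

def cumulative_incidence(hazard, t):
    """F(t) = 1 - S(t): an unconditional, whole-cohort quantity."""
    return 1.0 - survival(hazard, t)

# Hypothetical baseline hazard h0(t) = 0.15 * sqrt(t), so H0(t) = 0.1 * t**1.5.
def h0(t):
    return 0.15 * math.sqrt(t)

# (1) Proportional hazards with an assumed constant HR of 0.7.
HR = 0.7

def h1_ph(t):
    return HR * h0(t)

def cloglog_diff(t):
    """log(-log S1(t)) - log(-log S0(t)); equals log(HR) at every t under PH."""
    return math.log(-math.log(survival(h1_ph, t))) - math.log(-math.log(survival(h0, t)))

# (2) Non-proportional hazards: an assumed HR of 0.5 before t = 2 and 1.0 afterwards.
def h1_nph(t):
    return (0.5 if t < 2.0 else 1.0) * h0(t)

if __name__ == "__main__":
    for t in (1.0, 3.0, 5.0):
        print(f"t={t}: cloglog difference {cloglog_diff(t):.6f}, log HR {math.log(HR):.6f}")
    # The time-specific HR changes at t = 2, but the 5-year risk difference
    # is still an unconditional (whole-cohort) contrast.
    risk_diff_5y = cumulative_incidence(h1_nph, 5.0) - cumulative_incidence(h0, 5.0)
    print(f"5-year cumulative incidence difference: {risk_diff_5y:.4f}")
```

In practice the hazard functions would come from a fitted model rather than being written down by hand; numerical integration is used here only so that the same helper handles the piecewise (non-proportional) case.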


But are there conditions under which including individual-level parameters (i.e., building a multilevel model) solves the selection bias problem by accounting for unmeasured heterogeneity, thus giving even non-proportional hazard ratios a causal interpretation? That seems to be the motivation behind frailty models.

I think the motivation for adding subject-level random effects to the model is to account for inter-subject outcome heterogeneity that is not measured through covariates. I think the selection problem that leads to difficulty in causal interpretation is time selection, and I'm not sure that non-directional random effects account for this. Some kind of serial correlation approach might do it.
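The selection mechanism behind Hernán's critique, and the unmeasured heterogeneity that frailty models try to capture, can be illustrated with the standard gamma-frailty result. This is a minimal sketch, assuming each person's hazard is Z * HR^x * h0(t) with an unobserved frailty Z ~ Gamma(mean 1, variance theta), a hypothetical constant (exponential) baseline hazard lambda, and hypothetical values HR = 0.5, theta = 1, lambda = 0.2. Under these assumptions the marginal survival is (1 + theta * HR^x * H0(t))^(-1/theta), and the population-level hazard ratio is HR * (1 + theta * H0(t)) / (1 + theta * HR * H0(t)). That ratio starts at HR and drifts toward 1 as high-frailty individuals are depleted faster from the higher-hazard arm, even though every individual's HR is fixed. It only illustrates what a frailty term represents; it does not show that fitting such a model resolves the time-selection issue, and it relies on the frailty distribution being correctly specified. Function names are hypothetical.

```python
import random

def marginal_hr(hr, theta, H0):
    """Population-level HR at a time where the baseline cumulative hazard is H0,
    assuming individual hazard Z * hr**x * h0(t) with Z ~ Gamma(mean 1, variance theta)."""
    return hr * (1.0 + theta * H0) / (1.0 + theta * hr * H0)

def marginal_survival(hr_x, theta, H0):
    """S(t | x) = (1 + theta * hr**x * H0(t)) ** (-1 / theta) under gamma frailty."""
    return (1.0 + theta * hr_x * H0) ** (-1.0 / theta)

def simulated_survival(hr_x, theta, lam, t, n=100_000, seed=1):
    """Monte Carlo check with an exponential baseline h0(t) = lam (so H0(t) = lam * t)."""
    rng = random.Random(seed)
    alive = 0
    for _ in range(n):
        # shape 1/theta and scale theta give mean 1 and variance theta
        z = rng.gammavariate(1.0 / theta, theta)
        event_time = rng.expovariate(lam * z * hr_x)
        alive += event_time > t
    return alive / n

if __name__ == "__main__":
    hr, theta, lam = 0.5, 1.0, 0.2  # hypothetical values
    for t in (0.0, 1.0, 5.0, 10.0, 20.0):
        # Individual-level HR is always 0.5; the marginal HR drifts toward 1.
        print(f"t={t:>4}: marginal HR {marginal_hr(hr, theta, lam * t):.3f}")
    t = 5.0
    print("closed-form S1(5):", round(marginal_survival(hr, theta, lam * t), 4))
    print("simulated   S1(5):", round(simulated_survival(hr, theta, lam, t), 4))
```

The closed form is used for the hazard ratio because it isolates the selection effect exactly; the simulation is included only as a sanity check that the marginal survival formula matches individuals generated under the stated frailty model.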
