Using 2 different measures of the same entity for risk prediction

Hello everyone,

I am currently analyzing a dataset where measurements of the same entity (coronary blood flow) are taken in 2 states (with negligible temporal separation between the 2 measurements). The first is the “resting” state and the second is the “stress” state (“stress” here refers to a state of augmented coronary blood flow induced by the injection of a pharmaceutical substance). Typically, these 2 measures are “compressed” by taking their ratio (coronary flow reserve; CFR).

The primary rationale justifying this is that the ratio roughly represents the “magnitude of coronary blood flow augmentation”, and that it is this latter conceptual entity that “really” matters when it comes to prognosis. This implies that the underlying rest/stress values of coronary flow don’t matter much once their ratio is known (which is the subject of some controversy).

After a bit of background reading (specifically: https://www.fharrell.com/post/errmed/, https://hbiostat.org/bbr/change, https://www.thespinejournalonline.com/article/S1529-9430(17)30233-4/abstract, https://stats.stackexchange.com/questions/51564/is-it-valid-to-use-a-difference-score-as-an-independent-variable-in-a-regression), the takeaway seems to be that the best analytic approach in such scenarios is to model the outcome as a smooth function of both covariates (e.g., using tensor splines, or restricted cubic splines for the main effects plus a restricted smooth interaction term).

My question is:

Prof. Harrell states here (https://www.fharrell.com/post/errmed/) that, if both covariates are shown to be important, an additional goal would be to show how best to combine them.

A. I’m not sure how I can identify the best way of combining the two measures (demonstrating whether, for example, the ratio or difference is important). Would one simply compare something like:
rcs(first_measurement, 5) + rcs(difference, 5)
versus
rcs(first_measurement, 5) + rcs(ratio, 5)

in terms of their fit (e.g., using AIC)?

B. Conversely, I’m not sure how to tell that a simplifying construct (ratio/difference) is insufficient to capture the relationship. Would one simply compare:

rcs(first_measurement, 5) + rcs(second_measurement, 5) + rcs(first_measurement, 5) %ia% rcs(second_measurement, 5)
versus
rcs(first_measurement, 5) + rcs(ratio, 5) (or using rcs(difference, 5) instead of ratio)

and see if the latter results in an unacceptable loss of model fit?

C. Is there a meaningful difference between using the second measurement or the ratio/difference if smooth terms with interaction are used? In other words, is there a meaningful difference between

rcs(first_measurement, 5) + rcs(second_measurement, 5) + rcs(first_measurement, 5) %ia% rcs(second_measurement, 5)
versus
rcs(first_measurement, 5) + rcs(ratio, 5) + rcs(first_measurement, 5) %ia% rcs(ratio, 5) (or using rcs(difference, 5) instead of ratio)


This is the kind of problem that is really great to work on. Often I’ve wished that we could put a restriction on our models such that the two variables have the same shape of rcs transformation, as you can do for monotonic variables in the brms package. When there are no interactions you can do such an analysis by making the dataset twice as tall and using a cluster sandwich covariance estimator to adjust standard errors. But that’s kind of awkward. The idea is to have a common shape but with a simple magnifier (“external \beta”) for the second predictor.

The way that clinical researchers tend to analyze such data makes an initial mistake that is quite common: assuming that change is more important than the stressed state. If you use resting left ventricular ejection fraction (LVEF) and LVEF under maximum exercise to jointly predict time until cardiovascular event, you find that resting LVEF is irrelevant, i.e., that change from rest to exercise is almost solely noise.

Likewise, if in ICU patients you measure serum creatinine (SCr) on day 1 and on day 3 and predict survival from day 3 onwards, day 1 SCr is almost irrelevant, i.e., the change in SCr is a weak prognostic variable.

You might consider the following strategy, which targets expected cross-validated predictive discrimination by computing AIC for several models. Denote the two measurements by A and B.

  1. log(B/A)
  2. log(A) + log(B)
  3. rcs(log(A)) + rcs(log(B))
  4. rcs(log(A)) + rcs(log(B)) + rcs(log(A)) %ia% rcs(log(B))

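Each rcs is meant to use 4 knots.
Model (1) vs. (2) checks the adequacy of the ratio assuming linearity in all the logs: since log(B/A) = log(B) - log(A), model (1) is model (2) with the two coefficients forced to be equal in magnitude and opposite in sign. (3) vs. (2) checks linearity in the logs. (4) vs. (3) checks for interaction.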
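
As a rough illustration of how the four models above could be fitted and compared, here is a minimal sketch using the rms package. The data are simulated, and the binary outcome, the use of lrm(), and the variable names (lA, lB, lratio) are all assumptions made for the example; with time-to-event data one would presumably swap in cph() with a Surv() response.

```r
library(rms)  # lrm(), rcs(), %ia%, and an AIC method for rms fits

set.seed(1)
n  <- 500
# Simulated stand-ins for log resting (A) and log stress (B) flow; not real data
lA <- rnorm(n, 0, 0.3)
lB <- lA + rnorm(n, 0.8, 0.4)
# Hypothetical binary outcome that depends on the stress measurement only
y  <- rbinom(n, 1, plogis(1 - 1.2 * lB))
d  <- data.frame(y = y, lA = lA, lB = lB, lratio = lB - lA)

fits <- list(
  m1 = lrm(y ~ lratio, data = d),                    # (1) log(B/A)
  m2 = lrm(y ~ lA + lB, data = d),                   # (2) log(A) + log(B)
  m3 = lrm(y ~ rcs(lA, 4) + rcs(lB, 4), data = d),   # (3) nonlinear main effects
  m4 = lrm(y ~ rcs(lA, 4) + rcs(lB, 4) +
             rcs(lA, 4) %ia% rcs(lB, 4), data = d)   # (4) plus restricted interaction
)

# Smaller AIC is better; compare (1) vs (2), (3) vs (2), and (4) vs (3)
aic <- sapply(fits, AIC)
print(round(aic, 1))
```

Each pair of models being compared is nested, so a likelihood ratio test (e.g., lrtest() in rms) could be reported alongside the AIC differences.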

I hope you’ll post what you find. My money is on A being fairly irrelevant.
