Hello everyone,
I am currently analyzing a dataset where measurements of the same entity (coronary blood flow) are taken in 2 states (with negligible temporal separation between the 2 measurements). The first is the “resting” state and the second is the “stress” state (“stress” here refers to a state of augmented coronary blood flow induced by the injection of a pharmaceutical substance). Typically, these 2 measures are “compressed” by taking their ratio (coronary flow reserve; CFR).
The primary rationale justifying this is that the ratio roughly represents the “magnitude of coronary blood flow augmentation”, and that it is this latter conceptual entity that “really” matters when it comes to prognosis. This implies that the underlying rest/stress values of coronary flow don’t matter much once their ratio is known (which is the subject of some controversy).
After a bit of background reading (specifically: https://www.fharrell.com/post/errmed/, https://hbiostat.org/bbr/change, https://www.thespinejournalonline.com/article/S1529-9430(17)30233-4/abstract, https://stats.stackexchange.com/questions/51564/is-it-valid-to-use-a-difference-score-as-an-independent-variable-in-a-regression), my takeaway is that the best analytic approach in such scenarios is to model the outcome as a smooth function of both covariates (e.g., using tensor product splines, or restricted cubic splines for the main effects plus a restricted smooth interaction term).
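To make the notation concrete, here is how I understand the "both covariates" model with restricted cubic splines and a restricted interaction. The symbols are my own and not taken from the references: x_1 and x_2 are the rest and stress measurements, g is the link function, and s_{1j}, s_{2j} are the nonlinear spline basis functions (3 of them per variable with 5 knots).

```latex
\begin{align*}
g\{E[Y \mid x_1, x_2]\} &= \beta_0 + f_1(x_1) + f_2(x_2) + f_{12}(x_1, x_2), \\
f_k(x_k) &= \alpha_{k0}\, x_k + \sum_{j=1}^{3} \alpha_{kj}\, s_{kj}(x_k), \qquad k = 1, 2, \\
f_{12}(x_1, x_2) &= \gamma_0\, x_1 x_2
  + x_1 \sum_{j=1}^{3} \gamma_j\, s_{2j}(x_2)
  + x_2 \sum_{j=1}^{3} \delta_j\, s_{1j}(x_1).
\end{align*}
```

As far as I understand, "restricted" means that products of two nonlinear terms are left out of the interaction, which is what %ia% does in rms.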
My questions are:
Prof. Harrell states here (https://www.fharrell.com/post/errmed/) that, if both covariates are shown to be important, an additional goal would be to show how best to combine them.
A. I’m not sure how I can identify the best way of combining the two measures (demonstrating whether, for example, the ratio or difference is important). Would one simply compare something like:
rcs(first_measurement, 5) + rcs(difference, 5)
versus
rcs(first_measurement, 5) + rcs(ratio, 5)
in terms of their fit (e.g., using AIC)?
B. Conversely, I’m not sure how to tell that a simplifying construct (ratio/difference) is insufficient to capture the relationship. Would one simply compare:
rcs(first_measurement, 5) + rcs(second_measurement, 5) + rcs(first_measurement, 5) %ia% rcs(second_measurement, 5)
versus
rcs(first_measurement, 5) + rcs(ratio, 5)
(or using rcs(difference, 5) instead of ratio)
and see if the latter results in an unacceptable loss of model fit?
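A simplified version of the check I have in mind, using linear terms only (so a sketch of the idea, not the spline models above), with x_1 and x_2 as my own symbols for the rest and stress measurements:

```latex
\begin{align*}
\beta_1 \log x_1 + \beta_2 \log x_2
  &= (\beta_1 + \beta_2)\,\log x_1 + \beta_2 \,\log\!\left(\frac{x_2}{x_1}\right), \\
\beta_1 x_1 + \beta_2 x_2
  &= (\beta_1 + \beta_2)\, x_1 + \beta_2\,(x_2 - x_1).
\end{align*}
```

In this linear case the log ratio (or the difference) would be sufficient on its own only if \beta_1 + \beta_2 = 0, that is, if the first measurement adds nothing once the ratio (or difference) is in the model. I am unsure how this reasoning carries over once the terms are nonlinear and interact.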
C. Is there a meaningful difference between using the second measurement and using the ratio/difference when smooth terms with an interaction are used? In other words, is there a meaningful difference between
rcs(first_measurement, 5) + rcs(second_measurement, 5) + rcs(first_measurement, 5) %ia% rcs(second_measurement, 5)
versus
rcs(first_measurement, 5) + rcs(ratio, 5) + rcs(first_measurement, 5) %ia% rcs(ratio, 5)
(or using rcs(difference, 5) instead of ratio)?
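Part of why I ask: for x_1 > 0, the pair (first measurement, ratio) and the pair (first measurement, difference) are each a one-to-one transformation of (first measurement, second measurement). With my own symbols r = x_2 / x_1 and d = x_2 - x_1, any fully flexible surface f could be rewritten as:

```latex
\begin{align*}
f(x_1, x_2) &= f(x_1,\, x_1 r) =: h(x_1, r), \\
f(x_1, x_2) &= f(x_1,\, x_1 + d) =: \tilde{h}(x_1, d).
\end{align*}
```

If that is right, the specifications carry the same information, and any difference in fit would come only from the restrictions imposed (a limited number of knots, additive main effects, a restricted interaction), since a surface that is simple in (x_1, r) need not be simple in (x_1, x_2).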