Hello everyone,

I am currently analyzing a dataset in which the same quantity (coronary blood flow) is measured in two states, with negligible time between the two measurements. The first is the “*resting*” state and the second is the “*stress*” state (“*stress*” here refers to a state of augmented coronary blood flow induced by injecting a pharmacological agent). Typically, these two measurements are “compressed” into a single number by taking their ratio, stress flow divided by resting flow (coronary flow reserve; CFR).

The main rationale is that the ratio roughly represents the “magnitude of coronary blood flow augmentation”, and that it is this conceptual quantity that “*really*” matters for prognosis. This implies that the underlying resting and stress flow values don’t matter much once their ratio is known, a claim that is the subject of some controversy.

After some background reading (specifically: https://www.fharrell.com/post/errmed/, https://hbiostat.org/bbr/change, https://www.thespinejournalonline.com/article/S1529-9430(17)30233-4/abstract, https://stats.stackexchange.com/questions/51564/is-it-valid-to-use-a-difference-score-as-an-independent-variable-in-a-regression), my takeaway is that the best analytic approach in such settings is to model the outcome as a smooth function of both covariates (e.g., using tensor splines, or restricted cubic splines for the main effects plus a restricted smooth interaction term).

**My question is:**

Prof. Harrell states here (https://www.fharrell.com/post/errmed/) that, if both covariates are shown to be important, an additional goal would be to show how best to combine them.

**A.** I’m not sure how to identify the best way of combining the two measures (e.g., showing whether the ratio or the difference is what matters). Would one simply compare something like:

`rcs(first_measurement, 5) + rcs(difference, 5)`

versus

`rcs(first_measurement, 5) + rcs(ratio, 5)`

in terms of their fit (e.g., using AIC)?
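A minimal sketch of the comparison in A, under stated assumptions: the data are simulated, and all names (`rest`, `stress`, `ratio`, `difference`, the binary outcome `y`, the data frame `d`) are hypothetical. So that it runs without extra packages, it uses base R `glm()` with natural splines from `splines::ns()` in place of `rms::rcs()`. For a given set of knots, natural cubic splines span the same functions as restricted cubic splines, and `ns(x, df = 4)` uses five knots like `rcs(x, 5)`, although its default knot placement differs. Because the difference model and the ratio model are not nested, an information criterion such as AIC is a reasonable way to compare them; a likelihood ratio test does not apply directly.

```r
# Sketch only: simulated data, hypothetical variable names.
library(splines)  # ships with base R; provides ns()

set.seed(2024)
n <- 2000
rest   <- rlnorm(n, meanlog = 0, sdlog = 0.3)               # hypothetical resting flow
stress <- rest * rlnorm(n, meanlog = log(2.5), sdlog = 0.3)  # hypothetical stress flow
d <- data.frame(rest, stress,
                ratio      = stress / rest,                  # CFR-like ratio
                difference = stress - rest)
# Binary outcome that, by construction, depends only on the ratio
d$y <- rbinom(n, 1, plogis(1 - 1.5 * log(d$ratio)))

# ns(x, df = 4) gives 4 basis columns, matching the 4 d.f. of rcs(x, 5)
m_diff  <- glm(y ~ ns(rest, df = 4) + ns(difference, df = 4),
               family = binomial, data = d)
m_ratio <- glm(y ~ ns(rest, df = 4) + ns(ratio, df = 4),
               family = binomial, data = d)

# Non-nested models: compare with AIC (lower is better)
aic <- AIC(m_diff, m_ratio)
print(aic)
```

Because the simulated outcome depends on the ratio by construction, this only illustrates the mechanics, not what real CFR data would show. In `rms` itself, the analogous comparison would pass two `lrm()` fits using `rcs(rest, 5)` plus `rcs(ratio, 5)` or `rcs(difference, 5)` to `AIC()`.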

**B.** Conversely, I’m not sure how to show that a simplifying construct (ratio or difference) is insufficient to capture the relationship. Would one simply compare:

`rcs(first_measurement, 5) + rcs(second_measurement, 5) + rcs(first_measurement, 5) %ia% rcs(second_measurement, 5)`

versus

`rcs(first_measurement, 5) + rcs(ratio, 5)`

(or the same with `rcs(difference, 5)` instead of the ratio) and check whether the simpler model results in an unacceptable loss of fit?
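One possible sketch of B, again with simulated data, hypothetical names, and base R natural splines standing in for `rcs()`. The flexible surface in both measurements is written here as a full tensor product, `ns(rest, df = 4) * ns(stress, df = 4)`, which, unlike `rms`'s `%ia%`, keeps the doubly nonlinear product terms. The ratio model is not nested in that surface model, so the two are compared by AIC. To get a valid likelihood ratio test of "does anything beyond rest and the ratio matter", the sketch also fits a larger model that adds the surface terms on top of the ratio model, which makes the ratio model nested within it.

```r
# Sketch only: simulated data, hypothetical variable names.
library(splines)

set.seed(2024)
n <- 2000
rest   <- rlnorm(n, meanlog = 0, sdlog = 0.3)
stress <- rest * rlnorm(n, meanlog = log(2.5), sdlog = 0.3)
d <- data.frame(rest, stress, ratio = stress / rest)
# By construction the outcome depends on the ratio only
d$y <- rbinom(n, 1, plogis(1 - 1.5 * log(d$ratio)))

# Simplified model: first measurement + ratio
m_ratio <- glm(y ~ ns(rest, df = 4) + ns(ratio, df = 4),
               family = binomial, data = d)

# Flexible surface in both measurements (full tensor-product interaction)
m_surface <- glm(y ~ ns(rest, df = 4) * ns(stress, df = 4),
                 family = binomial, data = d)

# Non-nested comparison: AIC (lower is better)
print(AIC(m_ratio, m_surface))

# Nested comparison: add the surface terms to the ratio model and test
# whether they improve fit beyond rest + ratio
m_both <- update(m_ratio, . ~ . + ns(stress, df = 4) +
                   ns(rest, df = 4):ns(stress, df = 4))
lr <- anova(m_ratio, m_both, test = "LRT")
print(lr)
```

With `rms`, two nested `lrm()` fits can be compared in the same way with `lrtest()`.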

**C.** Is there a meaningful difference between using the second measurement and using the ratio/difference when smooth terms with an interaction are used? In other words, is there a meaningful difference between

`rcs(first_measurement, 5) + rcs(second_measurement, 5) + rcs(first_measurement, 5) %ia% rcs(second_measurement, 5)`

versus

`rcs(first_measurement, 5) + rcs(ratio, 5) + rcs(first_measurement, 5) %ia% rcs(ratio, 5)`

(or the same with `rcs(difference, 5)` instead of the ratio)?
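A small sketch of why the answer to C depends on the scale of the terms, using simulated data, hypothetical names, and base R only. On the log scale, `log(ratio) = log(stress) - log(rest)`, so a model linear in `log(rest)` and `log(stress)` and a model linear in `log(rest)` and `log(ratio)` have the same column space: they give identical fitted values and log-likelihoods, and only the coefficients are re-expressed. Once the terms are splines with a tensor interaction on the raw scale, the `(rest, stress)` and `(rest, ratio)` bases span different sets of surfaces, so those two fits will usually be similar but not numerically identical.

```r
# Sketch only: simulated data, hypothetical variable names.
library(splines)

set.seed(2024)
n <- 2000
rest   <- rlnorm(n, meanlog = 0, sdlog = 0.3)
stress <- rest * rlnorm(n, meanlog = log(2.5), sdlog = 0.3)
d <- data.frame(rest, stress, ratio = stress / rest)
d$y <- rbinom(n, 1, plogis(1 - 1.5 * log(d$ratio)))

# Linear in logs: log(ratio) = log(stress) - log(rest), so these two
# models are reparameterizations of each other (same fitted values)
lin_stress <- glm(y ~ log(rest) + log(stress), family = binomial, data = d)
lin_ratio  <- glm(y ~ log(rest) + log(ratio),  family = binomial, data = d)

# Splines with a tensor interaction on the raw scale: different bases,
# so the fits are generally similar but not identical
spl_stress <- glm(y ~ ns(rest, df = 4) * ns(stress, df = 4),
                  family = binomial, data = d)
spl_ratio  <- glm(y ~ ns(rest, df = 4) * ns(ratio, df = 4),
                  family = binomial, data = d)

ll <- sapply(list(lin_stress = lin_stress, lin_ratio = lin_ratio,
                  spl_stress = spl_stress, spl_ratio = spl_ratio),
             function(m) as.numeric(logLik(m)))
print(ll)

# Same linear model, re-expressed: the log(rest) coefficient in lin_ratio
# equals the sum of the log(rest) and log(stress) coefficients in lin_stress
b_s <- coef(lin_stress)
b_r <- coef(lin_ratio)
print(c(lin_ratio_log_rest  = unname(b_r["log(rest)"]),
        sum_from_lin_stress = unname(b_s["log(rest)"] + b_s["log(stress)"])))
```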