# How to assess an isolated effect or causal relationship in longitudinal data?

My question relates to causal inference in a longitudinal data setting.

I have a dataset which includes certain measurements from the same individuals at two different time points, say at baseline and at 12 months. My main research question is whether a change in a structural variable A causes a negative change in a physiological outcome Y (“Does A cause Y?”). A is an ordinal variable and Y is continuous. Ordinal variable B is another structural variable, and it has previously been shown that B causes Y, meaning that worsening status in B causes a negative change in Y. I have reason to believe that A is weakly correlated with B.

If I aim to study whether A causes Y, what sort of model should I build? Generally I should not regress a change score, so I would take Y at 12 months as the outcome and Y at baseline as a covariate. Or would it be appropriate in this case to regress the change in Y? And how should I deal with A? Is it appropriate to take the change in A and add that as a covariate? Or should I have some kind of interaction term? Should I include B in a similar way, because based on prior literature it seems to be a confounder (causes both A and Y)?

Fortuitously, I came across this very nice-looking, recently published paper, but as far as I understand it discusses only cross-sectional data: https://academic.oup.com/ije/article/51/5/1604/6294759

Whether you use Y or change in Y as the outcome is just a matter of interpretation. As long as you adjust for baseline Y, both produce the same estimate for the other covariates (only the coefficient on baseline Y shifts by 1). I’m not sure using change in A would be relevant, because ‘in the field’ it would be of no use to predict Y based on A measured at the same time Y is measured, i.e. adjust for baseline A. Regarding the value of A if it is correlated with B, you could adjust for A after adjusting for B and see if it improves the predictive ability of the model. If A is easier or cheaper to measure than B, then maybe you want to include just A in the model to see how it performs on its own.
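To illustrate the point about Y versus change in Y, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names (`y0`, `a0`, `y12`), the effect sizes, and the choice to treat the ordinal A as a numeric score. It fits the same covariates to both outcomes with ordinary least squares and compares the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical simulated data (not from the original study).
y0 = rng.normal(size=n)                           # Y at baseline
a0 = rng.integers(0, 4, size=n).astype(float)     # ordinal A at baseline, scored 0..3
y12 = 0.6 * y0 - 0.3 * a0 + rng.normal(size=n)    # Y at 12 months

# Same design matrix for both models: intercept, baseline Y, baseline A.
X = np.column_stack([np.ones(n), y0, a0])

# Model 1: follow-up Y as the outcome, adjusted for baseline Y.
beta_followup, *_ = np.linalg.lstsq(X, y12, rcond=None)

# Model 2: change in Y as the outcome, still adjusted for baseline Y.
beta_change, *_ = np.linalg.lstsq(X, y12 - y0, rcond=None)

print("A coefficient, follow-up outcome:", beta_followup[2])
print("A coefficient, change outcome:   ", beta_change[2])
print("Baseline-Y coefficients:", beta_followup[1], beta_change[1])
```

The coefficient on A is identical in the two fits, and the coefficient on baseline Y differs by exactly 1, which is why the choice between the two outcomes only affects interpretation once baseline Y is in the model.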

Thanks. But I assumed I would have a much stronger basis for causal inference when I have data on possible simultaneous change in both A and Y. Both variables are measured together at each of the two time points, so I have Y_baseline, A_baseline, Y_12months and A_12months. I understand that there are still some limitations for causal inference, but would it be wise to somehow also include the change observed in A?

It doesn’t sound like a prediction problem in this case but rather a multivariate one, i.e. consider A as an associated outcome; some joint modelling of Y and A could then estimate the strength of the correlation. In that case there’s no sense in which A causes Y; they simply covary?
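One simple way to sketch the joint-modelling idea is to regress both 12-month measurements on the same baseline covariates and correlate the residuals. This is only an illustration under stated assumptions: the data are simulated, the names (`y12`, `a12`, `shared`) are hypothetical, and the ordinal A is treated as a numeric score, which a proper analysis might replace with an ordinal model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical simulated data (not from the original study).
y0 = rng.normal(size=n)                          # Y at baseline
a0 = rng.integers(0, 4, size=n).astype(float)    # ordinal A at baseline, scored 0..3
shared = rng.normal(size=n)                      # unobserved factor moving both outcomes
y12 = 0.6 * y0 - 0.5 * shared + rng.normal(size=n)
a12 = a0 + 0.5 * shared + rng.normal(scale=0.5, size=n)  # A at 12 months, as a numeric score

# Both outcomes share the same baseline covariates.
X = np.column_stack([np.ones(n), y0, a0])


def residuals(design, target):
    """Return OLS residuals of target regressed on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ beta


res_y = residuals(X, y12)
res_a = residuals(X, a12)

# Correlation between the parts of Y and A at 12 months that the
# baseline values do not explain.
residual_corr = np.corrcoef(res_y, res_a)[0, 1]
print("Residual correlation between Y_12months and A_12months:", residual_corr)
```

The residual correlation measures how strongly the two 12-month outcomes move together once baseline values are accounted for. It is symmetric in Y and A, so it describes covariation and says nothing about direction.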