Baseline adjustment vs Fixed Effects for Observational Longitudinal Data

So take a situation with observational data where you have a time-fixed treatment T, time-fixed covariates X_i, and an outcome Y_it. Treatment is not randomized and depends directly on the covariates, which also influence the outcome. The following is the causal graph:

Essentially, the covariates and treatment affect each of the outcomes, and each outcome is also influenced by the previous outcome. How would you estimate the causal effect of the treatment?

I’m new to causal graphs. In biostatistics I learned things like mixed models, ANCOVA, etc., but the newer causal inference literature (graphical models, SEM) seems to suggest these approaches are not really valid (or I haven’t yet made the connection from DAGs to them).

I would like to find the causal effect of the treatment on the outcome at each time point. The DAGitty software tells me that, for the effect of Treatment on Yt1, adjusting for baseline (Yt0) means adjusting for a mediator, which is not appropriate for estimating the total effect of Treatment on Yt1, but that this adjustment is needed to estimate the “direct” effect.
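
To see the distinction DAGitty is drawing, here is a minimal Python/NumPy simulation under an assumed linear structural model that follows the graph as described (X -> T, X and T -> Yt0 and Yt1, Yt0 -> Yt1). All coefficient values are hypothetical. Regressing Yt1 on T and X targets the total effect (the direct path plus the path through Yt0), while additionally adjusting for Yt0 targets only the direct effect. It is a sketch for building intuition, not an analysis recipe.

```python
import numpy as np

# Hypothetical linear structural model following the described DAG:
# X -> T, X -> Yt0, X -> Yt1, T -> Yt0, T -> Yt1, Yt0 -> Yt1.
rng = np.random.default_rng(0)
n = 200_000
c0, c1, b1 = 1.0, 0.5, 0.8   # assumed effects: T -> Yt0, T -> Yt1, Yt0 -> Yt1

x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))        # treatment depends on X (confounding)
y0 = x + c0 * t + rng.normal(size=n)
y1 = x + b1 * y0 + c1 * t + rng.normal(size=n)


def ols(y, *cols):
    """Ordinary least squares; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]


total = ols(y1, t, x)[1]        # adjust for the confounder X only
direct = ols(y1, t, x, y0)[1]   # also adjust for the mediator Yt0

print(f"total effect  ~ {total:.3f} (truth {c1 + b1 * c0:.2f})")
print(f"direct effect ~ {direct:.3f} (truth {c1:.2f})")
```

With these assumed values the total effect is c1 + b1*c0 = 1.3 and the direct effect is c1 = 0.5, so the two regressions answer different questions; which adjustment set is "right" depends on which effect you want.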

In school, we also learned that you could just fit a mixed model with (Yt0, Yt1, Yt2) as the outcome, a random intercept for subject ID, and Treatment and X as covariates. Then I found out about fixed effects, and that it is better to include subject ID as a fixed effect to account for unobserved time-fixed confounders: Fixed effects model - Wikipedia
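
One practical consequence worth checking before choosing subject fixed effects here: when the treatment is time-fixed, its column is exactly a sum of subject indicator columns, so its main effect is not estimable in a subject fixed-effects model (only interactions such as treatment x time are). A minimal NumPy sketch on hypothetical simulated long-format data makes the rank deficiency visible:

```python
import numpy as np

# Hypothetical long-format data: 50 subjects x 3 visits, treatment fixed within subject.
rng = np.random.default_rng(1)
n_subj, n_time = 50, 3
subj = np.repeat(np.arange(n_subj), n_time)                 # subject ID for each row
treat = rng.binomial(1, 0.5, size=n_subj)[subj]              # identical at every visit

# Subject fixed effects: one indicator column per subject (no separate intercept).
subject_dummies = (subj[:, None] == np.arange(n_subj)).astype(float)
design = np.column_stack([subject_dummies, treat])

rank = np.linalg.matrix_rank(design)
print(design.shape[1], rank)   # 51 columns, rank 50: treat is a sum of subject columns
```

R's lm(), for instance, reports NA for a coefficient aliased in this way rather than an estimate.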

However, the fixed/mixed effects approach (with all Ys as the outcome) still wouldn’t account for a possible effect of the previous outcome on the current one, right? That is what this paper seems to suggest with assumption (b): https://imai.fas.harvard.edu/research/files/FEmatch.pdf.

Whereas a model regressing Yt1 on Yt0, X, and Treatment would? And then another model regressing Yt2 on Yt1, X, and Treatment? That seems to be what the graph suggests would give the direct effect of Treatment on the outcome at each time point.
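
A minimal sketch of the per-time-point approach, again on hypothetical simulated data from an assumed linear model that follows the described graph. Each model regresses the current outcome on the previous outcome, X, and Treatment; under these assumptions the Treatment coefficients recover the direct effects at t1 and t2. One caveat that ties back to the fixed-effects question: the simulation deliberately has no unmeasured subject-level factor U. If such a U affected every outcome, conditioning on Yt1 would open the path Treatment -> Yt1 <- U -> Yt2 and the t2 estimate would be biased.

```python
import numpy as np

# Hypothetical linear model following the described graph, with NO unmeasured
# subject-level factor: X -> T; X and T -> each outcome; Yt0 -> Yt1 -> Yt2.
rng = np.random.default_rng(2)
n = 200_000
c0, c1, c2 = 1.0, 0.5, 0.3   # assumed direct effects of T at t0, t1, t2
b = 0.7                      # assumed effect of the previous outcome

x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y0 = x + c0 * t + rng.normal(size=n)
y1 = x + b * y0 + c1 * t + rng.normal(size=n)
y2 = x + b * y1 + c2 * t + rng.normal(size=n)


def ols(y, *cols):
    """Ordinary least squares; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]


# One model per time point: current outcome ~ treatment + previous outcome + X.
direct_t1 = ols(y1, t, y0, x)[1]
direct_t2 = ols(y2, t, y1, x)[1]
print(f"t1: {direct_t1:.3f} (truth {c1})   t2: {direct_t2:.3f} (truth {c2})")
```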

I am just really confused about how these newer causal graph techniques relate to what I already learned, because before learning about them I probably would have just put this sort of data into a mixed model and looked at the Treatment coefficient. But now I am realizing that may not be correct…

Attached is the causal graph. What model would be reasonable for estimating the treatment effect with this graph?

In my experience, the use of ordinary causal diagrams does not invalidate, and is largely orthogonal to, modeling decisions such as whether to use fixed or random slopes. However, I do agree that your scenario may be better addressed by G methods. Here is a very accessible description of these considerations that includes causal diagrams.

Thanks. Can you expand on the causal graphs being orthogonal to the modeling decisions? In a lot of the recent biostat literature, and even the Bayesian and ML/AI causal literature, it seems like the graphs essentially are the model, in that they directly inform the structure of the model you use. For example, the probabilistic graphical models for discrete data: Causal Inference — pgmpy 0.1.15 documentation

It’s interesting, though: causal inference is in some sense the opposite of predictive ML, but at times it feels like all these graphs make it equally, if not more, of a black box to understand how to actually estimate the effects of interest. I wonder whether people actually trust “black box causal inference estimates” that come out of these DAGs. There are tons of packages nowadays that do this, but I am trying to figure out how to replicate at least simple, common situations to build my intuition, so that I can explain these methods even if I don’t fully understand the computation.

I’m thinking that for the graph above, since in this particular case the treatment is time-fixed and only the outcome varies, I could use IPTW to estimate treatment weights at baseline, and then fit two models, at t=1 and t=2, each adjusting for the previous outcome in a weighted GLM. The treatment coefficient in each model would then be the direct effect of the treatment on the outcome at that time? Basically:

P(Tmt = 1 | X_1, X_2) = f(X_1, X_2)
Y_{t1} = b_{01} + b_{11}Y_{t0} + b_{21}Tmt, weighted by 1/P(Tmt received | X_1, X_2)
Y_{t2} = b_{02} + b_{12}Y_{t1} + b_{22}Tmt, weighted by 1/P(Tmt received | X_1, X_2)

Then would b_{21} be the direct effect of the treatment on Yt1, and b_{22} the direct effect on Yt2?
Getting proper uncertainty estimates seems difficult, though, because of the first (weighting) step.
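
A minimal NumPy sketch of the two-step idea for the t=1 model, with a nonparametric bootstrap that refits both the propensity model and the weighted outcome model in every resample, so the uncertainty from estimating the weights is carried into the interval. The data-generating process, coefficient values, and function names are hypothetical. One deliberate departure from the two weighted models above: X_1 and X_2 are kept in the weighted outcome model. In the graph as described, Yt0 is a common effect of Tmt and X, so conditioning on Yt0 while weighting only for Tmt can reopen the path Tmt -> Yt0 <- X -> Yt1; weighting balances X across treatment groups but does not block that path.

```python
import numpy as np

rng = np.random.default_rng(3)


def simulate(n, rng):
    """Hypothetical data: X1, X2 -> Tmt; X1, X2, Tmt -> Yt0; X1, X2, Yt0, Tmt -> Yt1."""
    x = rng.normal(size=(n, 2))
    p = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
    tmt = rng.binomial(1, p)
    y0 = x @ np.array([1.0, 0.5]) + 1.0 * tmt + rng.normal(size=n)
    y1 = x @ np.array([1.0, 0.5]) + 0.7 * y0 + 0.5 * tmt + rng.normal(size=n)  # b21 = 0.5
    return x, tmt, y0, y1


def logistic_fit(design, y, n_iter=25):
    """Unpenalised logistic regression by Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-design @ beta))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def iptw_direct_effect(x, tmt, y0, y1):
    ones = np.ones(len(tmt))
    # Step 1: propensity model P(Tmt = 1 | X1, X2) and weights 1 / P(Tmt actually received).
    ps_design = np.column_stack([ones, x])
    ps = 1 / (1 + np.exp(-ps_design @ logistic_fit(ps_design, tmt)))
    w = np.where(tmt == 1, 1 / ps, 1 / (1 - ps))
    # Step 2: weighted least squares of Yt1 on Tmt and Yt0, plus X1, X2 (a departure
    # from the original equations, to block Tmt -> Yt0 <- X -> Yt1).
    design = np.column_stack([ones, tmt, y0, x])
    sw = np.sqrt(w)
    coef = np.linalg.lstsq(design * sw[:, None], y1 * sw, rcond=None)[0]
    return coef[1]


x, tmt, y0, y1 = simulate(20_000, rng)
estimate = iptw_direct_effect(x, tmt, y0, y1)

# Nonparametric bootstrap over subjects, refitting BOTH steps in every resample.
boot = []
for _ in range(200):
    idx = rng.integers(0, len(tmt), size=len(tmt))
    boot.append(iptw_direct_effect(x[idx], tmt[idx], y0[idx], y1[idx]))
low, high = np.percentile(boot, [2.5, 97.5])
print(f"b21 ~ {estimate:.3f}, 95% percentile bootstrap interval ({low:.3f}, {high:.3f})")
```

The percentile interval is the simplest bootstrap interval; 200 resamples keeps the example fast, and more resamples are usually advisable in a real analysis.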

Think of it this way: DAGs encode one aspect of the model (i.e., the non-parametric structural causal constraints) but are less committal about other aspects. For example, take the DAG X->M->Y, where X is the treatment, M is the mediator, and Y is the outcome. The statistical model can be a proportional odds regression (treating Y as an ordinal outcome), a logistic regression (dichotomizing Y), a linear regression, etc., all based on the same DAG.

Or take the practical DAG example we provide in Figure 3B here. There are multiple different ways, all consistent with the same DAG, that we could have used to model the relationship between renal cell carcinoma histology and overall survival. DAGs are useful tools but are only one part of modeling considerations.

I agree, IPTW does tend to be the most accessible of the G methods and the approach you proposed is plausible within that framework. However, looking again more carefully at your DAG, I do not really see it being incompatible with a mixed model as you originally described in your first post. G methods would be preferable if there were time-dependent exposures or confounders, which I do not see in the DAG.

Yeah, though in your X->M->Y example the models all keep the same general functional form, Y = f(X, M), whereas here the mixed model would have a different functional form from separate models at each time point adjusted for the previous outcome.

Regarding time-dependent exposures: in the example above I don’t have any, but I do in the next part, because the Ys I drew above actually become Ms (mediators), which then connect to other outcomes (Ys) at each time point. I didn’t draw the full graph earlier because I wanted to understand a simpler case first. In the full problem I want the effect of each M, so I will have time-dependent exposures like below. In that case, would some joint modeling approach be needed?

Ouch, that’s a lot. A useful philosophy when modeling typical clinical data is to keep things as simple as plausibly possible, even at the cost of some added bias in the bias-variance tradeoff. I would still go with one full mixed model. Others may use a shared frailty approach (for multiple observations from the same person) or correct for intra-patient correlation using the robcov function in the rms package. If it fit the clinical question and context to model Mt0-2 as a dynamic treatment regime, I would then consult with my colleagues to see whether we could model it using the approaches described here, here, and here. Others in the forum may have additional thoughts.
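
For readers working outside R, a minimal NumPy sketch of the cluster-robust (Huber-White sandwich) variance idea behind correcting for intra-patient correlation: fit an ordinary regression on long-format data, then sum the score contributions within each patient before forming the variance. This is a generic illustration on hypothetical simulated data, not a reimplementation of rms::robcov, and it omits any small-sample correction.

```python
import numpy as np

# Hypothetical long-format data: one row per patient per visit, with a patient-level
# random intercept u that makes rows from the same patient correlated.
rng = np.random.default_rng(4)
n_subj, n_time = 500, 3
subj = np.repeat(np.arange(n_subj), n_time)
x = rng.normal(size=n_subj)[subj]
treat = rng.binomial(1, 0.5, size=n_subj)[subj]
time = np.tile(np.arange(n_time), n_subj)
u = rng.normal(scale=2.0, size=n_subj)[subj]
y = 1.0 + 0.5 * treat + 0.3 * x + 0.2 * time + u + rng.normal(size=len(subj))

design = np.column_stack([np.ones(len(y)), treat, x, time])
bread = np.linalg.inv(design.T @ design)
beta = bread @ design.T @ y
resid = y - design @ beta

# Naive OLS variance: assumes all rows are independent.
naive_var = bread * (resid @ resid) / (len(y) - design.shape[1])

# Cluster-robust sandwich: sum score contributions within each patient first.
scores = design * resid[:, None]
cluster_scores = np.zeros((n_subj, design.shape[1]))
np.add.at(cluster_scores, subj, scores)
robust_var = bread @ (cluster_scores.T @ cluster_scores) @ bread

se_naive = np.sqrt(naive_var[1, 1])
se_robust = np.sqrt(robust_var[1, 1])
print(f"treat: {beta[1]:.3f}, naive SE {se_naive:.3f}, cluster-robust SE {se_robust:.3f}")
```

Because the treatment is constant within patient and the simulated outcomes share a patient-level intercept, the naive standard error is too small here; the sandwich version accounts for the correlation without specifying a random-effects structure.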
