Setting:
- Exposures: multiple exposures (biomarkers, all continuous) measured at two time points, once in pregnancy and once in adolescence
- Outcome: clinical outcomes measured in adolescence, at the same time as the adolescent exposure measurements. They could be analysed either as continuous variables or as discrete raw counts; I am still trying to understand which is more appropriate.
- Population: a little over 1000 subjects
- Question: what is the effect of exposure to certain chemicals on neurodevelopment? (I know this is not a formal causal/statistical estimand, but right now I am more interested in which statistical framework to use.)
My original plan was to use longitudinal modified treatment policies with TMLE and SuperLearner to assess the effect of exposure on the outcome (one exposure at a time or multiple exposures simultaneously). I kinda fell in love with the work on Targeted Learning and the causal roadmap. But I also started noticing some people criticising it (overfitting of the SuperLearner, no advantages over other methods, etc…).
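To make the longitudinal modified treatment policy idea concrete, here is a minimal sketch of the estimand with two exposure time points and a single adolescent outcome: the mean outcome if every subject's pregnancy and adolescent exposures were shifted up by fixed amounts, minus the observed mean. It uses simulated data, ordinary least squares standing in for SuperLearner (or BART), and the plain sequential-regression (g-computation) substitution estimator rather than TMLE, which adds a targeting step on top of these regressions. All variable names, coefficients and shift sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 1000

# Hypothetical simulated data (names and coefficients are illustrative only):
# W = baseline confounder, A1 = exposure in pregnancy,
# A2 = exposure in adolescence (depends on A1), Y = adolescent outcome.
W = rng.normal(size=n)
A1 = 0.5 * W + rng.normal(size=n)
A2 = 0.3 * A1 + 0.4 * W + rng.normal(size=n)
Y = 1.0 + 0.8 * W - 0.5 * A1 - 0.3 * A2 + rng.normal(size=n)


def fit_ols(X, y):
    """Least-squares fit with an intercept; a stand-in for a flexible learner."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef


def shift_effect(delta1, delta2):
    """Sequential-regression plug-in estimate of
    E[Y under the shifted exposures] - E[Y] for additive shifts of A1 and A2."""
    # Step 1: regress Y on the full history (W, A1, A2), then predict
    # with the adolescent exposure shifted by delta2.
    q2 = fit_ols(np.column_stack([W, A1, A2]), Y)
    pseudo = q2(np.column_stack([W, A1, A2 + delta2]))
    # Step 2: regress that pseudo-outcome on the history before A2 (W, A1),
    # then predict with the pregnancy exposure shifted by delta1.
    q1 = fit_ols(np.column_stack([W, A1]), pseudo)
    return q1(np.column_stack([W, A1 + delta1])).mean() - Y.mean()


effect = shift_effect(1.0, 1.0)
# True value in this simulation: -0.5*d1 - 0.3*(0.3*d1 + d2) = -0.89 for d1 = d2 = 1,
# because shifting A1 also moves the natural value of A2.
print(round(effect, 2))
```

The two-step structure is what lets the shift at pregnancy propagate through the adolescent exposure; any covariates measured between the two time points would enter the step 1 regression alongside A1.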
After reading some posts in this forum and here, and after bumping into the work of Jennifer Hill on BART for causal inference, I decided to explore other options as well. I now have more questions than I have answers…
It is not really clear to me how to use BART for this analysis. Can BART be fitted with continuous exposures and continuous outcomes? Can it use the exposure information from both time points (i.e., longitudinal exposures and a cross-sectional outcome)?
I found very little in the literature so far.
Based on the posts of @f2harrell, ordinal regression seems to be another option, although I am not sure it can handle repeated measures of the exposures.
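A semiparametric ordinal (proportional odds) model treats every distinct outcome value as an ordered category, so the same model applies whether the outcome is analysed as a continuous variable or as a raw count. The sketch below only evaluates such a model with made-up coefficients to show its structure; it does not fit anything (in R, fitting would typically be done with `orm()` or `lrm()` from the rms package). It assumes that the two exposure measurements simply enter as two separate covariates, since the outcome itself is measured only once.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def po_probabilities(alphas, beta, x):
    """Category probabilities under a proportional odds model with
    P(Y >= y_j | x) = expit(alpha_j + x @ beta) for j = 1..k-1.

    alphas must be strictly decreasing so the exceedance probabilities
    decrease with j; the same beta applies to every cut-point.
    """
    exceed = expit(np.asarray(alphas) + x @ beta)  # P(Y >= y_1), ..., P(Y >= y_{k-1})
    upper = np.concatenate([[1.0], exceed])        # P(Y >= y_0) = 1
    lower = np.concatenate([exceed, [0.0]])        # P(Y >= y_k) = 0
    return upper - lower                           # P(Y = y_j) for j = 0..k-1


# Hypothetical example: a raw-count outcome with observed values 0..4, so the
# model has 4 intercepts; the two exposure time points are two covariates.
values = np.arange(5)
alphas = [2.0, 0.5, -0.5, -2.0]
beta = np.array([-0.4, -0.2])  # made-up effects of pregnancy and adolescent exposure

low = po_probabilities(alphas, beta, np.array([0.0, 0.0]))
high = po_probabilities(alphas, beta, np.array([1.0, 1.0]))
print(values @ low, values @ high)  # predicted mean count at low vs high exposure
```

With a continuous outcome the only change is that there are as many intercepts as distinct observed values minus one; the single `beta` vector is what the proportional odds assumption buys.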
Finally, another option would be to balance the covariates with BART-estimated weights using the WeightIt R package, and then fit a GLM with those weights. This approach might even let me account for the longitudinal nature of the exposures. The only problem is that the final outcome model would still be a linear model rather than a semi-parametric one.
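The weighting approach can be sketched as a marginal structural model for two continuous exposure time points. Stabilised weights are the product over time of the density of each exposure given past exposure, divided by its density given past exposure and confounders; a weighted least-squares fit then estimates the marginal model. This is a simplified analogue, not WeightIt's implementation (in R, the longitudinal case goes through `weightitMSM()`): here Gaussian linear models stand in for BART, and the data, including the hypothetical time-varying confounder `L1` measured between the two exposures, are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Hypothetical simulated data with a time-varying confounder L1 that is
# affected by the pregnancy exposure A1 and affects both A2 and Y.
W = rng.normal(size=n)
A1 = 0.5 * W + rng.normal(size=n)
L1 = 0.4 * A1 + 0.3 * W + rng.normal(size=n)
A2 = 0.3 * A1 + 0.5 * L1 + rng.normal(size=n)
Y = 1.0 + 0.8 * W + 0.6 * L1 - 0.5 * A1 - 0.3 * A2 + rng.normal(size=n)


def with_intercept(*cols):
    """Design matrix with an intercept column followed by the given columns."""
    return np.column_stack([np.ones(n), *cols])


def normal_density(a, X):
    """Density of a under a linear-Gaussian model a ~ N(X @ coef, sigma^2)."""
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    resid = a - X @ coef
    sigma = resid.std(ddof=X.shape[1])
    return np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


# Stabilised weights: f(A1) / f(A1 | W) times f(A2 | A1) / f(A2 | A1, L1, W).
w = (normal_density(A1, with_intercept())
     / normal_density(A1, with_intercept(W))
     * normal_density(A2, with_intercept(A1))
     / normal_density(A2, with_intercept(A1, L1, W)))

# Marginal structural model E[Y^(a1, a2)] = b0 + b1*a1 + b2*a2,
# fitted by weighted least squares (rows scaled by sqrt(w)).
X = with_intercept(A1, A2)
sw = np.sqrt(w)
msm_coef, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)
print(np.round(msm_coef, 2))  # true values in this simulation: [1.0, -0.26, -0.3]
```

The flexible modelling lives in the weight models; the final model is linear only in the exposures, and the true b1 of -0.26 includes the part of the pregnancy effect that runs through `L1`, which a single outcome regression adjusting for `L1` would block.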