Repeated exposure measures, single (cross-sectional) outcome measure

First of all, a big thank you to @f2harrell and @Drew_Levy for the wonderful 4-day RMS course! I have learned so much and am now even more determined to do my part in lessening the replication crisis in my field of research.

I would love recommendations on the modeling strategies. My data has:

  • A continuous outcome Y, a blood biomarker, measured once at age 25.
  • The exposure is a Likert-scale measure of stress, recorded annually from age 9 to age 19. I’m starting by modeling it as a continuous variable because of the models that I list below.
  • N is roughly 1000. Missing data pattern is unusually good for a longitudinal dataset with very small % of attrition.

I have two main research questions:

  1. Is the exposure across time (cumulative life-time effect) associated with the outcome measured once at age 25?
  2. Can we include an interaction term in the model to account for effect modifiers?

And maybe a bonus question:
3. Can I model my repeated exposures as ordered cumulative predictors, as described by Bürkner, P. C. and Charpentier, E. (2018)? 10.31234/

To study these research questions, my initial literature reviews found the following strategies:

  1. Unsupervised data reduction with PCA/clustering and use of the PCs/cluster assignments in an OLS linear regression. I think that this method makes sense and is simple. The downsides are the usual choice of how many PCs or clusters to keep, and uncertainty about how well this works for repeated exposures over time. I can test this in a simulation.
  2. The latency models for protracted exposures by David B. Richardson at 10.1097/EDE.0b013e318194646d. This models the exposure over time as an integral of the time-weighted exposures as a term in a linear regression model.
    The weights (w) can be parameterized by the pdf of a log-normal distribution or by splines, given assumptions about how the influence of exposure changes over time (for example, a strong initial influence that then drops off).
  3. The Bayesian relevant life course model by Sreenath Madathil 10.1093/ije/dyy107. This model is conceptually similar to the latency model.
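To make the latency idea in strategy 2 concrete, here is a minimal sketch on simulated data. It is not Richardson's implementation: the function names, the log-normal parameters (`mu`, `sigma`), and the simulated effect size are all assumptions chosen for illustration, and in a real analysis the weight parameters would be estimated (for example over a grid) rather than fixed.

```python
import numpy as np

def lognormal_pdf(t, mu, sigma):
    """Log-normal density evaluated at t > 0."""
    t = np.asarray(t, dtype=float)
    return np.exp(-(np.log(t) - mu) ** 2 / (2 * sigma ** 2)) / (
        t * sigma * np.sqrt(2 * np.pi)
    )

def latency_weights(exposure_ages, outcome_age, mu, sigma):
    """Weights as a function of time since exposure, normalized to sum to 1."""
    lag = outcome_age - np.asarray(exposure_ages, dtype=float)  # years before outcome
    w = lognormal_pdf(lag, mu, sigma)
    return w / w.sum()

def weighted_cumulative_exposure(X, weights):
    """X has one row per subject and one column per exposure age."""
    return X @ weights

# Simulated illustration (all values hypothetical)
rng = np.random.default_rng(0)
ages = np.arange(9, 20)                       # annual measures, ages 9..19
X = rng.integers(1, 6, size=(1000, ages.size)).astype(float)  # Likert 1..5
w = latency_weights(ages, outcome_age=25, mu=np.log(8), sigma=0.4)
wce = weighted_cumulative_exposure(X, w)
y = 2.0 + 0.5 * wce + rng.normal(scale=1.0, size=1000)

# Linear regression of Y on the time-weighted exposure term
design = np.column_stack([np.ones_like(wce), wce])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
```

Because the weights are normalized, the weighted exposure stays on the original Likert scale, which keeps the regression coefficient interpretable as the change in Y per one-point change in time-weighted stress.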

I’m thinking of using the latency model for my data. But I was wondering if anyone is experienced with this kind of analysis and has any suggestions and recommendations?


Hung, thank you for your kind words, for your sense of mission to do informative and reproducible research, and moreover for your active participation at RMS 2022.

@f2harrell would have a more authoritative answer than I. But I will give you my impression and inclination.

For a continuous Y, you would have learned from @f2harrell in RMS 2022 about the advantages of the cumulative probability model (CPM) family, which frees you from most of the normality and linearity assumptions of the conventional linear model (OLS). I would consider modeling the response with a CPM.

For the repeated measures of stress, I imagine that time is an important factor and that not all of the stress measures will have the same effect on Y. Presumably you have a biological, physiological, or scientific model (a structural causal model, “SCM”) for how stress leads to the biomarker outcome, and ideally this should be reflected in your analysis design and statistical model. This scientific model, which can be expressed as an SCM (perhaps a DAG, perhaps a simultaneous equations model, SEM), should be used to strengthen your analysis and inference.

This SCM would, in theory, indicate what kind of induction period or attenuation of the exposure is expected based on subject matter understanding. An expression of the exposure should describe in an explicit and intuitive way how the exposure is related to the response, based on your understanding of the scientific processes involved. The repeated measures of exposure might be integrated in a convolution, in which the exposure window, the lag structure, and the weighting of each individual repeated measure as a function of time or lag can be specified. This is a function describing how you think the exposure is operating on the response, and it should be a clear declaration of your scientific model to be evaluated by the data.
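As a rough illustration of what such a convolution could look like, here is a small sketch. The kernel shape (an induction period followed by exponential decay), the half-life, and the function names are assumptions made for the example, not a recommendation for any particular biological mechanism.

```python
import numpy as np

def lag_kernel(max_lag, min_lag, half_life):
    """Weight per lag (years before outcome): zero during the induction
    period (lag < min_lag), exponential decay afterwards, summing to 1."""
    lags = np.arange(max_lag + 1)
    k = np.where(lags >= min_lag, 0.5 ** ((lags - min_lag) / half_life), 0.0)
    return k / k.sum()

def effective_exposure(x, first_age, outcome_age, kernel):
    """Convolve one subject's annual exposure series with the lag kernel
    and read off the value at the outcome age."""
    n = outcome_age - first_age
    padded = np.zeros(n + 1)          # ages first_age..outcome_age
    padded[: len(x)] = x              # unmeasured later ages stay at 0
    return np.convolve(padded, kernel)[n]

# Hypothetical stress scores for one subject, ages 9..19
x = np.array([3, 4, 2, 5, 1, 3, 4, 2, 3, 5, 4], dtype=float)
kernel = lag_kernel(max_lag=16, min_lag=6, half_life=4.0)
eff = effective_exposure(x, first_age=9, outcome_age=25, kernel=kernel)
```

With the outcome at age 25 and the last exposure at age 19, every observed measure has a lag of at least 6 years, so setting `min_lag=6` means the zero padding for ages 20 to 25 never contributes. Changing `min_lag`, `max_lag`, or `half_life` is how the exposure window and lag structure would be declared.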

I have seen @davidcnorrismd program a convolution function for integrating repeated exposure measures and he might provide guidance.

Question #2, ‘can we include an interaction term in the model to account for effect modifiers’, is not really a research question but a methodologic and modeling question. Frank has discussed in RMS how testing and fishing for interactions is unwise. If you believe, based on subject matter understanding, that the effect of exposure is modified or conditioned in some way by another variable, then that should be incorporated in the model specification and in your analysis. The SCM might incorporate that expectation. There is a literature on how this might be done (Weinberg CR. Can DAGs clarify effect modification? Epidemiology. 2007;18(5):569-572. doi:10.1097/EDE.0b013e318126c11d; and Anton Nilsson, Carl Bonander, Ulf Strömberg, Jonas Björk, A directed acyclic graph for interactions, International Journal of Epidemiology, Volume 50, Issue 2, April 2021, Pages 613–619).
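For what a prespecified interaction looks like in the model itself, here is a minimal simulated sketch. The variable names, the binary modifier, and the coefficients are all hypothetical, and plain least squares is used only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
exposure = rng.normal(size=n)                          # hypothetical summarized exposure
modifier = rng.integers(0, 2, size=n).astype(float)    # hypothetical binary effect modifier

# Simulated truth: the exposure slope is 0.4 when modifier = 0 and 0.7 when modifier = 1
y = (1.0 + 0.4 * exposure + 0.2 * modifier
     + 0.3 * exposure * modifier + rng.normal(scale=1.0, size=n))

# The product term is part of the model from the start, not added after testing
X = np.column_stack([np.ones(n), exposure, modifier, exposure * modifier])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

slope_when_0 = beta[1]             # exposure effect when modifier = 0
slope_when_1 = beta[1] + beta[3]   # exposure effect when modifier = 1
```

The point of the sketch is that the exposure effect is reported separately at each level of the modifier, rather than as a single coefficient plus a significance test on the product term.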

You will want to use the SCM to rule out ancillary variables as colliders and to make sure no back-door paths are introduced by the conditioning you are entertaining.

I hope this provides some useful perspective on options for your approach to your research.