RMS Modeling Longitudinal Responses

The goal is to evaluate the impact of an intervention on children’s BMI before and after the intervention. Ideally, evaluations should occur at three distinct time points (t1, t2, t3). However, the available data exhibit unbalanced repeated measure design, some children evaluated only once during t1, t2, or t3, others assessed twice during t1/t2, t2/t3, t1/t1, t2/t2, and some evaluated at all three time intervals.

Additionally, this is a multi-center study. Not all centers provide both before and after intervention data. For before intervention, I have 100 observations from children in NY, 200 observations from children in Puerto Rico, 150 observations from children in Utah. For after-intervention period, I have data from children in Michigan, Georgia, and Ohio.

Given the clustered nature of the data and the mismatch between before and after intervention data across different centers, I am uncertain about the most appropriate model to fit. Would an autocorrelated time series model or a mixed-effects model be more suitable for this unbalanced repeated measure, multi-center intervention study?

Though it won’t solve all the problems with this study, a continuous time correlation structure (e.g. AR1) and a continuous time mean function (e.g., a spline in time) will help.

1 Like

Frank, In regression model when including baseline measurement as a predictor how do we handle scenarios where few subjects are measured only once and no followup measurement whereas several other subjects were evaluated twice once at baseline and second measurement sometime in the future.


How should we handle this situation ?, Is it okay to do a LOCF (Last Obsv Carry Forward) for such cases where there are only baseline measurements and no followup ? Thanks in advance.

LOCF is almost never a good idea, and can’t be used unless you have at least 2 non-missing follow-up values. The observations without any follow-up data are only useful for unsupervised learning (data reduction) steps to help with high-dimensional predictors. They contribute almost no information. Multiple imputation will gain a slight benefit of having non-missing predictors but hardly worth the effort.

It is important to note that if you are trying to draw any conclusions from changes from pre to post, a pre-post design cannot withstand losing even a single observation because of missing follow-up. Pre-post designs are extremely brittle and non-response bias will ruin the analysis.

1 Like

Thank you Frank. I will plan accordingly. Probably start with only those with alteast two measurements.

That is likely to create a serious bias. Start with fitting a logistic model to predict the number of follow-up measurements based on all non-missing baseline data. And talk to the experts to see what features they think dictate a patient being followed.

1 Like

Great idea, did not think about model to predict the number of follow-up measurements, will do. Thanks as always.

Hi Prof Harrell,

I would like to ask regarding chapter 7.8.4 Bayesian Markov Semiparametric Model of the rms book.

I have difficulty understanding what contrast.rms does to a blrm object. I have understood it to be the estimated difference in means between two groups, here, 10000U-Placebo, similar to the emmeans::emmeans and modelbased::estimate_contrasts functions.

However, when I refer to the latter section, where the differences in mean is obtained from the unconditional probabilities, the values appear different.

How do I understand the difference in these two results?

Additionally, the contrasts are only for between-group differences at each timepoint. Would it be logical or meaningful to look at within-group differences, example, Week 4-Week 2, Week 8-Week 2 etc for 10000U and Placebo groups separately? Or would it be better to use conditional mean?

If we use conditional mean, the previous value, ptwstrs=40 needs to be specified. Is this value arbitrary?

Thank you for taking the time to read my questions!

The results have entirely different estimands. One relates to differences in log odds of transition probabilities and the other to differences in means (on the original scale) between two of the treatments at separate follow-up times.

It is possible to do that but it is not conventional when there are more than one follow-up times. We tend to look at between-group differences, slopes, curvature, etc.

Whenever estimating an absolute (not a transition odds, for example) you’ll need to define an initial state. It would be better to do this for 10 different initial values.

Thank you for clarifying!

If I understand this correctly, this is the same as log of transition odds ratio. Is there a way to code it directly in contrast so that I can get the transition odds ratio? I tried including the argument fun = exp like in logistic model, but it didn’t work.

Set funint=FALSE in contrast().