RMS Modeling Longitudinal Responses

FurlanLeo · June 16, 2024, 1:33pm

Great, thank you.

I’m critically appraising this article for a systematic review using the CHAMP checklist https://bjsm.bmj.com/content/55/18/1009.2.

My concern is that what the authors of this study did invalidates their analysis/results, according to items 8 and 9 from CHAMP.

I wonder whether that’s a fair conclusion on my part. Or perhaps I could be more lenient…

What’s your take on this, Professor @f2harrell?

martinspn · June 16, 2024, 3:36pm

Saying it invalidates the findings sounds like being too hard on the paper. Their approach may not be the best, but that doesn’t mean it’s not valid. I sent you a DM @FurlanLeo

Edit: just noticed the Q was addressed to FH, sorry for stepping in

f2harrell · June 17, 2024, 2:06pm

I’m glad you stepped in, and I concur.

nicksun · May 21, 2025, 10:13pm

Hi Dr. Harrell, following up after the RMS 2025 course. During our discussion of random intercepts in Ch. 7, I recall you mentioning that adding a large number of random intercepts to the model is problematic because it drastically inflates the d.f. of the model. A while ago, I worked through a Stack Exchange thread (https://stats.stackexchange.com/questions/242759/calculate-random-effect-predictions-manually-for-a-linear-mixed-model) where random intercepts are computed as a function of the residual variance, the variance of the random intercept, and the residuals from the fixed-effect component. My reading of this is that basically only one additional parameter is being estimated for the random intercept. Am I misunderstanding the computation of random intercepts and their impact on model parsimony?

f2harrell · May 22, 2025, 6:25pm

Only one variance parameter is estimated but n random effects are estimated where n is the number of subjects. When the variance is not large, the effective number of random effects is less than n due to their shrinkage. Expect a large number of effective d.f. and convergence problems when within-subject correlations are very high and there is a diversity across subjects.
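To make the shrinkage point concrete, here is a minimal sketch. It assumes a balanced random-intercept model with `m` observations per subject and treats the variance components as known. The function names are hypothetical, and `n * shrinkage` is only a rough approximation to the effective number of random effects; it is not how any particular package reports effective d.f.

```python
def shrinkage_factor(var_u, var_e, m):
    """Fraction of a subject's mean residual kept by its predicted intercept.

    var_u: variance of the random intercepts (assumed known)
    var_e: residual variance (assumed known)
    m:     observations per subject (balanced design assumed)
    """
    return var_u / (var_u + var_e / m)


def predicted_intercepts(mean_residuals, var_u, var_e, m):
    """Shrink each subject's mean fixed-effect residual toward zero."""
    lam = shrinkage_factor(var_u, var_e, m)
    return [lam * r for r in mean_residuals]


def approx_effective_df(n_subjects, var_u, var_e, m):
    """Rough effective number of random effects: n times the shrinkage factor."""
    return n_subjects * shrinkage_factor(var_u, var_e, m)


if __name__ == "__main__":
    # One variance parameter is estimated, but the effective number of
    # random effects approaches n as var_u grows relative to var_e / m.
    for var_u in (0.1, 1.0, 9.0):
        print(var_u, approx_effective_df(100, var_u, 1.0, 1))
```

When the intercept variance is small relative to the residual variance, the factor is near 0 and the random effects cost few effective d.f.; when subjects differ a lot, it approaches 1 and the cost approaches n.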

CP3 · April 24, 2026, 4:43pm

This question is about the choice of time variable: whether to model time at a finer resolution (days) or a coarser one (years).

I am analyzing data for a study where the goal is to analyze the duration of medication use before and after an intervention, and how the trend in medication use changed before and after the intervention.

The data structure is as follows:

ID    ChildDOB    Duration (medication use)
101   2018-04-21  2
105   2018-05-10  4
206   2019-07-02  1
107   2019-06-17  10
201   2019-06-17  9
.
.
.
.
103   2021-02-15  5
210   2021-08-17  3
203   2021-09-30  2
215   2021-10-07  1

What I am proposing is to use time as days before and after intervention

ID    ChildDOB    Duration (medication use)      time(days before/after intervention)
101   2018-04-21  2                                         -680
105   2018-05-10  4                                         -661
206   2019-07-02  1                                         -243
107   2019-06-17  10                                        -258
201   2019-06-17  9                                         -258
.
.
.
.
103   2021-02-15  5                                         351
210   2021-08-17  3                                         534
203   2021-09-30  2                                         578
215   2021-10-07  1                                         585

There is a suggestion that we use time in years from intervention instead of days

ID    ChildDOB   Duration (medication use)      time(year before/after intervention)
101   2018-04-21  2                                         -2
105   2018-05-10  4                                         -2
206   2019-07-02  1                                         -1
107   2019-06-17  10                                        -1
201   2019-06-17  9                                         -1
.
.
.
.
103   2021-02-15  5                                         1
210   2021-08-17  3                                         2
203   2021-09-30  2                                         2
215   2021-10-07  1                                         2

I am not in favor of using time as years (1 or 2) before or after intervention. I prefer using time in days (-680, -661, -258, 534, etc.).

I am confident that using time in days is the right approach. I would like some help understanding
a) why time (days) is better than time (years), and
b) whether using time (days) would change the interpretation in any way.

Thanks.

f2harrell · April 25, 2026, 1:29pm

Years is OK as long as it is year + fraction of a year. Otherwise you would be losing too much resolution / information and in some cases not be able to tell which event occurred before some other event.
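As a small illustration of the resolution point, the sketch below uses the day offsets and whole-year labels listed in the question. The 365.25 days-per-year divisor and the function name are assumptions made for illustration; any fixed convention would do.

```python
DAYS_PER_YEAR = 365.25  # assumption: average calendar-year length

# Day offsets and whole-year labels as listed in the question above
days = [-680, -661, -243, -258, -258, 351, 534, 578, 585]
whole_years = [-2, -2, -1, -1, -1, 1, 2, 2, 2]


def to_fractional_years(day_offset):
    """Convert days from intervention to years plus a fraction of a year."""
    return day_offset / DAYS_PER_YEAR


fractional_years = [to_fractional_years(d) for d in days]

# Count distinct time values under each coding: fractional years keep every
# distinct day offset, while whole years collapse many of them into ties.
print(len(set(days)), len(set(fractional_years)), len(set(whole_years)))
```

Because fractional years are only a change of units, ordering is preserved, and a linear slope per year is 365.25 times the slope per day under this convention; the interpretation is otherwise unchanged.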

CP3 · April 25, 2026, 3:15pm

Thank you, Dr. Harrell.

f2harrell · April 26, 2026, 3:36pm

We express thanks with “likes” here, but thanks.

Uriah · April 29, 2026, 3:55pm

Does it make sense to think of bp as an ordinal-longitudinal outcome?

Something like this:

f2harrell · April 29, 2026, 4:33pm

Blood pressure studies never categorize blood pressure. You can treat continuous blood pressure as ordinal to avoid distributional assumptions and add robustness.

Uriah · April 30, 2026, 6:37am

Another argument I had in mind for ordinal framework for BP is the noisy nature of the measurement.

Johannes_Schwenke · April 30, 2026, 9:54am

Does categorising a noisy Y actually make measurement error less of a problem? Would surprise me.

Uriah · April 30, 2026, 11:53am

It will just make me feel more comfortable with information loss.

f2harrell · May 2, 2026, 12:24pm

There is no amount of noise that makes categorization work better than using the raw noisy data. I have a simple interactive simulation program demonstrating this. As Cohen (of Cohen’s d) said, categorization turns a quantitative error into a qualitative error. A measurement error can make the category be off by 100%.
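This is not the interactive program mentioned above, only a minimal sketch of the same idea under made-up assumptions: true blood pressure is normal with mean 130 and SD 15, measurement noise is normal, and the cutoff of 140 is arbitrary. All names and values are hypothetical. It compares how well the raw noisy measurement and its dichotomized version track the true value at several noise levels.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, noise_sd=10.0, cutoff=140.0, seed=1):
    """Return (r_raw, r_cat): correlation of the true value with the raw
    noisy measurement and with the dichotomized noisy measurement."""
    rng = random.Random(seed)
    true_bp = [rng.gauss(130.0, 15.0) for _ in range(n)]        # assumed truth
    measured = [t + rng.gauss(0.0, noise_sd) for t in true_bp]  # noisy reading
    categorized = [1.0 if m >= cutoff else 0.0 for m in measured]
    return pearson(true_bp, measured), pearson(true_bp, categorized)


if __name__ == "__main__":
    for noise_sd in (5.0, 10.0, 20.0, 40.0):
        r_raw, r_cat = simulate(noise_sd=noise_sd)
        print(noise_sd, round(r_raw, 3), round(r_cat, 3))
```

In this setup the raw noisy measurement stays more strongly related to the truth than the categorized one at every noise level tried; adding noise degrades both, and categorizing adds a further loss on top.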