Saying it invalidates the findings sounds like being too hard on the paper. Their approach may not be the best, but that doesn’t mean it’s not valid. I sent you a DM @FurlanLeo
Edit: just noticed the Q was addressed to FH, sorry for stepping in
Hi Dr. Harrell, following up after the RMS 2025 course. During our discussion of random intercepts in Ch. 7, I recall you mentioning that adding a high number of random intercepts to the model was problematic because it would drastically inflate the df of the model. A while ago, I worked through a stackexchange thread (https://stats.stackexchange.com/questions/242759/calculate-random-effect-predictions-manually-for-a-linear-mixed-model) where random intercepts are computed as a function of the residual variance, variance of the random intercept, and the residuals from the fixed effect component. My reading of this is that there is basically only one additional parameter being estimated for the random intercept, am I misunderstanding the computation of random intercepts and the impact on model parsimony?
Only one variance parameter is estimated but n random effects are estimated where n is the number of subjects. When the variance is not large, the effective number of random effects is less than n due to their shrinkage. Expect a large number of effective d.f. and convergence problems when within-subject correlations are very high and there is a diversity across subjects.
This question is regarding the choice of time variable—whether to model time at a finer resolution (days) or at a higher level (years)..
I am analyzing data for a study where the goal is to analyze the duration of medication use before and after intervention, how the trend in medication use changed before and after intervention.
I am not in favor of using time as years (1 or 2) before or after intervention. I prefer using
time in days(-680, -661, -258, 534 etc…
I am confident that using time in days is the right approach, I like some help understanding
a) why time(days) is better than time(years) also
b) if using time(days) would change the interpretation in anyway ?
Years is OK as long as it is year + fraction of a year. Otherwise you would be losing too much resolution / information and in some cases not be able to tell which even occurred before some other event.
Blood pressure studies never categorize blood pressure. You can treat continuous blood pressure as ordinal to avoid distributional assumptions and add robustness.
There is no amount of noise that makes categorization work better than using the raw noisy data. I have a simple interactive simulation program demonstrating this. As Cohen (of Cohen’s d) said, categorization turns a quantitative error into a qualitative error. A measurement error can make the category be off by 100%.