I have been following Frank’s discussions on the use of change as a dependent variable in regression models.

I have come across trialists using change from baseline as a dependent variable in mixed models, specifically mixed models for repeated measures (MMRM). Considering the complexity of this model, even when modeling the actual Y_ijk responses as dependent variables, I have been wondering about the implications of performing the analysis with change from baseline as the dependent variable and whether the resulting inferences can be valid. Are there insights into this issue, specifically the impact of this practice in MMRMs?

Mine will not be the best answer, but I would just say that Senn (Statistical Issues in Drug Development) says that if you adjust for baseline it makes no difference whether you use the change score or the raw response. For MMRM, it seems to me you induce a treatment x time interaction if baseline is included as one of the response time points, rather than as a covariate…
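Senn's point can be checked numerically. The following is a minimal sketch in Python (simulated data; all variable names are hypothetical): when baseline Y0 is on the right-hand side, the treatment estimate is identical whether the outcome is the raw follow-up Y or the change score Y − Y0.

```python
# Sketch: with simulated data (hypothetical names), show that when baseline
# Y0 is a covariate, the treatment estimate is the same whether the outcome
# is the raw follow-up Y or the change score Y - Y0 (Senn's linear-model
# equivalence).
import numpy as np

rng = np.random.default_rng(0)
n = 200
tx = rng.integers(0, 2, n)                    # randomized treatment indicator
y0 = rng.normal(50, 10, n)                    # baseline measurement
y = 0.6 * y0 + 5 * tx + rng.normal(0, 8, n)   # follow-up outcome

X = np.column_stack([np.ones(n), tx, y0])     # intercept, treatment, baseline

beta_raw, *_ = np.linalg.lstsq(X, y, rcond=None)       # ANCOVA on raw Y
beta_chg, *_ = np.linalg.lstsq(X, y - y0, rcond=None)  # ANCOVA on change

# Treatment coefficients agree exactly; only the baseline slope shifts by 1.
assert np.allclose(beta_raw[1], beta_chg[1])
assert np.allclose(beta_raw[2] - 1.0, beta_chg[2])
```

The algebra behind this is simple: subtracting Y0 from the left side while keeping Y0 on the right just reduces the baseline coefficient by exactly 1, leaving all other coefficients untouched. This equivalence is what fails for nonlinear models.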

Senn’s comment pertains to linear models only. Nonlinear models (ordinal regression and others) don’t share this property. And many dependent variables don’t have the property that subtraction works as it should. For example, if Y is ordinal but not interval scaled (e.g., pain severity 5-point Likert scale), Y minus its baseline value Y0 is no longer ordinal. In addition, even for interval-scaled Y, the relationship between Y and Y0 may not be linear. In that case change scores can’t work. The general solution is to use raw Y as the repeated outcomes, adjusting for raw Y0 (using a spline function for protection against nonlinear effects). Details are in BBR Section 14.4. I personally don’t even compute Y-Y0 much less analyze it.
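As a concrete illustration of the recommended approach (raw Y adjusted for a flexible function of Y0), here is a minimal Python sketch on simulated data. A hand-built linear-spline basis stands in for a restricted cubic spline such as rms::rcs(); all data and names are hypothetical, and the true baseline effect is made nonlinear so that a simple change score would be misleading.

```python
# Sketch: regress raw follow-up Y on treatment plus a spline in baseline Y0,
# rather than analyzing Y - Y0. A linear-spline basis stands in for a
# restricted cubic spline; simulated, hypothetical data.
import numpy as np

def linear_spline_basis(x, knots):
    """Columns: x itself plus hinge terms max(x - k, 0) at each knot."""
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
n = 300
tx = rng.integers(0, 2, n)
y0 = rng.normal(50, 10, n)
# True baseline effect is nonlinear in Y0, so a single change score
# mis-specifies the model; the spline adjustment does not.
y = 20 + 0.02 * (y0 - 50) ** 2 + 4 * tx + rng.normal(0, 5, n)

knots = np.quantile(y0, [0.25, 0.5, 0.75])
X = np.column_stack([np.ones(n), tx, linear_spline_basis(y0, knots)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("treatment estimate:", beta[1])   # should be close to the true value of 4
```

In R with the rms package the analogous model would be written with `rcs(y0, 4)` on the right-hand side (and `orm()` instead of least squares when Y is ordinal).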

For the current discussion it matters little whether Y is multivariate (longitudinal) or univariate.

Best answer that I know of. Also, if you’re looking for a plain-language description, Frank’s paragraph in this article is my favorite and has been very helpful when trying to explain this to others: https://www.fharrell.com/post/errmed/#change

“The purpose of a parallel-group randomized clinical trial is to compare the parallel groups, not to compare a patient with herself at baseline. The central question is for two patients with the same pre measurement value of x, one given treatment A and the other treatment B, will the patients tend to have different post-treatment values? This is exactly what analysis of covariance assesses. Within-patient change is affected strongly by regression to the mean and measurement error. When the baseline value is one of the patient inclusion/exclusion criteria, the only meaningful change score requires one to have a second baseline measurement post patient qualification to cancel out much of the regression to the mean effect. It is the second baseline that would be subtracted from the follow-up measurement.”

In lieu of analyzing change from baseline (CfB), the recommendation for countering the effect of regression to the mean is to model the response measure itself, adjusting for baseline and other covariates: properly relegating pretreatment information to the right-hand side (RHS) of the model.

I have been using Generalized Least Squares regression for this emulating Chapter 7 in RMS: Modeling Longitudinal Responses using Generalized Least Squares.
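To make the "marginal" question in (A) below concrete, here is a minimal numerical sketch of what a GLS fit of this kind does: the population-average mean is modeled directly, and within-subject correlation is absorbed into the residual covariance (AR(1) here), with no subject-level random effects. This is a pure-NumPy illustration with simulated, hypothetical data, not the rms::Gls() implementation itself.

```python
# Sketch of a marginal GLS fit: model the population-average mean and put the
# within-subject correlation (AR(1) here) into the residual covariance, with
# no random effects. Simulated, hypothetical data.
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_time, rho = 100, 4, 0.6
tx = np.repeat(rng.integers(0, 2, n_subj), n_time)   # treatment per subject
time = np.tile(np.arange(n_time), n_subj)            # visit index

# AR(1) within-subject correlation: corr(e_j, e_k) = rho**|j - k|
R = rho ** np.abs(np.subtract.outer(np.arange(n_time), np.arange(n_time)))
L = np.linalg.cholesky(R)
e = (rng.normal(size=(n_subj, n_time)) @ L.T).ravel()
y = 1.0 * time + 3.0 * tx + 2.0 * e                  # true effects: 1 and 3

X = np.column_stack([np.ones(n_subj * n_time), tx, time])

# GLS estimate: (X' V^-1 X)^-1 X' V^-1 y with block-diagonal V = I kron R,
# accumulated subject by subject.
Rinv = np.linalg.inv(R)
XtVX = sum(X[i*n_time:(i+1)*n_time].T @ Rinv @ X[i*n_time:(i+1)*n_time]
           for i in range(n_subj))
XtVy = sum(X[i*n_time:(i+1)*n_time].T @ Rinv @ y[i*n_time:(i+1)*n_time]
           for i in range(n_subj))
beta = np.linalg.solve(XtVX, XtVy)
print("treatment, time effects:", beta[1], beta[2])  # should be near 3 and 1
```

In practice the correlation parameters are estimated rather than known, which is what Gls() in R (and the analogous SAS procedures) handle for you; the point of the sketch is that nothing in the model is subject-specific, which is why such models are called marginal.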

(A.) I am confused by a minor detail concerning how to refer to this method as expressed in RMS: is GLS, or this application of it, a “marginal model,” as so many texts call one of the options for analyzing longitudinal data? Is what is done in Chapter 7 of RMS the same thing that Diggle (2002) refers to in Section 1.5 as the first recommended option for longitudinal data analysis, and in Chapter 4? For some reason I remain uncertain of the connection between RMS and other sources. And I do not think that Frank Harrell ever refers to GLS for repeated measures as a “marginal model,” which perplexes me (although this is my usual cognitive state).

(B.) Can what is recommended and demonstrated in Chapter 7 of RMS be done in SAS? I have simulated data and prototyped the analysis per Chapter 7 of RMS for a client in R using the wonderful rms package. But the client's statistical group will only use SAS and insists on analyzing the proposed study with random-effects/mixed models, because that is how they know how to handle correlated data in SAS. (This is an example of the common practice of distorting the analysis and the study objectives to accommodate the limitations of the software.) I think this is neither appropriate nor efficient.

(C.) If the methods of RMS Chapter 7 can be done in SAS, can anyone describe how, please, so I can communicate this to the client's statisticians? Any examples to refer them to would be appreciated.

*I might add that part of the challenge here is that this is a single-arm cohort study involving longitudinal follow-up after treatment. So unlike much of the literature on before-after comparisons with concurrent controls, this is an uncontrolled design. We cannot subtract out regression to the mean; we can only mitigate it in the analysis and interpretation.

Random-effects models are in some sense more complicated than multivariate normal GLS models, and you can fit GLS-type models in SAS (PROC MIXED with a REPEATED statement, or perhaps GEE via PROC GENMOD), so it's difficult to understand Drew's clients.

Or even PROC GLM with a REPEATED statement, I guess, although I would do this only when visits are scheduled; when visits are haphazard I tend to use random effects.

I am seeing that random-effects models are more versatile than models such as GLS assuming multivariate normality. I'm working with Ben Goodrich on a Bayesian longitudinal model with random effects (subject intercepts) and a serial correlation structure within subject.