Linear mixed model for 3 time points


Which mixed model would you recommend for an RCT with 2 groups, in which participants are measured at baseline, at the end of the intervention (3 months), and finally at follow-up (4 months)?

The purpose is to determine the effect of the intervention at 3 and 4 months.

M1*: y ~ time * treat + (1 | id)

M2: y ~ time * treat + (1 | time + id)

M3: y ~ y0 + treat + time * treat + (1 | id)

M4: y ~ time*y0 + time * treat + (1 | id)

M5: y ~ time*treatment + (1+ time | id)

Another one? And why?

Thanks in advance,


I don’t think random effects work well with only 2 time points. It may even be mathematically impossible to fit such a model. And the correlation pattern assumed by having random intercepts is unlikely to match the characteristics of the data although you wouldn’t see that with only 2 points. I’d favor a Markov process instead.

To this point specifically (the first sentence), see this answer on Stack Exchange for some simulation results for a LMM. The author states:

It appears (for this example, anyway) that the variance doesn’t really stabilize until there are at least 14 subjects, if not later.

However, the accepted answer links to Ben Bolker’s GLMM page which suggests at least 5 or 6 levels for a given random effect, and the next highest answer quotes Gelman & Hill as saying a random effect with fewer than 5 levels won’t add much over a standard LM. So, one of your models in which time is not a random effect would work (assuming id refers to patients, and you have a sufficient number of patients for this to be sensible).

Regarding @f2harrell’s second sentence, it looks like it is mathematically possible, or at least the code will execute (perhaps the function would convert a random effect with 2 levels into a fixed effect) but I’ve never tried it. The second reference below ran simulations with as low as 2 levels.

Other Resources

When I last looked into this topic (~Fall 2021), most of what I found was specific to ecological research. But there are a few papers that you might find useful (and the references therein, of course):

  1. Gomes, D.G., 2022. “Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model?”. PeerJ, 10, e12794. – the nice thing here is that you can also read the article’s peer review history (see the rightmost column). The author writes in the abstract:

    Thus, it may be acceptable to use fewer than five levels of random effects if one is not interested in making inferences about the random effects terms (i.e. when they are ‘nuisance’ parameters used to group non-independent data), but further work is needed to explore alternative scenarios.

  2. Oberpriller, J., de Souza Leite, M. and Pichler, M., 2021. “Fixed or random? On the reliability of mixed-effects models for a small number of levels in grouping variables.” bioRxiv. – they show that fewer levels → lower power

  3. Ben Bolker’s GLMM FAQ page – this was linked to in the Stack Exchange answer mentioned above, but it is worth reading the answer to this question in its entirety. There is a lot of great information and other resources on the rest of the page; for example, this r-sig-mixed-models post (and subsequent posts in the same thread) is also relevant to your question.

  4. Some other simulations from Ben Bolker’s RPubs page, although this is from 2013.

  5. A more recent Stack Exchange post, similar to the one above

I know this is a long post but you might need to know this in the future!


While my previous post didn’t directly answer your question—it was related to the issue of specifying variables as random vs. fixed in the presence of only a few levels—since you list several models with different variables and interaction terms, I thought I would provide other references that address model specification/structure.

  1. See the answers to this Stack Exchange question which asks for the definitions of fixed and random effects. As the top answer shows, it is not completely trivial. You may want to read the other answers, as well.

  2. This Stack Exchange post asks about correct model specification. There are only 2 answers and I recommend reading both and checking out the links they provide.

  3. I again link to Ben Bolker’s GLMM FAQ page, but this time to the question of model specification.

  4. Since your models M2 and M5 include a random slope (for time), this Stack Exchange answer might be useful for you.

  5. This Stack Exchange post asks about including random slopes in the model.

  6. Heisig, J.P. and Schaeffer, M., 2019. Why you should always include a random slope for the lower-level variable involved in a cross-level interaction. European Sociological Review, 35(2), 258-279. – this might not be directly relevant to you, but it is concerned with specification of the random effects structure (random slopes in particular).

There are of course other recommendations such as the “keep it maximal” recommendation, but I don’t think there is consensus. As you can see, this is a pretty complicated area of statistics (in my opinion) and it is easy to get overwhelmed by all the information out there. But the mixed-model tag (and other related tags) on Stack Exchange has a lot of great questions and answers.


This is exceptionally helpful information. The remaining information that would be helpful IMHO is an equation that gives us the standard error of the log of the random effects variance as a function of n = # subjects and m = # measurements per subject. I gather that when n is large enough you can get away with smaller m.


Presuming that @JorgeTeixeira takes the recommended approach of not using the baseline observation as a record in the per patient cluster, and only uses the baseline value as a covariate, there will be 2 post-baseline observations (records) per patient (cluster), as Frank references in his reply. That is, there will not be 3 records per patient, one per time point including baseline, but only two.

Thus, presuming a reasonable sample size in the study being referenced, there should not be an issue with the number of clusters (patients). The issue however, will be the typical cluster size of 2, presuming that most patients have the two post-baseline observations. You can get away with some number of single observation clusters, that is only one post-baseline observation for a patient at either time point, but generally, you don’t want that to be a material number of patients, for some definition of “material”.

You may also have issues with any patients that only have the baseline observation and do not have either of the two post-baseline observations, which would leave you with no effective observations (records) for that patient given this approach. That may depend upon the ITT cohort inclusion criteria and whether you plan any imputation of missing data.

With the small cluster size of 2, the likelihood is that you can use random intercepts in the random effects specification, but you are likely to have model convergence issues if you also specify random slopes. You can try, but if you get convergence errors, remove the random slope specifications and retain the random intercepts.

When I engage in these analyses, I typically use R’s older nlme::lme() function rather than lme4::lmer(), as I don’t need the more complex relationships (e.g. crossed effects) that lme4 supports. It is also easier to specify AR1 correlations using the older, and historically stable, lme().

Thus, my default lme() model specification typically looks like:

MOD <- lme(Response ~ Group + Time + Group:Time + T1 + Group:T1,
           random = ~1 | ID,
           data = DF.MOD, na.action = na.omit,
           correlation = corAR1(form = ~ as.numeric(Time) | ID))

Where Response is the continuous response variable of interest, Group is the treatment arm factor variable, Time is the time based factor variable, T1 is the baseline value of the variable of interest, and ID is the unique patient identifier. DF.MOD is the source data frame for the observations, structured as required.

Note that I use an AR1 correlation specification, as I do not generally presume compound symmetry over time; with only two post-baseline observations, however, AR1 and compound symmetry are indistinguishable, so the choice matters little here. Note that the Time variable, which is a factor, needs to be coerced to numeric for the AR1 specification.

Once I have the model set up, I then use Russ Lenth’s “emmeans” CRAN package to generate various contrasts for both within and between group estimation.
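To make that workflow concrete, here is a minimal, self-contained sketch with simulated data. All names, sample sizes, and effect sizes are purely illustrative assumptions, not values from the actual study; the model formula matches the lme() specification above.

```r
library(nlme)

# Simulate the structure described: 2 groups, a baseline covariate T1,
# and two post-baseline visits per patient (illustrative data only).
set.seed(42)
n <- 60
DF.MOD <- expand.grid(ID = factor(1:n), Time = factor(c("Month3", "Month4")))
DF.MOD$Group <- factor(rep(c("Control", "Active"), length.out = n)[DF.MOD$ID])
DF.MOD$T1 <- rnorm(n)[DF.MOD$ID]
DF.MOD$Response <- 0.5 * DF.MOD$T1 + 0.3 * (DF.MOD$Group == "Active") +
  rnorm(nrow(DF.MOD))

MOD <- lme(Response ~ Group + Time + Group:Time + T1 + Group:T1,
           random = ~ 1 | ID,
           data = DF.MOD, na.action = na.omit,
           correlation = corAR1(form = ~ as.numeric(Time) | ID))

library(emmeans)
emm <- emmeans(MOD, ~ Group | Time)  # adjusted group means at each visit
pairs(emm)                           # between-group contrast at each visit
```

Within-group contrasts over time can be obtained analogously with `emmeans(MOD, ~ Time | Group)` followed by `pairs()`.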

There are alternatives that can be used. Frank mentioned one, and either a GLS or GEE approach may also be apropos, given a desire to focus on marginal effects.


Thanks @cwatson and @f2harrell . I will read those links and consider your suggestions and then I might come back later.

Thank you @MSchwartz,

  1. Do you think there is consensus that y0 should be a predictor? I keep hearing conflicting recommendations on this.

  2. I have never seen a model like the one you mentioned before. Can you explain the rationale behind using Group:T1?

  3. Of those I have listed, I presume you would opt for M3 or M4?

Thank you.

@JorgeTeixeira see inline below:

I am not sure that there is complete consensus on anything in statistics.

That being said, the use of the baseline measurement as a covariate, that is, as an “adjustment” for any baseline imbalance between the groups, even in a randomized study, is, in my mind, the preferred approach and there have been numerous publications on this topic. In essence, in a repeated measures setting, it is an extension of the traditional ANCOVA approach, where the usual OLS model formulation would be:

FU ~ Group * Baseline

Where FU is the single follow up measurement (one record per patient), Group is the treatment group factor and Baseline is the baseline measurement.

There is a recent thread regarding the use of change from baseline in this forum that raises related issues and has numerous references to the use of ANCOVA:

Changing the “Status Quo” analyses for Regulatory Submission - Change from Baseline

There is one reference listed therein that you may find of particular interest in the context of a repeated measures setting, such as you have.

As per the ANCOVA formulation above, it is the explicitly stated interaction term between the treatment group and the baseline measurement. Rather than using, in R syntax, the more concise:

Group * Baseline

which embodies both the main effects and the interaction terms, it is using the longer syntax of:

Group + Baseline + Group:Baseline

to more explicitly express the model terms.

My own preference would be M4, for the reasons we have been discussing, where, as I noted in my reply, you have two, not three, records per patient, given the timing of your observations.

Ahaha True! Thank you again @MSchwartz . Just one final question, if possible.

M4: y ~ time*y0 + time * treat + (1 | id): building upon this model, suppose you used stratified randomization (let’s call the stratification variables V0 and S0) and thus decide to include them in the model.

Would y ~ time*y0 + time * treat + V0 + S0 + (1 | id) be the correct model? Or do you also need to set an interaction of V0 and S0 with time?



The general guideline is to “analyze as you randomize”. Thus, yes, you would want to include the randomization stratification factors as covariates in the model.

At minimum, I would assess the interaction terms with the stratification factors and either make the a priori decision to keep them in the model under any circumstances, or perhaps, as part of a model reduction process, consider removing them if the p values for the interaction terms are above some “softer” threshold such as p >= 0.2 or similar. In either case, retain the main effects terms for the stratification factors.

Part of the challenge may be whether you have a sufficiently large sample size in your study to support the covariate degrees of freedom needed to include all of the covariates and their interaction terms and still end up with a stable model.

That is a judgement call, and if you are going to set up an SAP with these analyses pre-specified, you may want to consider how you approach the modeling process, model validation, and what, if any, conditional decisions you might make once you have the data in front of you so that they are documented a priori.

Wow what incredibly useful information Marc.

I have summarized all the reasons for using y0 as baseline in Chapter 7 of the RMS book and course notes. The single most compelling reason is that y0 is frequently a selection criterion for the trial, requiring you to use a multivariable approach that doesn’t really exist, i.e., to handle a mixture of truncated and non-truncated response variable distributions.


The widely used industry standard (in the pharma industry) here is a model with

  • visit as a categorical variable (you really want to allow for a flexible time trend in the control group, and simple linear trends are certainly unusual),
  • a treatment main effect,
  • a visit by treatment interaction (allows for different treatment effects at different visits),
  • baseline as a covariate*,
  • a visit by baseline interaction (the influence of the baseline usually declines over time),
  • an unstructured covariance structure** between the random effects vectors (=1 component of the vector per visit) for each patient.

This model is usually what people mean when they say MMRM (Mixed Model for Repeated Measures).

* Regarding baseline as a covariate rather than treating it as yet another observation: this choice is very clear if inclusion criteria are applied to the baseline. In that case, you tend to get horrible model behavior if you include the baseline observation in your model as just another observation, because you “wrongly” treat that “selection-biased-truncated” distribution at baseline as if it were normal (we’ve seen this in several simulation studies; it tends to cause quite bad issues). If the baseline is really decently behaved (i.e. no inclusion criteria on it, no weird skewing due to inclusion criteria on other correlated variables, etc.), then doing so might be okay and might have some advantages in terms of avoiding explicit imputation of missing baseline values. However, you would then have to decide how to make sure you don’t assume a treatment effect at baseline (which would of course be silly); e.g. you could code treat = "control group" for the baseline records.

** Admittedly, if you have just two post-baseline visits, a simple random intercept also works (it implies a compound-symmetric correlation structure, which in this case is equivalent to a fully unstructured one), but in general you don’t want to impose too strong an assumption on the correlation structure (compound symmetry tends to be a less disastrous assumption than, say, AR(1), which tends to lead to horribly underestimated standard errors).

In R you could do it like this using lme4 (while how to do this with SAS with PROC MIXED is, of course, common knowledge):

lmer(y ~ treatment + visit + baseline + treatment:visit + visit:baseline + (0 + visit | patient_id), ...,
     control = lmerControl(check.nobs.vs.nRE = "ignore"))

where visit, treatment and patient_id are factors. That trick for implementing it in R is based on a presentation on “Implementing Mixed Models with Repeated Measures (MMRM) in R” by Daniel Sabanés Bové, which I think you can find somewhere online, if you search for it.


In reply to @Bjoern, for convenience, here is a link to the slides for the Bové presentation that he references above:

and there is a video here:

The general framework that he references parallels the implementation using lme() that I have above, also pointing out the issues surrounding the correlation structure in the case of a limited number of post-baseline observations.

As I noted, it is easier to specify correlation structures using lme(), but Ben Bolker, the active maintainer of the lme4 CRAN package, has some notes on implementing these for lmer() here:


Because visit windows can be fairly wide, the analysis works better if you treat follow-up time as a continuous variable in days and not use visit number. Because of that you can’t use an unstructured covariance matrix in general. Current practice needs to change.

What do you do with randomised individuals who only have a baseline measurement (i.e., all post-baseline measures are missing)?

Some form of multiple imputation that is aligned with the desired estimand is the most obvious approach.

That is - at least to me - debatable. If we are talking about visits every 3 months and visit windows are ±3 days, then I’d rather avoid simplistic assumptions on the covariance structure. Of course, we could debate whether with, say, visits every month and a visit window of 7 days (that would be an unusually wide window in my experience) the days are truly a bit too far apart. However, surely then we’d want not a linear function of time in the regression equation, but rather, say, a spline? You could still assume an unstructured covariance matrix then; I’m not clear on what would prevent that.

Multiple imputation cannot recover any information about Y unless there is also a surrogate or future measurement of Y to impute from.

Yes — time should be considered as continuous if the window is longer than, say, 1 week. And the time trend should routinely be assumed to not be linear (just smooth).

An unstructured covariance matrix will be inefficient if the number of distinct time points exceeds about 4.
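Picking up the suggestion to treat follow-up time as continuous with a smooth (non-linear) trend, a sketch along these lines might use a natural spline for the day of measurement together with a continuous-time AR(1) correlation, which nlme supports via corCAR1. The simulated data and all variable names (days, treat, y0, id) here are illustrative assumptions, not anyone’s actual analysis:

```r
library(nlme)
library(splines)

# Illustrative data: 2 post-baseline measurements per patient, with the
# actual day of measurement jittered to mimic wide visit windows.
set.seed(1)
n <- 80
dat <- data.frame(id = factor(rep(1:n, each = 2)))
dat$treat <- factor(rep(c("Control", "Active"), length.out = n)[dat$id])
dat$days  <- rep(c(90, 120), times = n) + round(rnorm(2 * n, sd = 7))
dat$y0    <- rnorm(n)[dat$id]
dat$y     <- 0.4 * dat$y0 + 0.3 * (dat$treat == "Active") +
  0.002 * dat$days + rnorm(2 * n)

# Smooth (spline) time trend, treatment-by-time interaction, baseline
# covariate, random intercept, and continuous-time AR(1) correlation:
fit <- lme(y ~ treat * ns(days, df = 2) + y0,
           random = ~ 1 | id,
           data = dat,
           correlation = corCAR1(form = ~ days | id))
```

With continuous time, contrasts at specific days can again be obtained with emmeans by supplying reference values for days.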