RCT with missing follow-up outcomes: ANCOVA + MI vs Mixed-Effects Modelling?

I haven’t had much luck getting answers to my question on Stack Exchange, so I’m posting it here. To avoid cross-posting, I am willing to remove my question from Stack Exchange.

In a 2-group RCT with one baseline and 2 follow-up assessment time-points, assume that
(i) follow-up outcomes are missing at random
(ii) relatively strong auxiliary variables are available

My question is: Which is a better modelling approach?
(i) mixed-effects modelling (which treats baseline outcome as a covariate)
(ii) ANCOVA after multiple imputation at each follow-up timepoint

My understanding is that method (i) (mixed-effects modelling) is generally sufficient without the need for imputation. But just how strong should the auxiliary variable(s) be before method (ii) becomes a competitive approach?

Any guidance on this would be greatly appreciated.


Welcome to datamethods, Yong Hao.

The literature has concluded that avoiding imputation is best, in the context of full likelihood models (mixed models, GLS, Markov models, others). I would never use (ii) when Y is missing at random.



In addition to Frank’s reply above, you might find a recent thread in this forum of interest:

Linear mixed model for 3 time points

There is a fair amount of discussion there regarding mixed effects models, specifically with two post-baseline timepoints.


Dear Prof Harrell,
Thank you for making me feel welcome and for generously sharing your insights, as always! I understand that likelihood-based analysis is often recommended as the primary analysis in randomized studies. However, as MI uses information in auxiliary variables to reduce bias and improve precision, my lingering concern relates to possible scenarios in which MI may be preferred over mixed-effects modeling. To assuage my concerns, I did a quick literature search and found the paper by Kontopantelis et al., which shows that including a moderately correlated outcome in the imputation model only marginally improves the performance of MI.

Dear Marc,
Thank you for sharing the link and your code!

Just want to add my thanks to the growing number of people who have heaped praise on Datamethods. Datamethods is truly a treasure trove of information with many helpful experts contributing their time and expertise.


You raise a good question. I think that if there is a surrogate outcome (or secondary outcome) that is not part of the main analysis and that is (1) correlated with the main outcome and (2) displays the same treatment effect then I could see MI gaining power over “use all available main outcome data” analysis.

Which approach is even an option depends on your estimand. A mixed model for repeated measures can only target a limited number of estimands (e.g. the hypothetical “as if everyone had completed treatment of the interventions assigned at randomisation”), while with MI you are a lot more flexible. In some cases you may wish to use a joint model for multiple outcomes (e.g. this paper I wrote with colleagues a while ago has a nice example where it is very important to do that or to do a joint MI: https://doi.org/10.1002/pst.1705).

One worry about the joint model is that it does not provide marginal treatment effects (treatment effect on one endpoint ignoring the other endpoints). I think the treatment may even appear to be weak on two endpoints but strong on each one marginally.

Dear Marc,

Thank you once again for alerting me to this helpful post. If I have understood what you have written, your default lme model specification is

follow-up y ~ group*y0 + time*treat + (1 | id)

whilst Jorge’s M4 (which you seem to endorse) is

follow-up y ~ time*y0 + time*treat + (1 | id)

In M4, an interaction between baseline outcome and time is specified presumably because the associations between baseline and follow-up outcomes wane over time.
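
To make the role of the time*y0 term concrete, M4 can be written out as below. This is only a sketch: the coefficient symbols are my own notation, and I am assuming that time is coded 0 at the first follow-up and 1 at the second, that treat is a 0/1 indicator, and that $b_i$ is the random intercept for participant $i$.

$$
y_{ij} = \beta_0 + \beta_1\,\mathrm{time}_j + \beta_2\,y_{0i} + \beta_3\,(\mathrm{time}_j \times y_{0i}) + \beta_4\,\mathrm{treat}_i + \beta_5\,(\mathrm{time}_j \times \mathrm{treat}_i) + b_i + \varepsilon_{ij}
$$

$$
\frac{\partial\, E[y_{ij}]}{\partial\, y_{0i}} = \beta_2 + \beta_3\,\mathrm{time}_j
$$

Under that coding, the baseline slope is $\beta_2$ at the first follow-up and $\beta_2 + \beta_3$ at the second, so a waning baseline association would show up as $\beta_3$ having the opposite sign to $\beta_2$.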

Could you kindly clarify on the rationale for including group*y0? Are we assuming that the between-group differences vary as a function of baseline outcome?

Thank you in advance for your guidance.

Hi puayonghao,

I am scratching my head a bit on going back to review that thread from several months ago, and the discussion on that particular point. Jorge had asked about a preference between M3 and M4 in his original post, and it is possible that I mis-read his M4 as being:

M4: y ~ treat * y0 + time * treat + (1 | id)

where I mis-read the first interaction term as using ‘treat’ (treat * y0) instead of ‘time’ (time * y0), where the former is consistent with the model that I use by default.

That being said, yes, the use of “Group * T1” in my formula, which expands to “Group + T1 + Group:T1”, is to enable a consideration for the presence of an interaction between the treatment group and the baseline measurement.

That is, the slope relating the follow-up outcome to the baseline value is allowed to differ between the treatment groups, so that the treatment effect depends on the baseline value, rather than presuming a fixed marginal treatment effect, where the lines are parallel to each other. Those lines may even intersect over the range of baseline values, thus reversing the direction of the treatment effect at one end of the range versus the other.
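
Written out, the treat * y0 specification looks roughly as follows. This is a sketch in my own notation, not taken from the linked thread: treat is assumed to be a 0/1 indicator, time is assumed to be numerically coded, and $b_i$ is the random intercept for participant $i$.

$$
y_{ij} = \beta_0 + \beta_1\,\mathrm{treat}_i + \beta_2\,y_{0i} + \beta_3\,(\mathrm{treat}_i \times y_{0i}) + \beta_4\,\mathrm{time}_j + \beta_5\,(\mathrm{time}_j \times \mathrm{treat}_i) + b_i + \varepsilon_{ij}
$$

$$
\Delta(y_0, t) = E[y \mid \mathrm{treat}=1, y_0, t] - E[y \mid \mathrm{treat}=0, y_0, t] = \beta_1 + \beta_3\,y_0 + \beta_5\,t
$$

With $\beta_3 = 0$ the two group lines are parallel in $y_0$ and the treatment effect at a given time is the same for every baseline value. With $\beta_3 \neq 0$ the effect changes sign at $y_0^{*} = -(\beta_1 + \beta_5\,t)/\beta_3$, which is the crossing point if that value falls inside the observed baseline range.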


Dear Marc,

Thank you for the detailed clarification! It seems sensible not to assume a fixed treatment effect across the different baseline values, and I am wondering if specifying this interaction is akin to assessing differential treatment effects? If so, does one run into statistical power problems?

Hi puayonghao,

Any time you add more covariate degrees of freedom to the model, via additional covariates, interaction terms, regression splines, more complicated random effects, etc., there will be an impact on effective power and, therefore, on sample size requirements, all else being the same.

If you are applying the model on a post hoc basis to an existing cohort, you may not have a sufficient sample size (power) to assess more complicated models, and there are various issues and limitations to consider there, since you may be effectively overfitting the model.

If you are designing a new, prospective study, and you want to conduct power/sample size assessments where you are perhaps using a mixed effects model as your primary outcome analysis method, and you are using lme4 based models in R, you can use the “simr” CRAN package as one possible option, which provides for Monte Carlo simulations within the lme4 based model framework.

In terms of differential treatment effects, the use of interaction terms can be part of the process to assess those. There are numerous papers and guideline documents on the subject, and I might point you to a relatively recent publication:

and where the first several references therein are also good additional resources to review.


Thank you for the very helpful clarification and for pointing me to useful resources!