I haven’t had much luck getting answers to my question on stackexchange so I’m posting it here. To avoid cross-posting, I am willing to remove my question on stackexchange.
In a 2-group RCT with one baseline and 2 follow-up assessment time-points, assume that
(i) follow-up outcomes are missing at random
(ii) relatively strong auxiliary variables are available
My question is: Which is a better modelling approach?
(i) mixed-effects modelling (which treats baseline outcome as a covariate)
(ii) ANCOVA after multiple imputation at each follow-up timepoint
My understanding is that method (i), mixed-effects modelling, is generally sufficient without the need for imputation. But just how strong should the auxiliary variable(s) be before method (ii) becomes a competitive approach?
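To make clear what I mean by method (ii): the ANCOVA would be fitted separately in each imputed dataset, and the treatment-effect estimates would then be combined with Rubin's rules. Below is a minimal sketch of that pooling step only. The function name `pool_rubin` and the numbers are made up for illustration; they are not from a real trial, and the imputation and ANCOVA steps themselves are not shown.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: treatment-effect estimate from the ANCOVA in each imputed dataset
    variances: squared standard error of that estimate in each imputed dataset
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)    # pooled point estimate
    w = mean(variances)        # within-imputation variance
    b = variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b    # total variance
    return {
        "estimate": q_bar,
        "se": math.sqrt(t),
        "within": w,
        "between": b,
        "total": t,
    }


# Hypothetical ANCOVA results from m = 5 imputed datasets (illustrative only)
ests = [-2.1, -1.8, -2.4, -2.0, -1.7]
vars_ = [0.64, 0.66, 0.61, 0.65, 0.63]

pooled = pool_rubin(ests, vars_)
print(pooled)
```

The between-imputation component is where the missing-data uncertainty enters, so I would expect strong auxiliary variables to help mainly by shrinking that term relative to the within-imputation variance.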
Any guidance on this would be greatly appreciated.
datamethods Yong Hao.
The literature has concluded that avoiding imputation is best, in the context of full likelihood models (mixed models, GLS, Markov models, others). I would never use (ii) when Y is missing at random.
In addition to Frank’s reply above, you might find a recent thread in this forum of interest:
Linear mixed model for 3 time points
There is a fair amount of discussion there regarding mixed effects models, specifically with two post-baseline timepoints.
Dear Prof Harrell,
Thank you for making me feel welcome and for generously sharing your insights, as always! I understand that likelihood-based analysis is often recommended as the primary analysis in randomized studies. However, since MI uses information in auxiliary variables to reduce bias and improve precision, my lingering concern is whether there are scenarios in which MI would be preferred over mixed-effects modeling. To address this concern, I did a quick literature search and found the paper by Kontopantelis et al., which shows that including a moderately correlated outcome in the imputation model only marginally improves the performance of MI.
Thank you for sharing the link and your code!
I just want to add my thanks to those of the growing number of people who have heaped praise on Datamethods. It is truly a treasure trove of information, with many helpful experts contributing their time and expertise.
You raise a good question. I think that if there is a surrogate outcome (or secondary outcome) that is not part of the main analysis and that is (1) correlated with the main outcome and (2) displays the same treatment effect then I could see MI gaining power over “use all available main outcome data” analysis.
Which approach is even an option depends on your estimand. A mixed model for repeated measures can only target a limited number of estimands (e.g. the hypothetical “as if everyone had completed the treatment assigned at randomisation”), while MI gives you a lot more flexibility. In some cases you may wish to use a joint model for multiple outcomes (e.g. this paper I wrote with colleagues a while ago has a nice example where it’s very important to do that or to do a joint MI: https://doi.org/10.1002/pst.1705)
One worry about the joint model is that it does not provide marginal treatment effects (the treatment effect on one endpoint, ignoring the other endpoints). I think the treatment may even appear to be weak on two endpoints but strong on each one marginally.