Dealing with missing outcome data in prediction models


I’m a statistician but not a missing-data expert. I’m advising some collaborators on employing multiple imputation to analyze a clinical neuroimaging study.

As an illustrative example, consider this depression clinical trial:
a) Neuroimaging features from EEG or fMRI data are available at baseline
b) Clinical symptoms are collected pre and post therapeutic interventions.
c) Question: Scientists are interested in using neuroimaging features to predict change in symptom scores.
d) The caveat is that some post-intervention outcomes are missing due to dropout, and we plan to impute them.
e) My suggestions to incorporate imputation are variations of the following

  • i) Use multiple imputation to impute outcomes several times
  • ii) For each imputed dataset, conduct machine learning/predictive modeling as usual (superlearning/stacked regression of multiple learners, high dimensional regression, etc…) to predict outcomes per imputation
  • iii) Combine predicted outcomes per imputation.

I can’t seem to find any discussion in the missing data/medical statistics literature of combining predictions (as opposed to coefficient estimates) in this way. It strikes me as easier to pool predictions than to pool model coefficients. Are there any suitable alternatives to pooling predictions? Do you see any problems?
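To make steps i) to iii) concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an assumption for illustration: the data are simulated, dropout is made completely at random, the imputation model is a bootstrap-based stochastic linear regression (an approximation to proper multiple imputation), and a plain ridge regression stands in for the superlearner. The helper names `fit_ridge`, `impute_once` and `pooled_predictions` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (hypothetical): 200 patients, 5 baseline features.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.25])
y = X @ beta + rng.normal(scale=1.0, size=n)
missing = rng.random(n) < 0.3              # ~30% dropout, MCAR for simplicity
y_obs = np.where(missing, np.nan, y)

def fit_ridge(X, y, lam=1.0):
    """Ridge regression with an unpenalized intercept; returns (intercept, coefs)."""
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    coefs = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return ym - xm @ coefs, coefs

def impute_once(X, y_obs, rng):
    """Step i): one stochastic imputation of the missing outcomes.

    The imputation model is refit on a bootstrap sample of the observed cases
    so that parameter uncertainty is reflected across imputations.
    """
    obs = ~np.isnan(y_obs)
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    b0, b = fit_ridge(X[idx], y_obs[idx], lam=1e-8)   # effectively OLS
    resid = y_obs[idx] - (b0 + X[idx] @ b)
    sigma = resid.std(ddof=X.shape[1] + 1)
    y_imp = y_obs.copy()
    mis = ~obs
    y_imp[mis] = b0 + X[mis] @ b + rng.normal(scale=sigma, size=mis.sum())
    return y_imp

def pooled_predictions(X_train, y_obs, X_new, m=20, rng=rng):
    """Steps ii) and iii): fit one model per imputed dataset, then average predictions."""
    preds = np.empty((m, X_new.shape[0]))
    for k in range(m):
        y_imp = impute_once(X_train, y_obs, rng)
        b0, b = fit_ridge(X_train, y_imp)
        preds[k] = b0 + X_new @ b
    # Mean over imputations, plus the between-imputation variance of the predictions.
    return preds.mean(axis=0), preds.var(axis=0, ddof=1)

X_new = X[:10]
mean_pred, between_var = pooled_predictions(X, y_obs, X_new)
```

The between-imputation variance only shows how much the predictions move across imputations; it is not a full prediction interval. In a real analysis the imputation model would presumably also use auxiliary variables that are not in the prediction model.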




Have you had a look at Flexible Imputation of Missing Data by van Buuren?

I’ve used MICE and Bayesian multi-level models just recently. Using, e.g., brms in R, might get you a bit on the way (then you can, of course, impute outcomes also, but at the end that might not be necessary I believe, since you will have a posterior).



In my view the project got started off on the wrong foot with regard to a far-reaching analytic decision, and the other problems you are having are secondary to that. Symptom scores are not the types of variables for which subtraction works very well, and there are so many problems with change scores in general that I recommend they never be computed much less used in an analysis. Details are in Section 14.4 of BBR. In the context of missing data, you make a change score missing if either of its components is missing. Analysis should be based on raw scores, adjusted for baseline, and usually done with an ordinal model because of the strange distributions of things like symptom scores. If you asked patients what matters to them it’s primarily where they end up and not how they got there.
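A small Python sketch of two of the points above, on simulated data (all values and variable names are hypothetical). It shows that a change score is missing whenever either component is missing, and that a baseline-adjusted model estimates the slope on the baseline score instead of fixing it at 1 as a change score implicitly does. A linear model is used here only to keep the sketch short; it is not the ordinal model recommended above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
pre = rng.normal(20, 4, size=n)                 # baseline symptom score
x = rng.normal(size=n)                          # one baseline feature
post = 5 + 0.6 * pre - 2.0 * x + rng.normal(scale=3, size=n)

# Make some baseline and some follow-up scores missing (overlapping rows 5..9).
pre_m = pre.copy()
pre_m[:10] = np.nan
post_m = post.copy()
post_m[5:40] = np.nan

# A change score is NaN if either component is NaN: rows 0..39 are lost.
change = post_m - pre_m
n_change_missing = int(np.isnan(change).sum())

# Baseline-adjusted model: post ~ 1 + pre + x, fit on rows with both scores observed.
cc = ~np.isnan(pre_m) & ~np.isnan(post_m)
A = np.column_stack([np.ones(cc.sum()), pre_m[cc], x[cc]])
coef, *_ = np.linalg.lstsq(A, post_m[cc], rcond=None)
baseline_slope = coef[1]                        # estimated, not forced to equal 1
```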

Given that those problems are solved, the next issue is dealing with missing outcome measurements. As discussed in my RMS book and course notes, it is sometimes helpful to impute outcomes because it helps you in better imputing baseline variables, but after such imputation it is recommended that you drop the subjects with imputed outcomes from the analysis at the end. Imputing outcomes does not help very much - it’s “too late”.
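The impute-then-drop idea can be sketched as follows in Python with NumPy. This is an illustration under stated assumptions, not a reference implementation: the data are simulated, the chained imputation uses simple stochastic linear regressions without drawing the regression parameters (so it is not fully proper multiple imputation), and the function names `draw_from_regression` and `impute_then_delete` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x2 = rng.normal(size=n)                          # fully observed baseline variable
x1 = 0.5 * x2 + rng.normal(size=n)               # baseline variable with missing values
y = 1.0 + 0.8 * x1 - 0.4 * x2 + rng.normal(size=n)

x1_obs = np.where(rng.random(n) < 0.20, np.nan, x1)
y_obs = np.where(rng.random(n) < 0.25, np.nan, y)

def draw_from_regression(target, predictors, fit_rows, fill_rows, rng):
    """OLS on fit_rows, then prediction plus residual noise for fill_rows."""
    A = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(A[fit_rows], target[fit_rows], rcond=None)
    resid = target[fit_rows] - A[fit_rows] @ coef
    sigma = resid.std(ddof=A.shape[1])
    return A[fill_rows] @ coef + rng.normal(scale=sigma, size=fill_rows.sum())

def impute_then_delete(x1_obs, x2, y_obs, m=20, cycles=5, rng=rng):
    x1_mis, y_mis = np.isnan(x1_obs), np.isnan(y_obs)
    fits = []
    for _ in range(m):
        # Start from mean-filled values, then cycle through the chained regressions.
        x1_c = np.where(x1_mis, np.nanmean(x1_obs), x1_obs)
        y_c = np.where(y_mis, np.nanmean(y_obs), y_obs)
        for _ in range(cycles):
            # Outcomes are imputed so they can inform the imputation of x1 ...
            y_c[y_mis] = draw_from_regression(
                y_c, np.column_stack([x1_c, x2]), ~y_mis, y_mis, rng)
            x1_c[x1_mis] = draw_from_regression(
                x1_c, np.column_stack([x2, y_c]), ~x1_mis, x1_mis, rng)
        # ... but subjects with imputed outcomes are dropped before the final fit.
        keep = ~y_mis
        A = np.column_stack([np.ones(keep.sum()), x1_c[keep], x2[keep]])
        coef, *_ = np.linalg.lstsq(A, y_c[keep], rcond=None)
        fits.append(coef)
    # Pooled point estimate: average of the per-imputation coefficients.
    return np.mean(fits, axis=0)

pooled_coef = impute_then_delete(x1_obs, x2, y_obs)
```

With no missing baseline variables this reduces to a complete-case analysis of the observed outcomes.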

Another fundamental problem is whether you can assume that the outcomes are missing at random. This may only be justified as an assumption when you have administrative missings, i.e., follow-up is not obtained only on the late-enrolling patients. If dropout is due on the other hand to a worsening clinical condition, I don’t know how to handle this in an interpretable way other than to add such clinical failures to ordinal levels of the final symptom scale.



Thanks @f2harrell!

Symptoms are measured using questionnaires like the HDRS and are often treated as the primary outcome because they are considered reasonably reliable and sensitive to change.
I definitely share your concern about change scores and I have brought up these points as well. Unfortunately, raw change scores and percent change from baseline have been the de facto way that psychiatry has evaluated treatments for depression and anxiety. I have advised collaborators to switch to mean difference scores (Bland-Altman style), as this way they won’t lose interpretation. The whole field defines a treatment responder as 50% change from baseline! This is definitely a bigger battle for the sub-field beyond this particular analysis!

Building predictive models for treatment response to find neuroimaging biomarkers is a big agenda in the field and it is going to happen one way or another. I’m only in a position to advise to make this a bit less wrong without having to develop new methodology or radically change the status quo.

We don’t have much missing data at baseline, so the main issue is the missing longitudinal outcomes. We can do a sensitivity analysis under both MAR and MNAR assumptions. Some trials like CATIE collected a lot of information about why dropout occurred, but we probably don’t have enough information for MAR to hold. While some people may drop out because their condition worsens, it is quite likely that many who drop out don’t get worse but don’t benefit enough to stay in treatment.
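One common way to set up such a sensitivity analysis is a delta adjustment: impute under MAR, then shift the imputed values for dropouts by a fixed amount delta, with delta = 0 recovering the MAR analysis. A minimal Python sketch, assuming simulated data, a single baseline predictor, and a hypothetical helper `impute_with_delta`; the delta values are arbitrary and would need clinical justification:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
baseline = rng.normal(22, 4, size=n)
final = 4 + 0.6 * baseline + rng.normal(scale=3, size=n)
dropout = rng.random(n) < 0.3
final_obs = np.where(dropout, np.nan, final)

def impute_with_delta(baseline, final_obs, delta, m=20, seed=0):
    """Mean final score over m imputations, with imputed values shifted by delta.

    delta = 0 corresponds to MAR; delta > 0 assumes dropouts end up with
    higher (worse) scores than MAR would predict, which is an MNAR scenario.
    """
    rng = np.random.default_rng(seed)        # same seed so that only delta varies
    mis = np.isnan(final_obs)
    A = np.column_stack([np.ones(len(baseline)), baseline])
    means = []
    for _ in range(m):
        # Bootstrap the observed cases to propagate imputation-model uncertainty.
        idx = rng.choice(np.flatnonzero(~mis), size=(~mis).sum(), replace=True)
        coef, *_ = np.linalg.lstsq(A[idx], final_obs[idx], rcond=None)
        sigma = (final_obs[idx] - A[idx] @ coef).std(ddof=2)
        filled = final_obs.copy()
        filled[mis] = A[mis] @ coef + rng.normal(scale=sigma, size=mis.sum()) + delta
        means.append(filled.mean())
    return float(np.mean(means))

deltas = [0.0, 2.0, 4.0]
sensitivity = {d: impute_with_delta(baseline, final_obs, d) for d in deltas}
```

Reading across `sensitivity` shows how far the summary moves as the assumed departure from MAR grows. The same loop could wrap a prediction model instead of a mean.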

I agree that interpretation for studies in this situation will be an issue whether we drop missing outcomes or impute them. We will likely analyze the data both ways. The main thing I wanted to settle is the least wrong strategy for imputation.




You’re welcome. Raw change scores and even worse % change from baseline are completely inappropriate for these types of scales. So I wouldn’t spend any time on complex issues until you get the simple stuff fixed.



I’ve read the book, it is extremely useful!