I have a prediction model that was developed on one dataset (let's call it "old_dat") using multiple imputation. I now want to apply this model to a new dataset ("new_dat") and use the predictions to assess whether the two treatment arms are balanced in terms of prognostic factors. The model fitted on old_dat contains no interaction terms. How should I handle the missing covariate data in this new cohort? I can think of three options:
- Impute the median for each missing covariate, since I only want to assess balance between the treatment arms and so care more about discrimination than calibration.
- Fit a new imputation model using new_dat alone, then apply the model trained on old_dat to make predictions on the imputed new_dat.
- Stack old_dat and new_dat, fit a new imputation model on the stacked data to impute the missing new_dat values, but still use the model originally trained on old_dat to make predictions on new_dat.
Which do you think is preferred (if any)?
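
To make the first option concrete, here is a rough sketch of what I have in mind. Everything in it is made up for illustration: the toy new_dat rows, the arm labels, the coefficient values, and the helper names (`column_medians`, `median_impute`, `prognostic_score`). In practice the intercept and coefficients would be the pooled estimates from the old_dat model, and the medians here are taken from new_dat itself, which is an assumption on my part.

```python
import math
from statistics import mean, median

NAN = float("nan")

def column_medians(rows):
    """Median of each covariate, ignoring missing values (coded as NaN)."""
    return [median(v for v in col if not math.isnan(v)) for col in zip(*rows)]

def median_impute(rows, medians):
    """Replace each missing value with the median of its column."""
    return [[m if math.isnan(v) else v for v, m in zip(row, medians)]
            for row in rows]

def prognostic_score(row, intercept, coefs):
    """Linear predictor from the already-fitted model (coefficients stay fixed)."""
    return intercept + sum(b * x for b, x in zip(coefs, row))

# Hypothetical new_dat: two covariates per patient, NaN marks a missing value.
new_dat = [
    [1.0, NAN],
    [2.0, 4.0],
    [NAN, 6.0],
    [4.0, 8.0],
]
arm = [0, 0, 1, 1]  # hypothetical treatment arm labels

# Hypothetical stand-ins for the pooled estimates from the old_dat model.
intercept, coefs = 0.5, [1.0, 0.5]

medians = column_medians(new_dat)
new_dat_imputed = median_impute(new_dat, medians)
scores = [prognostic_score(r, intercept, coefs) for r in new_dat_imputed]

# Balance check: difference in mean prognostic score between the two arms.
mean_arm0 = mean(s for s, a in zip(scores, arm) if a == 0)
mean_arm1 = mean(s for s, a in zip(scores, arm) if a == 1)
score_diff = mean_arm1 - mean_arm0
print(f"Mean score difference (arm 1 - arm 0): {score_diff:.2f}")
```

The second and third options would replace the `median_impute` step with a proper imputation model, fitted either on new_dat alone or on the stacked data, while the scoring step with the fixed old_dat coefficients would stay the same.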