Internal model validation (Bootstrapping) and Multiple Imputation

What is the recommended way to incorporate both bootstrapping (i.e. to compute optimism in the R-squared value) and multiple imputation?

For example, is it acceptable to:
(1) Perform multiple imputation to generate x number of datasets (e.g. x = 10)
(2) Fit a pre-specified regression model separately to each of the 10 imputed datasets and then run a separate bootstrap for each dataset
(3) Report the average and range of optimism values (e.g. for R-squared) from the 10 bootstraps.
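
To make steps (2) and (3) concrete, here is a rough sketch of the proposal, not a claim that it is the right procedure. It assumes an ordinary least squares model and the usual optimism bootstrap (refit on each resample, then compare R-squared on the resample against R-squared on the data the resample came from). The simulated data, the crude noise-added regression imputation standing in for step (1), and the helper names `fit_ols`, `r_squared` and `optimism_r2` are all made up for illustration.

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares coefficients; X is expected to include an intercept column.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def r_squared(X, y, beta):
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def optimism_r2(X, y, n_boot=200, seed=0):
    # Optimism = mean over resamples of
    # (R^2 of the refitted model on the resample) - (its R^2 on the full data).
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        beta_b = fit_ols(X[idx], y[idx])
        diffs[b] = r_squared(X[idx], y[idx], beta_b) - r_squared(X, y, beta_b)
    return diffs.mean()

# Simulated example data (hypothetical): x2 is missing for about 25% of rows.
rng = np.random.default_rng(1)
n = 150
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(size=n)
miss = rng.random(n) < 0.25
x2[miss] = np.nan

# Step (1), crude stand-in: impute x2 from x1 and y by regression plus noise.
Z = np.column_stack([np.ones(n), x1, y])
gamma = fit_ols(Z[~miss], x2[~miss])
sd = np.std(x2[~miss] - Z[~miss] @ gamma, ddof=3)

optimisms = []
for m in range(10):
    x2_m = x2.copy()
    x2_m[miss] = Z[miss] @ gamma + rng.normal(0.0, sd, miss.sum())
    X_m = np.column_stack([np.ones(n), x1, x2_m])
    optimisms.append(optimism_r2(X_m, y, seed=m))   # step (2): one bootstrap per dataset

# Step (3): summarise the 10 optimism estimates.
print(f"mean optimism {np.mean(optimisms):.3f}, "
      f"range {min(optimisms):.3f} to {max(optimisms):.3f}")
```

In practice the imputation in step (1) would come from a proper multiple imputation routine rather than the single fitted regression used here.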

Any guidance would be greatly appreciated!

You may have come across this (free online) book by Stef van Buuren. It discusses combining imputation and bootstrapping, with lots of good references therein.

Another paper I have in my ref manager but which I haven’t read yet:

From what I recall, the imputation procedure should be nested within the bootstrap procedure to account for uncertainty properly. Something like the following (arbitrary numbers):

  1. Draw 500 bootstrap samples from the original dataset
  2. Within each bootstrap sample, multiply impute 10 datasets and get a pooled estimate for the quantity of interest. (In all, you’d generate 5,000 imputed datasets.)
  3. Use the distribution of 500 pooled estimates for inference.
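
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a recommended implementation: the data are simulated, the quantity of interest is simply the mean of a partially missing variable, and `impute_once` is a hypothetical single-variable stochastic regression imputation standing in for whatever imputation model you would really use.

```python
import numpy as np

def impute_once(x, y, rng):
    # Stochastic regression imputation of missing x from fully observed y.
    # Deliberately simple stand-in for a real imputation model.
    obs = ~np.isnan(x)
    slope, intercept = np.polyfit(y[obs], x[obs], 1)
    resid_sd = np.std(x[obs] - (intercept + slope * y[obs]), ddof=2)
    filled = x.copy()
    n_mis = int((~obs).sum())
    filled[~obs] = intercept + slope * y[~obs] + rng.normal(0.0, resid_sd, n_mis)
    return filled

def boot_then_mi(x, y, estimator, n_boot=500, n_imp=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    pooled = np.empty(n_boot)
    for b in range(n_boot):
        # Step 1: bootstrap sample of the original (incomplete) data.
        idx = rng.integers(0, n, n)
        xb, yb = x[idx], y[idx]
        # Step 2: impute n_imp times within the sample, pool by averaging.
        ests = [estimator(impute_once(xb, yb, rng), yb) for _ in range(n_imp)]
        pooled[b] = np.mean(ests)
    return pooled

# Simulated example data (hypothetical): x is missing for about 30% of rows.
rng = np.random.default_rng(42)
y = rng.normal(size=200)
x = 1.0 + 0.5 * y + rng.normal(size=200)
x[rng.random(200) < 0.3] = np.nan

# Step 3: use the distribution of the 500 pooled estimates for inference,
# here via a percentile interval.
pooled = boot_then_mi(x, y, lambda xf, yf: xf.mean())
lo, hi = np.percentile(pooled, [2.5, 97.5])
print(f"95% percentile interval for mean(x): ({lo:.2f}, {hi:.2f})")
```

The imputation model is refitted inside every bootstrap sample, which is what "nested" means here; the percentile interval is just one way of using the 500 pooled estimates.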

Jonathan Bartlett and Rachael Hughes published a paper last year. Simulations showed that “Imputation followed by bootstrapping generally does not result in valid variance estimates under uncongeniality or misspecification, whereas certain bootstrap followed by imputation methods do”.
