Internal model validation (Bootstrapping) and Multiple Imputation

What is the recommended way to combine bootstrapping (e.g., to estimate the optimism in the R-squared value) with multiple imputation?

For example, is it acceptable to:
(1) Perform multiple imputation to generate x datasets (e.g., x = 10)
(2) Fit a pre-specified regression model to each of the 10 imputed datasets separately, then run a separate bootstrap on each one
(3) Report the average and range of the optimism values (e.g., for R-squared) across the 10 bootstraps.
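For reference, the bootstrap in step (2) is usually the optimism bootstrap: refit the model on each bootstrap sample, score it both on that sample and on the original data, and average the difference. Below is a minimal Python sketch for a single completed dataset, assuming ordinary least squares and using numpy only; the function names and the synthetic data are hypothetical. Under the proposal in the question, this would be run once per imputed dataset, and the 10 corrected values summarized by their mean and range.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for the model y ~ 1 + X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def r_squared(beta, X, y):
    """R-squared of a fitted model evaluated on the data (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    ss_res = np.sum((y - A @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def optimism_bootstrap(X, y, n_boot=200, rng=None):
    """Return (apparent R2, mean optimism, optimism-corrected R2)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    apparent = r_squared(fit_ols(X, y), X, y)
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample rows with replacement
        beta_b = fit_ols(X[idx], y[idx])            # refit on the bootstrap sample
        r2_boot = r_squared(beta_b, X[idx], y[idx])  # performance where it was fitted
        r2_test = r_squared(beta_b, X, y)            # same model scored on original data
        optimism[b] = r2_boot - r2_test
    mean_opt = optimism.mean()
    return apparent, mean_opt, apparent - mean_opt

# Synthetic stand-in for ONE completed (imputed) dataset.
data_rng = np.random.default_rng(1)
n, p = 100, 5
X = data_rng.normal(size=(n, p))
y = X @ np.array([0.5, 0.3, 0.0, 0.0, 0.2]) + data_rng.normal(size=n)

apparent, opt, corrected = optimism_bootstrap(X, y, n_boot=200, rng=2)
print(f"apparent R2 = {apparent:.3f}, optimism = {opt:.3f}, corrected = {corrected:.3f}")
```

Note that this sketch bootstraps after imputation, so each bootstrap treats the imputed values as if they were observed.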

Any guidance would be greatly appreciated!

You may have come across this (free online) book by Stefan van Buuren: https://stefvanbuuren.name/fimd/. The book discusses combining imputation and bootstrapping. Lots of good references therein.

Another paper that is in my reference manager, though I haven’t read it yet: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.7654.

From what I recall, the imputation procedure should be nested within the bootstrap procedure to account for uncertainty properly. Something like the following (arbitrary numbers):

  1. Draw 500 bootstrap samples from the original dataset
  2. Within each bootstrap sample, multiply impute 10 datasets and get a pooled estimate for the quantity of interest. (In all, you’d generate 5,000 imputed datasets.)
  3. Use the distribution of 500 pooled estimates for inference.
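The nested procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: one incomplete covariate `x1`, an analysis model `y ~ x1 + x2`, a simple stochastic regression imputation whose parameter uncertainty is approximated by resampling the observed cases (a stand-in for a proper imputation engine such as mice), the mean over imputations (Rubin's point estimate) as the pooled value, and a percentile interval as one possible way to use the 500 pooled estimates. All function names and the simulated data are hypothetical.

```python
import numpy as np

def ols(A, b):
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

def impute_once(x1, x2, y, rng):
    """One stochastic regression imputation of missing x1 given (x2, y).
    Parameter uncertainty is approximated by fitting the imputation model
    on a bootstrap resample of the observed cases (an assumption)."""
    obs_idx = np.flatnonzero(~np.isnan(x1))
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    A = np.column_stack([np.ones(boot.size), x2[boot], y[boot]])
    coef = ols(A, x1[boot])
    resid_sd = np.std(x1[boot] - A @ coef, ddof=A.shape[1])
    miss = np.isnan(x1)
    A_miss = np.column_stack([np.ones(miss.sum()), x2[miss], y[miss]])
    out = x1.copy()
    out[miss] = A_miss @ coef + rng.normal(0.0, resid_sd, size=miss.sum())
    return out

def slope_x1(x1, x2, y):
    """Coefficient of x1 in the analysis model y ~ 1 + x1 + x2."""
    A = np.column_stack([np.ones(len(y)), x1, x2])
    return ols(A, y)[1]

def boot_then_impute(x1, x2, y, n_boot=500, n_imp=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    pooled = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # step 1: bootstrap the incomplete data
        xb1, xb2, yb = x1[idx], x2[idx], y[idx]
        ests = [slope_x1(impute_once(xb1, xb2, yb, rng), xb2, yb)
                for _ in range(n_imp)]             # step 2: impute within this bootstrap sample
        pooled[b] = np.mean(ests)                  # pooled point estimate
    return pooled

# Simulated data with roughly 30% of x1 missing completely at random.
sim = np.random.default_rng(42)
n = 200
x2 = sim.normal(size=n)
x1 = 0.5 * x2 + sim.normal(size=n)
y = 1.0 + 2.0 * x1 + x2 + sim.normal(size=n)
x1_obs = x1.copy()
x1_obs[sim.random(n) < 0.3] = np.nan

pooled = boot_then_impute(x1_obs, x2, y)       # 500 x 10 = 5,000 imputed datasets
ci = np.percentile(pooled, [2.5, 97.5])        # step 3: percentile interval
print(f"estimate = {pooled.mean():.3f}, 95% interval = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The imputation model includes the outcome `y`, which keeps it compatible with the analysis model; dropping `y` from it would bias the slope toward zero.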

Jonathan Bartlett and Rachael Hughes published a paper on this last year. Their simulations showed that “Imputation followed by bootstrapping generally does not result in valid variance estimates under uncongeniality or misspecification, whereas certain bootstrap followed by imputation methods do”.
See https://journals.sagepub.com/doi/full/10.1177/0962280220932189
