Internal model validation (Bootstrapping) and Multiple Imputation

What is the recommended way to incorporate both bootstrapping (i.e. to compute optimism in the R-squared value) and multiple imputation?

For example, is it acceptable to:
(1) Perform multiple imputation to generate x number of datasets (e.g. x = 10)
(2) Fit a pre-specified regression model separately to each of the 10 imputed datasets and then run a separate bootstrap for each dataset
(3) Report the average and range of optimism values (e.g. for R-squared) from the 10 bootstraps.
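
To make steps (2) and (3) concrete, here is a minimal Python sketch of the usual optimism bootstrap for R-squared, run separately on each completed dataset. Everything in it is an assumption for illustration: the list `imputed` is simulated placeholder data standing in for the output of a real imputation step, and the helper names (`fit_ols`, `r_squared`, `optimism`) are hypothetical. It only shows the mechanics and does not settle whether this ordering is the right one.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def r_squared(beta, X, y):
    """R-squared of a fitted coefficient vector evaluated on (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def optimism(X, y, n_boot=200, seed=0):
    """Mean of (R2 on the bootstrap sample) - (R2 of that same fit on the original data)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        beta_b = fit_ols(X[idx], y[idx])     # refit the pre-specified model
        diffs[b] = r_squared(beta_b, X[idx], y[idx]) - r_squared(beta_b, X, y)
    return diffs.mean()

# Placeholder for 10 completed (imputed) datasets; simulated here, NOT real imputations.
rng = np.random.default_rng(42)
imputed = []
for _ in range(10):
    X = rng.normal(size=(100, 5))
    y = X[:, 0] + rng.normal(size=100)
    imputed.append((X, y))

# Step (2): one optimism bootstrap per dataset. Step (3): summarise across datasets.
opt = np.array([optimism(X, y, seed=i) for i, (X, y) in enumerate(imputed)])
print("mean optimism:", opt.mean(), "range:", opt.min(), "to", opt.max())
```

The optimism-corrected R-squared for a given dataset would then be its apparent R-squared minus the corresponding entry of `opt`.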

Any guidance would be greatly appreciated!

You may have come across this (free online) book by Stef van Buuren: https://stefvanbuuren.name/fimd/. The book discusses combining imputation and bootstrapping. Lots of good references therein.

Another paper I have in my ref manager but which I haven’t read yet: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.7654.

From what I recall, the imputation procedure should be nested within the bootstrap procedure to account for uncertainty properly. Something like the following (arbitrary numbers):

  1. Draw 500 bootstrap samples from the original dataset
  2. Within each bootstrap sample, multiply impute 10 datasets and get a pooled estimate for the quantity of interest. (In all, you’d generate 5,000 imputed datasets.)
  3. Use the distribution of 500 pooled estimates for inference.
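
A rough Python sketch of the three steps above, under loud assumptions: the data are simulated, the quantity of interest is a regression slope, and `impute_once` is a hypothetical, deliberately simple stochastic regression imputation standing in for a real multiple-imputation routine. Pooling here is just the mean of the per-imputation estimates. Smaller numbers than 500 and 10 are used to keep it quick.

```python
import numpy as np

def impute_once(x, y, rng):
    """Stand-in imputation: regress x on y among complete cases, add residual noise."""
    miss = np.isnan(x)
    if not miss.any():
        return x.copy()
    A = np.column_stack([np.ones((~miss).sum()), y[~miss]])
    coef, *_ = np.linalg.lstsq(A, x[~miss], rcond=None)
    sigma = (x[~miss] - A @ coef).std(ddof=2)
    filled = x.copy()
    filled[miss] = coef[0] + coef[1] * y[miss] + rng.normal(0.0, sigma, miss.sum())
    return filled

def slope(x, y):
    """Quantity of interest in this sketch: slope from regressing y on x."""
    return np.polyfit(x, y, 1)[0]

def bootstrap_then_impute(x, y, n_boot=500, n_imp=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    pooled = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)               # step 1: bootstrap the incomplete data
        xb, yb = x[idx], y[idx]
        ests = [slope(impute_once(xb, yb, rng), yb) for _ in range(n_imp)]
        pooled[b] = np.mean(ests)                 # step 2: pool across the imputations
    return pooled                                 # step 3: use this distribution

# Simulated example: true slope 2, about 20% of x missing completely at random.
rng = np.random.default_rng(1)
n = 200
x_full = rng.normal(size=n)
y = 2.0 * x_full + rng.normal(size=n)
x_obs = x_full.copy()
x_obs[rng.random(n) < 0.2] = np.nan

pooled = bootstrap_then_impute(x_obs, y, n_boot=200, n_imp=5)
lo, hi = np.percentile(pooled, [2.5, 97.5])       # percentile interval
print("pooled slope:", pooled.mean(), "95% interval:", lo, hi)
```

The key design point is that the resampling happens on the incomplete data, so the imputation model is re-estimated inside every bootstrap sample rather than once up front.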