Internal model validation (Bootstrapping) and Multiple Imputation

TJ61 · March 31, 2021, 8:49pm

What is the recommended way to combine bootstrapping (e.g., to estimate the optimism in R-squared) with multiple imputation?

For example, is it acceptable to:
(1) Perform multiple imputation to generate x datasets (e.g., x = 10).
(2) Fit a pre-specified regression model separately to each of the 10 imputed datasets, and run a separate bootstrap on each one.
(3) Report the average and range of the 10 optimism estimates (e.g., for R-squared).
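For concreteness, here is a minimal Python sketch of steps (1) to (3). It is only an illustration and does not settle whether the approach is valid. It assumes a single-predictor linear model with R-squared as the performance index and uses the usual optimism bootstrap: refit in each bootstrap sample, then subtract the R-squared of that fit on the original data from its R-squared on the bootstrap sample. A real imputation model (e.g., mice or Hmisc's aregImpute) is replaced by a hypothetical `impute_once` helper that does crude stochastic regression imputation of x on y. All function names here are made up for this example.

```python
import math
import random
from statistics import fmean

def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def r_squared(coef, xs, ys):
    """R^2 of fixed coefficients evaluated on (xs, ys); can be negative out of sample."""
    a, b = coef
    ybar = fmean(ys)
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst

def optimism_bootstrap(xs, ys, B, rng):
    """Optimism bootstrap for R^2 on one completed dataset; returns (apparent, optimism)."""
    n = len(xs)
    apparent = r_squared(fit_ols(xs, ys), xs, ys)
    optimisms = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        coef = fit_ols(bx, by)                       # refit in the bootstrap sample
        optimisms.append(r_squared(coef, bx, by)     # "training" performance
                         - r_squared(coef, xs, ys))  # minus "test" on the original data
    return apparent, fmean(optimisms)

def impute_once(xs, ys, rng):
    """HYPOTHETICAL placeholder imputation: stochastic regression of x on y fitted to
    complete cases (improper, because the regression parameters are not drawn)."""
    obs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    a, b = fit_ols([y for _, y in obs], [x for x, _ in obs])
    sd = math.sqrt(sum((x - a - b * y) ** 2 for x, y in obs) / (len(obs) - 2))
    return [x if x is not None else a + b * y + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]

def impute_then_bootstrap(xs, ys, m, B, seed):
    """Steps (1)-(3): m imputations, a separate bootstrap per completed dataset,
    then the mean, min and max of the m optimism estimates."""
    rng = random.Random(seed)
    optimisms = [optimism_bootstrap(impute_once(xs, ys, rng), ys, B, rng)[1]
                 for _ in range(m)]
    return fmean(optimisms), min(optimisms), max(optimisms)

# Usage on simulated data with roughly 20% of x missing completely at random.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]
xs_miss = [None if data_rng.random() < 0.2 else x for x in xs]
mean_opt, lo_opt, hi_opt = impute_then_bootstrap(xs_miss, ys, m=10, B=100, seed=2)
print(f"mean optimism {mean_opt:.4f}, range ({lo_opt:.4f}, {hi_opt:.4f})")
```

Note that the imputation here runs once, outside the bootstrap. So the spread across the 10 optimism values reflects imputation variability, but each bootstrap treats its completed dataset as if it had been fully observed.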

Any guidance would be greatly appreciated!

jrgant · April 8, 2021, 2:42pm

You may have come across Stefan van Buuren's free online book, Flexible Imputation of Missing Data: https://stefvanbuuren.name/fimd/. It discusses combining imputation with bootstrapping, and there are lots of good references therein.

Another paper I have in my ref manager but which I haven’t read yet: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.7654.

From what I recall, the imputation procedure should be nested within the bootstrap procedure to account for uncertainty properly. Something like the following (arbitrary numbers):

(1) Draw 500 bootstrap samples from the original dataset.
(2) Within each bootstrap sample, multiply impute 10 datasets and get a pooled estimate of the quantity of interest. (In all, you’d generate 5,000 imputed datasets.)
(3) Use the distribution of the 500 pooled estimates for inference.
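Below is a minimal Python sketch of this nested scheme. It assumes a single-predictor linear model whose slope is the quantity of interest, and it uses a hypothetical `impute_once` helper (crude stochastic regression imputation) in place of a real imputation model. Inside each bootstrap sample, the m slopes are pooled by their mean, which is the Rubin's-rules point estimate. Inference then uses a simple percentile interval of the B pooled estimates. With B = 200 and m = 5, the sketch generates 1,000 imputed datasets rather than the 5,000 in the example above.

```python
import math
import random
from statistics import fmean

def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def impute_once(xs, ys, rng):
    """HYPOTHETICAL placeholder imputation: stochastic regression of x on y fitted to
    complete cases (improper, because the regression parameters are not drawn)."""
    obs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    a, b = fit_ols([y for _, y in obs], [x for x, _ in obs])
    sd = math.sqrt(sum((x - a - b * y) ** 2 for x, y in obs) / (len(obs) - 2))
    return [x if x is not None else a + b * y + rng.gauss(0, sd)
            for x, y in zip(xs, ys)]

def boot_then_impute(xs, ys, B, m, seed):
    """Bootstrap the incomplete data first, then multiply impute within each bootstrap
    sample and pool the m slope estimates by their mean."""
    rng = random.Random(seed)
    n = len(xs)
    pooled = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # resample rows, missing values included
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        slopes = [fit_ols(impute_once(bx, by, rng), by)[1] for _ in range(m)]
        pooled.append(fmean(slopes))
    pooled.sort()
    lower = pooled[math.floor(0.025 * (B - 1))]     # simple 95% percentile interval
    upper = pooled[math.ceil(0.975 * (B - 1))]
    return fmean(pooled), (lower, upper)

# Usage on simulated data (true slope 0.5) with roughly 20% of x missing at random.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]
xs_miss = [None if data_rng.random() < 0.2 else x for x in xs]
estimate, (lower, upper) = boot_then_impute(xs_miss, ys, B=200, m=5, seed=2)
print(f"pooled slope {estimate:.3f}, 95% percentile interval ({lower:.3f}, {upper:.3f})")
```

Rubin's variance formula is not needed in this scheme. Because imputation is redone inside every bootstrap sample, the spread of the bootstrap distribution already reflects both sampling and imputation uncertainty.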

Dan · May 7, 2021, 9:39am

Jonathan Bartlett and Rachael Hughes published a paper on this last year. Their simulations showed that “Imputation followed by bootstrapping generally does not result in valid variance estimates under uncongeniality or misspecification, whereas certain bootstrap followed by imputation methods do”.
See https://journals.sagepub.com/doi/full/10.1177/0962280220932189

soumyajitroy8 · August 3, 2021, 3:26am

Hi,
Can somebody help me interpret the internal validation results from rms?
For example:

validate(cox.tox.3, method = "boot", B = 500)

      index.orig training   test optimism index.corrected   n
Dxy       0.3221   0.3378 0.3151   0.0227          0.2994 500
R2        0.2821   0.3020 0.2601   0.0419          0.2402 500
Slope     1.0000   1.0000 0.8747   0.1253          0.8747 500
D         0.0370   0.0403 0.0336   0.0067          0.0303 500
U        -0.0006  -0.0006 0.0016  -0.0022          0.0016 500
Q         0.0376   0.0409 0.0320   0.0089          0.0286 500
g         0.8315   0.8728 0.7634   0.1094          0.7221 500
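Some context for reading this output. In rms's validate(), training is each index averaged over the models refitted in the bootstrap samples and evaluated on those same samples. Test is those same bootstrap fits evaluated on the original data. Optimism = training - test, and index.corrected = index.orig - optimism. The short Python check below is an illustration, not part of rms. It recomputes those two columns from the values posted above and converts Dxy to a concordance index using C = 0.5 + Dxy/2.

```python
# Values copied from the validate() output above, as
# (index.orig, training, test, optimism, index.corrected).
# rms prints them rounded to 4 decimals, so recomputed values can differ in the last digit.
ROWS = {
    "Dxy":   (0.3221, 0.3378, 0.3151, 0.0227, 0.2994),
    "R2":    (0.2821, 0.3020, 0.2601, 0.0419, 0.2402),
    "Slope": (1.0000, 1.0000, 0.8747, 0.1253, 0.8747),
    "D":     (0.0370, 0.0403, 0.0336, 0.0067, 0.0303),
    "U":     (-0.0006, -0.0006, 0.0016, -0.0022, 0.0016),
    "Q":     (0.0376, 0.0409, 0.0320, 0.0089, 0.0286),
    "g":     (0.8315, 0.8728, 0.7634, 0.1094, 0.7221),
}

def optimism_correct(orig, training, test):
    """optimism = training - test; corrected index = apparent index - optimism."""
    optimism = training - test
    return optimism, orig - optimism

max_diff = 0.0
for name, (orig, training, test, rep_opt, rep_corr) in ROWS.items():
    opt, corr = optimism_correct(orig, training, test)
    max_diff = max(max_diff, abs(opt - rep_opt), abs(corr - rep_corr))
    print(f"{name:<6} optimism={opt:+.4f}  corrected={corr:.4f}")

# Somers' Dxy converts to the concordance probability via C = 0.5 + Dxy / 2.
dxy_orig, dxy_corr = ROWS["Dxy"][0], ROWS["Dxy"][4]
print(f"C apparent={0.5 + dxy_orig / 2:.3f}  C corrected={0.5 + dxy_corr / 2:.3f}")
print(f"largest rounding discrepancy vs. reported columns: {max_diff:.4f}")
```

Read this way, the apparent C of about 0.661 falls to about 0.650 after correcting for optimism. The corrected calibration slope of 0.8747 suggests the model is overfitted: its predictions would need to be shrunk by roughly 13% to be well calibrated in new data.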