Internal validation: Imputation nested within bootstrap

Background: I would like to internally validate my prediction model using bootstrapping. I would also like to use multiple imputation (MI) to handle missing data, and I will nest the MI within each bootstrap sample. The main model is a proportional hazards regression.

Question: To assess test performance, should I evaluate each bootstrap model on the original incomplete data, or should I also create a separate “original” imputed dataset that is independent of the bootstrap samples? I saw the latter in Wahl et al., 2016, but haven’t seen any other examples.

Side note: if any SAS users have nested MI within a bootstrap, I would love to see which procedures you used. I’m using PROC SURVEYSELECT and PROC MI, but I’m not confident that PROC MIANALYZE is the right procedure for pooling my analysis results.
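For orientation, PROC MIANALYZE pools a model coefficient across imputations using Rubin's rules. Here is a minimal Python sketch of those rules for a single coefficient. The function name `pool_rubin` is hypothetical. It uses Rubin's (1987) large-sample degrees of freedom, not the small-sample adjustment SAS can apply when the complete-data degrees of freedom are given via `EDF=`.

```python
import math
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float, float]:
    """Combine one coefficient across m imputations with Rubin's rules.

    estimates: the coefficient from each imputed-data fit
    variances: its squared standard error from each fit
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 denominator)
    t = w + (1.0 + 1.0 / m) * b           # total variance
    # Large-sample degrees of freedom; infinite when the imputations agree exactly.
    df = math.inf if b == 0 else (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    return q_bar, t, df


if __name__ == "__main__":
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

With three imputations giving estimates 1, 2, 3 and squared standard errors of 0.5 each, the pooled estimate is 2, the total variance is 0.5 + (4/3)(1) ≈ 1.833, and the degrees of freedom are about 3.78.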

Please read https://discourse.datamethods.org/t/rms-describing-resampling-validating-and-simplifying-the-model/ and move your post (if it’s not already answered by what’s there) to the bottom of that page as a “reply” to the last post there. Thanks.


For anyone who stumbles onto this: I found the answer on page 4-48 of Dr. Harrell’s Regression Modeling Strategies and in the replies on the page he linked above. I will multiply impute the original (incomplete) data and fit my “final model” on it. For internal validation, I will draw B bootstrap samples from the original incomplete data, multiply impute each bootstrap sample, fit one model per bootstrap sample (B bootstrap models in total), and test each of those models on the original multiply imputed data. Because every modeling step should be repeated within each bootstrap sample, imputation included, I believe nesting the MI within the bootstrap makes sense (and it is how Wahl et al., 2016 do it).
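The nested procedure described above can be sketched in plain Python, combined with the usual optimism-corrected bootstrap (apparent performance minus average optimism). This is a toy illustration under loudly stated assumptions. A one-predictor least-squares line stands in for the proportional hazards model, and R² stands in for a survival performance measure such as the c-index. A simple stochastic regression imputation, with parameter uncertainty from bootstrapping the complete cases, stands in for PROC MI. Each bootstrap model is the pooled (coefficient-averaged) fit across that sample's imputations. All function names and the simulated data are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], float]  # (x, y); x is None when missing
Pair = Tuple[float, float]


def fit_line(pairs: List[Pair]) -> Pair:
    """Ordinary least squares for resp = a + b * pred; returns (a, b)."""
    mp = statistics.fmean(p for p, _ in pairs)
    mr = statistics.fmean(r for _, r in pairs)
    sxx = sum((p - mp) ** 2 for p, _ in pairs)
    b = sum((p - mp) * (r - mr) for p, r in pairs) / sxx
    return mr - b * mp, b


def impute_once(data: List[Row], rng: random.Random) -> List[Pair]:
    """One stochastic regression imputation of x given y (hypothetical
    stand-in for PROC MI). Refitting x ~ y on a bootstrap of the complete
    cases adds parameter uncertainty between imputations."""
    complete = [(y, x) for x, y in data if x is not None]  # (pred=y, resp=x)
    boot = [rng.choice(complete) for _ in complete]
    a, b = fit_line(boot)
    sd = statistics.pstdev([x - (a + b * y) for y, x in boot])
    return [(x if x is not None else a + b * y + rng.gauss(0.0, sd), y)
            for x, y in data]


def fit_pooled(imputed_sets: List[List[Pair]]) -> Pair:
    """Fit y ~ x on every imputed dataset and average the coefficients."""
    fits = [fit_line(d) for d in imputed_sets]
    return (statistics.fmean(a for a, _ in fits),
            statistics.fmean(b for _, b in fits))


def r_squared(model: Pair, data: List[Pair]) -> float:
    a, b = model
    my = statistics.fmean(y for _, y in data)
    sse = sum((y - (a + b * x)) ** 2 for x, y in data)
    sst = sum((y - my) ** 2 for _, y in data)
    return 1.0 - sse / sst


def mean_r2(model: Pair, imputed_sets: List[List[Pair]]) -> float:
    """Performance averaged over the m imputed copies of one dataset."""
    return statistics.fmean(r_squared(model, d) for d in imputed_sets)


def validate(data: List[Row], n_boot: int = 50, n_imp: int = 5,
             seed: int = 1) -> dict:
    rng = random.Random(seed)
    # Final model: multiply impute the ORIGINAL incomplete data, then pool.
    original_imps = [impute_once(data, rng) for _ in range(n_imp)]
    final_model = fit_pooled(original_imps)
    apparent = mean_r2(final_model, original_imps)

    optimisms = []
    for _ in range(n_boot):
        # 1. Resample rows of the incomplete data (missing values stay missing).
        boot = [rng.choice(data) for _ in data]
        # 2. Repeat the multiple imputation inside this bootstrap sample.
        boot_imps = [impute_once(boot, rng) for _ in range(n_imp)]
        # 3. Refit and pool: one bootstrap model per replicate.
        boot_model = fit_pooled(boot_imps)
        # 4. Optimism: performance on its own bootstrap data minus
        #    performance on the original multiply imputed data.
        optimisms.append(mean_r2(boot_model, boot_imps)
                         - mean_r2(boot_model, original_imps))

    optimism = statistics.fmean(optimisms)
    return {"apparent": apparent, "optimism": optimism,
            "corrected": apparent - optimism, "n_boot": len(optimisms)}


def make_demo_data(n: int = 200, seed: int = 0) -> List[Row]:
    """Simulated data: y = 2x + noise, ~20% of x missing completely at random."""
    rng = random.Random(seed)
    rows: List[Row] = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((None if rng.random() < 0.2 else x, y))
    return rows


if __name__ == "__main__":
    print(validate(make_demo_data()))
```

The imputation model uses the outcome y because imputation ignoring the outcome tends to bias coefficients toward zero. In a real survival analysis, the analogous step would include the event indicator and follow-up information in the imputation model.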
