Internal validation: Imputation nested within bootstrap

pravleen · May 9, 2022, 7:20am

Background: I would like to internally validate my prediction model using bootstrapping. I would also like to use multiple imputation (MI) to deal with missing data, nesting the MI within each bootstrap sample. The main model is a proportional hazards regression.

Question: For testing performance, should I test each bootstrap model on the original incomplete data, or should I also create a separate “original” imputed dataset that has nothing to do with the bootstrap samples? I saw the latter in Wahl et al., 2016 but haven’t seen any other examples.

Side note: if there are any SAS users who have done MI nested in bootstrap, I would love to see what procedures you used. I’m using PROC SURVEYSELECT and PROC MI but am not confident that PROC MIANALYZE is the right procedure to pool my analysis results.

f2harrell · May 9, 2022, 11:22am

Please read https://discourse.datamethods.org/t/rms-describing-resampling-validating-and-simplifying-the-model/ and move your post (if it’s not already answered by what’s there) to the bottom of that page as a “reply” to the last post there. Thanks.

pravleen · May 9, 2022, 7:19pm

For anyone who stumbles on this: I found the answer on page 4-48 of Dr. Harrell’s Regression Modeling Strategies and in the replies on the page he linked above. I will multiply impute the original (incomplete) data and fit my “final model” on that. For internal validation, I will draw B bootstrap samples from the original incomplete data, multiply impute each one, and fit a model to each, giving B bootstrap models, then test each on the original (multiply imputed) data. I know we’re supposed to repeat every modeling step within each bootstrap resample, so I believe nesting the MI in the bootstrap makes sense (and it is how Wahl 2016 does it).
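
For readers who want to see the order of operations, here is a rough Python sketch of the workflow described above. It is only an illustration under several assumptions, not the procedure from Regression Modeling Strategies or Wahl 2016. The function `validate_mi_in_bootstrap` and the helpers `hot_deck_impute`, `fit_line`, and `neg_mse` are hypothetical names. A toy straight-line fit stands in for the proportional hazards model, a simple random hot-deck draw stands in for a proper MI routine, and performance is averaged across imputations rather than pooled with Rubin's rules. In a real analysis you would swap in your own imputation, Cox fit, and performance measure (for example a c-index).

```python
import random
import statistics


def validate_mi_in_bootstrap(data, impute, fit, score, B=50, M=5, seed=1):
    """Internal validation with multiple imputation nested in the bootstrap.

    data   : list of rows from the ORIGINAL INCOMPLETE data (None = missing)
    impute : (rows, rng) -> one completed copy of rows      (placeholder)
    fit    : (completed rows) -> model                      (placeholder)
    score  : (model, completed rows) -> performance number  (placeholder)
    """
    rng = random.Random(seed)
    # Multiply impute the original data once; these copies are the test sets.
    original_imputed = [impute(data, rng) for _ in range(M)]
    apparent, test = [], []
    for _ in range(B):
        # Resample rows with replacement from the incomplete data.
        boot = [rng.choice(data) for _ in range(len(data))]
        for _ in range(M):
            # Imputation is repeated inside every bootstrap sample.
            completed = impute(boot, rng)
            model = fit(completed)
            apparent.append(score(model, completed))
            # Assumption: average the score over the imputed original copies.
            test.append(statistics.mean(score(model, d) for d in original_imputed))
    app, tst = statistics.mean(apparent), statistics.mean(test)
    return {"apparent": app, "test": tst, "optimism": app - tst}


# ---- Toy stand-ins so the sketch runs end to end (all hypothetical) ----
def hot_deck_impute(rows, rng):
    """Fill each missing x with a random draw from the observed x values."""
    observed = [x for x, _ in rows if x is not None]
    return [(x if x is not None else rng.choice(observed), y) for x, y in rows]


def fit_line(rows):
    """Least-squares straight line; returns (slope, intercept)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sxx if sxx else 0.0
    return slope, my - slope * mx


def neg_mse(model, rows):
    """Negative mean squared error, so that larger is better."""
    slope, intercept = model
    return -statistics.mean((y - (intercept + slope * x)) ** 2 for x, y in rows)


# Simulated data with roughly 20% of x values missing.
gen = random.Random(0)
toy = []
for _ in range(40):
    x = gen.gauss(0, 1)
    y = 2 * x + gen.gauss(0, 1)
    toy.append((None if gen.random() < 0.2 else x, y))

result = validate_mi_in_bootstrap(toy, hot_deck_impute, fit_line, neg_mse, B=30, M=3)
print(result)
```

The `optimism` value is the average apparent performance in the bootstrap samples minus the average performance of the same models on the imputed original data. Subtracting it from the final model's apparent performance gives the usual optimism-corrected estimate.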