 # Using the double bootstrap for validating prediction models

I have a few questions about using the double bootstrap for the purposes of validating prediction model performance.

1. When is the double bootstrap preferable to the basic bootstrap?
2. What is a useful definition of the double bootstrap algorithm?

The double bootstrap seems suitable for two cases: 1) when a bias-adjusted estimate of optimism is desired, or 2) when both selecting from competing models and summarizing the performance of the selected model (in the manner of nested k-fold cross-validation).

For those two cases, my best guesses at the appropriate double bootstrap algorithms are the following.

Double bootstrap for bias-adjusted optimism correction:

1. Fit the model to the original data and calculate the apparent performance estimate S_{app}.
2. For k = 1, ..., K:
    1. Generate a bootstrap resample (with replacement) from the original data.
    2. Fit the model to resample k.
    3. Calculate the apparent bootstrap performance estimate S_{k_b}.
    4. Calculate an additional performance estimate S_{k_b:orig} by evaluating the bootstrapped model on the original data.
    5. For j = 1, ..., J:
        1. Generate a bootstrap resample (with replacement) from resample k.
        2. Fit the model to inner resample j.
        3. Calculate the apparent double-bootstrap performance estimate S_{jk_{bb}}.
        4. Calculate an additional performance estimate S_{jk_{bb}:k_b} by evaluating the double-bootstrapped model on resample k.
3. Calculate the optimism of the double-bootstrap performance estimates.
    • O_{b} = \frac{1}{JK} \sum_{k=1}^{K} \sum_{j=1}^{J} (S_{jk_{bb}} - S_{jk_{bb}:k_b})
4. Calculate the optimism of the original performance estimate.
    • O_{orig} = \frac{1}{K} \sum_{k=1}^{K} (S_{k_b} - S_{k_b:orig})
5. Calculate the bias-corrected optimism by subtracting the estimated bias (O_{b} - O_{orig}) of the single-bootstrap optimism. (Is this only worth doing if the bias of the optimism is large compared to its standard error?)
    • O_{adj} = O_{orig} - (O_{b} - O_{orig})
6. Calculate the optimism-adjusted performance estimate using the bias-corrected optimism: S_{adj} = S_{app} - O_{adj}.
7. Report the model’s validated performance estimate S_{adj}.
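To make the first algorithm concrete, here is a minimal Python sketch. `fit` and `score` are hypothetical stand-ins (not from any particular library): `fit(data)` returns a fitted model and `score(model, data)` returns a performance estimate where higher is better. Everything else follows the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(data):
    """Draw a bootstrap sample (with replacement) of the rows of `data`."""
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx]

def double_bootstrap_optimism(data, fit, score, K=200, J=50):
    """Bias-corrected optimism via an outer (K) and inner (J) bootstrap."""
    o_orig = 0.0  # single-bootstrap optimism, averaged over the K outer resamples
    o_b = 0.0     # double-bootstrap optimism, averaged over all J*K inner resamples
    for _ in range(K):
        boot_k = resample(data)
        model_k = fit(boot_k)
        # optimism of resample k relative to the original data
        o_orig += score(model_k, boot_k) - score(model_k, data)
        for _ in range(J):
            boot_jk = resample(boot_k)
            model_jk = fit(boot_jk)
            # optimism of inner resample j relative to resample k
            o_b += score(model_jk, boot_jk) - score(model_jk, boot_k)
    o_orig /= K
    o_b /= K * J
    # subtract the estimated bias (o_b - o_orig) of the single-bootstrap optimism
    return o_orig - (o_b - o_orig)
```

The validated estimate would then be `score(fit(data), data) - double_bootstrap_optimism(data, fit, score)`.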

The next algorithm seems more a two-step procedure than a true “double” bootstrap, but it feels appropriate for model selection and summarization.

Two-step bootstrap for model selection and performance summarization:

1. For each model i = 1, ..., I:
    1. Fit model i to the original data.
    2. Calculate the apparent performance estimate S_{i_{app}}.
2. Determine which model performs best:
    1. For resamples j = 1, ..., J:
        1. Generate a bootstrap resample (with replacement) from the original data.
        2. For each model i = 1, ..., I:
            1. Fit model i to resample j.
            2. Calculate the performance estimate S_{ij_{boot}}.
            3. Calculate an additional performance estimate S_{ij_{boot}:orig} by evaluating bootstrapped model i on the original data.
    2. Calculate the optimism of the apparent performance estimate for each model i:
        • O_i = \frac{1}{J} \sum_{j=1}^{J} (S_{ij_{boot}} - S_{ij_{boot}:orig})
    3. Calculate the optimism-adjusted performance estimate for each model i:
        • S_{i_{adj}} = S_{i_{app}} - O_i
    4. Select the model with the best adjusted performance and proceed with it.
3. Estimate the prediction performance of the selected model:
    1. For resamples k = 1, ..., K:
        1. Generate a bootstrap resample (with replacement) from the original data.
        2. Fit the selected model to resample k.
        3. Calculate the performance estimate S_{k_{boot}}.
        4. Calculate an additional performance estimate S_{k_{boot}:orig} by evaluating the bootstrapped model on the original data.
    2. Calculate the optimism of the apparent performance estimate:
        • O_{top} = \frac{1}{K} \sum_{k=1}^{K} (S_{k_{boot}} - S_{k_{boot}:orig})
    3. Calculate the optimism-adjusted performance estimate:
        • S_{top_{adj}} = S_{top_{app}} - O_{top}
4. Report the model’s validated performance estimate S_{top_{adj}}.
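The two-step procedure above can be sketched in Python as follows. `fits` is a list of hypothetical model-fitting functions (one per competing model) and `score(model, data)` is a stand-in performance measure where higher is better; neither comes from a specific library.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(data):
    """Draw a bootstrap sample (with replacement) of the rows of `data`."""
    idx = rng.integers(0, len(data), size=len(data))
    return data[idx]

def optimism(data, fit, score, n_boot):
    """Efron-Gong optimism: mean gap between resample and original-data scores."""
    total = 0.0
    for _ in range(n_boot):
        boot = resample(data)
        model = fit(boot)
        total += score(model, boot) - score(model, data)
    return total / n_boot

def select_and_validate(data, fits, score, J=200, K=200):
    # Steps 1-2: optimism-adjust each candidate's apparent performance, pick the best.
    adjusted = [score(fit(data), data) - optimism(data, fit, score, J) for fit in fits]
    top = int(np.argmax(adjusted))
    # Step 3: re-estimate the selected model's optimism with a fresh bootstrap.
    s_top_adj = score(fits[top](data), data) - optimism(data, fits[top], score, K)
    return top, s_top_adj
```

Note that the second bootstrap here reuses the same original data, so it does not penalize the selection step itself, which is part of what I am asking about.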

Am I on the right track or are there any fundamental errors here?

Thanks!


The double bootstrap was created to fix problems with the regular bootstrap, especially in constructing confidence intervals that have the right tail areas on both sides. I don’t think the regular Efron-Gong optimism bootstrap is broken enough in the settings you have described to warrant going to the trouble of a double bootstrap.

Thanks. Just to confirm: competing models can be fit within each bootstrap iteration, and their resulting optimism-adjusted performance estimates can be used both to select the top-performing model and to report its final performance?

Now I see the question. To oversimplify I’d say that if you had two competing modeling procedures you could select the better one with a single bootstrap, but if you were selecting from among many then the double bootstrap is called for.

I’m not familiar with a practical implementation for this case and don’t know of any references. Is it a two-step procedure, like the second algorithm I outlined above, where one bootstrap procedure is used to compare competing models, then a second bootstrap procedure is used to estimate the final prediction performance? Or, would it be like the first algorithm above, which requires an outer loop of bootstrap resampling from the original data and an inner loop of bootstrap resampling from the resampled data?

The second. I’m sure someone has written this up somewhere. Nested cross-validation is even more common.