Bootstrap vs. cross-validation for model performance

Yes, the bootstrap and the slower approach of 100 repeats of 10-fold cross-validation are equally good, and the latter is better in the extreme case (e.g., N < p). All analysis steps must be redone within each resample for both the bootstrap and cross-validation. Cross-validation needs up to 1000 analyses (100 repeats of 10 folds), while the bootstrap usually needs 300 to 400.
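A minimal sketch of repeated k-fold cross-validation shows where the count of 1000 analyses comes from. It assumes the whole analysis can be wrapped in a `fit` callable and the performance index in a `score` callable. The names `repeated_kfold`, `fit_mean` and `mse`, and the intercept-only model used in the demo, are hypothetical stand-ins, not any particular library's API.

```python
import random
import statistics


def repeated_kfold(rows, fit, score, k=10, repeats=100, seed=0):
    """Average held-out score over `repeats` random k-fold partitions.

    Returns (mean score, number of times `fit` was called).
    """
    n = len(rows)
    if n < k:
        raise ValueError("need at least k rows so that no fold is empty")
    rng = random.Random(seed)
    scores = []
    n_fits = 0
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for each repeat
        for f in range(k):
            held_out = order[f::k]  # every k-th shuffled index forms fold f
            held = set(held_out)
            train = [rows[i] for i in order if i not in held]
            test = [rows[i] for i in held_out]
            model = fit(train)  # the WHOLE analysis is redone on each training part
            scores.append(score(model, test))
            n_fits += 1
    return statistics.fmean(scores), n_fits


# Hypothetical demo pipeline: predict y by the training-set mean, score by MSE.
def fit_mean(train):
    return statistics.fmean(y for _, y in train)


def mse(model, test):
    return statistics.fmean((y - model) ** 2 for _, y in test)


if __name__ == "__main__":
    gen = random.Random(1)
    rows = [(i, gen.gauss(0, 1)) for i in range(50)]
    estimate, n_fits = repeated_kfold(rows, fit_mean, mse)
    print(f"{n_fits} fits, mean held-out MSE = {estimate:.3f}")
```

With the defaults, `fit` runs 100 × 10 = 1000 times, compared with a few hundred fits for a bootstrap validation.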

You’ve described the bootstrap process correctly. It sounds strange, but the bootstrap provides an excellent estimate of how much overfitting you have, and you then subtract that amount from the apparent performance. It is based on this philosophy:

  • we want to estimate performance in an infinitely large independent sample
  • so we estimate how much overfitting there is in our sample and subtract it
  • bootstrap samples contain duplicate observations, so a model fit to a bootstrap sample is super-overfitted to it
  • the bootstrap computes the difference between super-overfitting (the bootstrap model evaluated on its own bootstrap sample) and regular overfitting (the same model evaluated on the original sample)
  • the difference between super-overfitting and regular overfitting serves as an estimate of the real difference we want: that between regular overfitting and no overfitting
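The steps in the list can be sketched in Python as what is often called the Efron-Gong optimism bootstrap. The optimism is the average, over bootstrap repetitions, of the bootstrap model's performance on its own bootstrap sample minus its performance on the original sample. The corrected estimate is the apparent performance minus that optimism. Assumptions, all for illustration only: a one-predictor least-squares line stands in for the full analysis, R² is the performance index, and `fit_line`, `r_squared` and `optimism_corrected` are hypothetical helper names.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; stands in for *all* analysis steps."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # guard: a resample may repeat one x value
    return my - b * mx, b


def r_squared(model, xs, ys):
    """Performance index: R^2 of `model` evaluated on (xs, ys)."""
    a, b = model
    my = statistics.fmean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def optimism_corrected(xs, ys, n_boot=300, seed=1):
    rng = random.Random(seed)
    n = len(xs)
    # Apparent performance (regular overfitting): fit and evaluate on the original sample.
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimisms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)  # redo every analysis step on the bootstrap sample
        super_overfit = r_squared(model, bx, by)  # scored on its own bootstrap sample
        regular_overfit = r_squared(model, xs, ys)  # scored on the original sample
        optimisms.append(super_overfit - regular_overfit)
    optimism = statistics.fmean(optimisms)
    return {
        "apparent": apparent,
        "optimism": optimism,
        "corrected": apparent - optimism,
        "n_boot": len(optimisms),
    }


if __name__ == "__main__":
    gen = random.Random(0)
    xs = [gen.gauss(0, 1) for _ in range(40)]
    ys = [0.5 * x + gen.gauss(0, 1) for x in xs]
    print(optimism_corrected(xs, ys))
```

The default of 300 repetitions mirrors the 300 to 400 mentioned above. With a richer model (more candidate predictors, variable selection), the super-overfitting on bootstrap samples grows and the subtracted optimism grows with it.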