Seeking guidance on estimating the variance of a given model

I would appreciate it if you could guide me on this problem.
I want to estimate the variance of a model on a given dataset.
I propose the following procedures:
A) Reserve a small subset (TST) of the dataset for variance estimation. From the rest, draw thousands of bootstrap samples. For each sample:
-A.1- train the model
-A.2- classify each observation in TST
end
For each observation in TST, compute the ratio of correct labels to the number of bootstrap samples.
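A minimal Python sketch of procedure A, assuming a binary target and a single numeric feature to keep it short. `fit_threshold_classifier` is a hypothetical stand-in for whatever model is actually being trained, and `procedure_a`, `n_boot` and `seed` are illustrative names, not taken from any library.

```python
import random
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, int]  # (feature value, class label in {0, 1})
Model = Callable[[float], int]   # a fitted model maps a feature value to a label


def fit_threshold_classifier(data: Sequence[Observation]) -> Model:
    """Hypothetical stand-in model: cut halfway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    if not xs0 or not xs1:
        only = data[0][1]  # bootstrap sample containing a single class: always predict it
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (m0 + m1) / 2
    high = 1 if m1 > m0 else 0  # label predicted above the cut
    return lambda x: high if x > cut else 1 - high


def procedure_a(
    train: Sequence[Observation],
    tst: Sequence[Observation],
    fit: Callable[[Sequence[Observation]], Model] = fit_threshold_classifier,
    n_boot: int = 1000,
    seed: int = 0,
) -> List[float]:
    """For each TST observation: (number of bootstrap models labelling it correctly) / n_boot."""
    rng = random.Random(seed)
    correct = [0] * len(tst)
    for _ in range(n_boot):
        sample = rng.choices(train, k=len(train))  # bootstrap sample, drawn with replacement
        model = fit(sample)                          # A.1: train the model
        for i, (x, y) in enumerate(tst):             # A.2: classify each observation in TST
            if model(x) == y:
                correct[i] += 1
    return [c / n_boot for c in correct]


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated data: two overlapping Gaussian classes (illustration only).
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)] + [(rng.gauss(1.5, 1.0), 1) for _ in range(100)]
    rng.shuffle(data)
    tst, rest = data[:20], data[20:]  # reserve a small TST subset, bootstrap from the rest
    for (x, y), rate in zip(tst, procedure_a(rest, tst, n_boot=500)):
        print(f"x={x:+.2f} label={y} correct ratio={rate:.3f}")
```

Ratios near 0.5 mark TST observations on which the bootstrap models disagree most, i.e. where the fitted model is least stable.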
B) Initially, I had considered dropping, in step A.1, any model with a large training error. I could not find any reason to include such models when estimating variance, since they would never be viable for any classification task.
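A minimal Python sketch of procedure B under the same assumptions (binary target, one numeric feature, hypothetical `fit_threshold_classifier` model). The cut-off `max_train_error` is an illustrative parameter, since the post does not say what counts as a "large" training error; here training error is measured on the bootstrap sample each model was fitted to.

```python
import random
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, int]  # (feature value, class label in {0, 1})
Model = Callable[[float], int]


def fit_threshold_classifier(data: Sequence[Observation]) -> Model:
    """Hypothetical stand-in model: cut halfway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    if not xs0 or not xs1:
        only = data[0][1]  # single-class bootstrap sample: always predict that class
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (m0 + m1) / 2
    high = 1 if m1 > m0 else 0
    return lambda x: high if x > cut else 1 - high


def error_rate(model: Model, data: Sequence[Observation]) -> float:
    """Fraction of observations the model mislabels."""
    return sum(model(x) != y for x, y in data) / len(data)


def procedure_b(
    train: Sequence[Observation],
    tst: Sequence[Observation],
    fit: Callable[[Sequence[Observation]], Model] = fit_threshold_classifier,
    n_boot: int = 1000,
    max_train_error: float = 0.2,  # hypothetical threshold for a "large" training error
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Procedure A, but dropping models whose training error exceeds max_train_error.

    Returns the per-observation correct ratios over the kept models, and how many were kept.
    """
    rng = random.Random(seed)
    correct = [0] * len(tst)
    kept = 0
    for _ in range(n_boot):
        sample = rng.choices(train, k=len(train))
        model = fit(sample)
        if error_rate(model, sample) > max_train_error:
            continue  # B: discard a model with large training error
        kept += 1
        for i, (x, y) in enumerate(tst):
            if model(x) == y:
                correct[i] += 1
    if kept == 0:
        raise ValueError("every bootstrap model exceeded max_train_error")
    return [c / kept for c in correct], kept
```

Because the filter keeps only models that fit well, the ratios describe the variability of the retained models rather than of the fitting procedure as a whole; that is why they are divided by `kept` instead of `n_boot`.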
I would appreciate it if you could help me understand:

  1. whether these are valid variance estimation procedures for classification;
  2. whether any benchmark procedures have been published (I spent many days searching and lost my way);
  3. how to estimate bias, since I have not found any method for it. Do you have any recommendations?
    Bias is indicated: I can detect its presence, but I cannot determine its magnitude.
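On question 3, one possible approach (not mentioned in the thread) is Domingos's unified bias-variance decomposition for zero-one loss, which can be computed from the per-model labels of a bootstrap run like procedure A if those labels are kept rather than only the ratio. A minimal Python sketch, assuming the observed TST labels are noise-free so the noise term drops out; `zero_one_bias_variance` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Sequence


def zero_one_bias_variance(
    predictions: Sequence[Sequence[int]], labels: Sequence[int]
) -> Dict[str, float]:
    """Average zero-one bias and variance over TST observations.

    predictions[i] holds the label each bootstrap model gave TST observation i;
    labels[i] is its observed label, assumed noise-free.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and the same length")
    total_bias = total_variance = 0.0
    for preds, y in zip(predictions, labels):
        # Main prediction under zero-one loss: the most frequent label (ties broken arbitrarily).
        main = Counter(preds).most_common(1)[0][0]
        # Bias: 1 if the main prediction is wrong, else 0.
        total_bias += 1.0 if main != y else 0.0
        # Variance: fraction of bootstrap models disagreeing with the main prediction.
        total_variance += sum(p != main for p in preds) / len(preds)
    n = len(labels)
    return {"bias": total_bias / n, "variance": total_variance / n}
```

For two classes and noise-free labels, the expected zero-one loss at an observation equals its variance when the main prediction is correct and 1 minus its variance when it is wrong, so the two averages do not simply add up to the error rate.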

Thank you
Raman

This is a question for stats.stackexchange.com. Please post there. Also note that classification is probably not a good goal here (as opposed to prediction); see fharrell.com/post/classification for the reasons why.

Look in ‘The Elements of Statistical Learning’ by Hastie, Tibshirani and Friedman. There’s a section on the right and wrong ways to do cross-validation. Cross-validation is better than a single test set (especially if the modeling method doesn’t have hyperparameters). What you propose sounds like a variation of the setups they run simulations on. I suspect that model selection via a single test set leads to biased models.
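A minimal k-fold cross-validation sketch in Python, illustrating the "right way" point from that section: any supervised step has to be redone inside each fold, using only the training folds. The threshold classifier is a hypothetical stand-in model, and `k_fold_cv_error`, `k=5` and `seed` are illustrative choices. The returned standard error treats the fold errors as independent, which they are not exactly, so it is only a rough guide.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Observation = Tuple[float, int]  # (feature value, class label in {0, 1})
Model = Callable[[float], int]


def fit_threshold_classifier(data: Sequence[Observation]) -> Model:
    """Hypothetical stand-in model: cut halfway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    if not xs0 or not xs1:
        only = data[0][1]  # training fold with a single class: always predict it
        return lambda x: only
    m0, m1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (m0 + m1) / 2
    high = 1 if m1 > m0 else 0
    return lambda x: high if x > cut else 1 - high


def k_fold_cv_error(
    data: Sequence[Observation],
    fit: Callable[[Sequence[Observation]], Model] = fit_threshold_classifier,
    k: int = 5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean misclassification rate over k folds, approximate standard error)."""
    if not 2 <= k <= len(data):
        raise ValueError("need 2 <= k <= number of observations")
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k disjoint folds of roughly equal size
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in idx if i not in held]
        test = [data[i] for i in held_out]
        # Every supervised step (feature screening, tuning, model selection) belongs
        # inside `fit`, so it sees only the training folds, never the held-out one.
        model = fit(train)
        fold_errors.append(sum(model(x) != y for x, y in test) / len(test))
    mean = sum(fold_errors) / k
    sd = math.sqrt(sum((e - mean) ** 2 for e in fold_errors) / (k - 1))
    return mean, sd / math.sqrt(k)
```

Doing a supervised step such as feature screening once on the full dataset and only then cross-validating would let the held-out folds influence the model, which is the "wrong way" the section warns about.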
