How do we use calibration optimism in the context of the bias correction method?

Good afternoon!

I have a couple of questions about how to use the calibration optimism method in the context of the bias correction method for model validation.

Just to provide some context for my question: the calibration optimism calculation is similar to the bias correction method using the bootstrap, but applied to the calibration curve instead. This method is described in the Cross Validated thread "splines - How to estimate a calibration curve with bootstrap (R)".

Do we need calibration bias correction if we are already doing bias correction using bootstrap optimism? If yes, which calibration function should we use when applying this model in clinical practice on new data - the corrected calibration function or the uncorrected one?

In theory, should the calibrated probabilities corrected by calibration optimism give the same performance statistics as the optimism bootstrap estimates?

What do you mean by bootstrap optimism?

I referred to the optimism (bias correction) calculated by bootstrapping for the model evaluation metrics as "bootstrap optimism", and the optimism calculated for the calibration curve as "calibration optimism", just so we can distinguish between the two applications of the bias correction method.
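To make the first of those two concrete, here is a minimal sketch of the optimism bootstrap applied to a single performance metric (AUC is used here purely as an example; the data, variable names, and the choice of scikit-learn's logistic regression are illustrative assumptions, not from this thread):

```python
# Hedged sketch of bootstrap optimism correction for a performance metric.
# Toy data and model choice are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

def fit_and_auc(X_train, y_train, X_eval, y_eval):
    """Fit on one sample, evaluate AUC on (possibly another) sample."""
    model = LogisticRegression().fit(X_train, y_train)
    return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

# Apparent performance: fit and evaluate on the full original sample.
apparent = fit_and_auc(X, y, X, y)

# Optimism: average over bootstrap resamples of
#   (metric on the bootstrap sample) - (metric on the original sample),
# refitting the model on each bootstrap sample.
B = 200
optimism = 0.0
for _ in range(B):
    idx = rng.integers(0, n, size=n)       # resample with replacement
    boot_auc = fit_and_auc(X[idx], y[idx], X[idx], y[idx])
    orig_auc = fit_and_auc(X[idx], y[idx], X, y)
    optimism += (boot_auc - orig_auc) / B

# Optimism-corrected ("bias-corrected") estimate of the metric.
corrected = apparent - optimism
print(round(apparent, 3), round(optimism, 3), round(corrected, 3))
```

The "calibration optimism" discussed above follows the same resampling loop, but the quantity being debiased at each step is the calibration curve (predicted vs. observed probabilities) rather than a scalar metric.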

If I interpret you correctly, you are mixing the idea of miscalibration from overfitting with biased estimation of performance measures (such as predictive discrimination) due to overfitting. Don't mix the two. For now think just about calibration accuracy.

Thank you Frank for your quick response. Does that mean these two calculations (i.e. correcting the miscalibration vs. correcting the biased estimation) are independent of each other, even though they use the same bootstrap validation method and the same bootstrap samples?


They are not statistically independent, but they are independent in the sense you are discussing.


The big issue here is why. Calibration is rarely a problem on internal validation; it is something that reflects real-life differences between cohorts (e.g. different types of work-up, different definitions of an outcome or predictor variable, different treatments). For instance, we once found that a model built on an academic cohort was miscalibrated when applied to a community cohort, because the surgeons in the community were less experienced and had poorer results. That is why you worry about calibration. Correcting for calibration on internal validation will not do very much at all.


I think of strong, rigorous internal validation as a prerequisite, and as a method that often allows one to avoid wasting time and effort on external validation when you can't even reliably predict within the original stream of patients.