I am externally validating a competing-risks prognostic model (all-cause and breast-cancer-specific death at 5 and 10 years). I constructed a smooth calibration curve following the guidelines in this article (though I adapted the modelling approach slightly):
Briefly, I used a cause-specific discrete-time generalized additive model (GAM) with the sole predictor being the cloglog-transformed risk score, modelled using a smoothing spline. The resulting model-based "observed" risks are then plotted against the predictions from the model being validated. The GAM also includes a time interaction to relax the proportional hazards assumption; this tensor product term is specified with stronger shrinkage.
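For context on the discrete-time setup: each subject is expanded into one binary row per time interval at risk, and the cloglog-link GAM is then fit to those rows. A minimal numpy sketch of that person-period expansion, using toy data and illustrative names only (not my actual modelling code):

```python
import numpy as np

# Toy follow-up data: event time in whole years and an indicator for the
# event of interest (1 = event, 0 = censored). Purely illustrative.
time = np.array([3, 5, 2])
event = np.array([1, 0, 1])
max_t = 5  # administrative cut-off, e.g. the 5-year horizon

rows = []
for subj, (t, e) in enumerate(zip(time, event)):
    for interval in range(1, min(t, max_t) + 1):
        # One binary outcome per person-interval: did the event happen here?
        rows.append((subj, interval, int(e == 1 and interval == t)))
person_period = np.array(rows)  # columns: subject id, interval, outcome
print(person_period)
```

The discrete-time GAM with a cloglog link is then just a binary regression on these person-period rows, with the smooth of the cloglog-transformed risk score (and the time interaction) as predictors.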
From the fitted calibration curve, we can compute error metrics such as the mean prediction error and the integrated calibration index (ICI). However, I am unsure how to compute confidence intervals for these metrics.

My hunch is that we need to bootstrap the whole process, i.e., re-fit the calibration-curve model (not the model we are validating) on each bootstrap sample, re-compute each calibration error index of interest, and then form quantile-based confidence intervals. This would account for the uncertainty in fitting the calibration curve itself. However, re-fitting the calibration curve within such a bootstrap may be infeasible due to computing time, so I wonder whether a normal approximation would be acceptable instead. This would fit the calibration curve only once; the standard error (from which the normal approximation gives the confidence intervals) would then come from the single variable representing the absolute difference between the model-based ("observed") risk and the predicted risk from the model being validated.

I would greatly appreciate some advice. Thank you.
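To make the two options concrete, here is a self-contained numpy sketch under simplifying assumptions: binary 5-year outcomes on simulated data, and a plain polynomial logistic smoother standing in for the GAM purely as a placeholder (all names are illustrative, not from my actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data: predicted risks from the model being validated,
# and binary outcomes generated with mild miscalibration.
n = 2000
p_pred = rng.uniform(0.02, 0.6, n)
y = rng.binomial(1, np.clip(1.15 * p_pred, 0.0, 1.0))

def cloglog(p):
    return np.log(-np.log(1.0 - p))

def fit_calibration_curve(x, y):
    """Placeholder smoother standing in for the discrete-time GAM:
    logistic regression on a cubic polynomial of cloglog(risk)."""
    X = np.vander(cloglog(x), 4)
    beta = np.zeros(X.shape[1])
    for _ in range(25):  # Newton-Raphson / IRLS iterations
        eta = np.clip(X @ beta, -30, 30)
        mu = 1.0 / (1.0 + np.exp(-eta))
        W = mu * (1.0 - mu)
        H = X.T @ (W[:, None] * X) + 1e-8 * np.eye(X.shape[1])  # tiny ridge
        beta += np.linalg.solve(H, X.T @ (y - mu))
    return lambda xn: 1.0 / (
        1.0 + np.exp(-np.clip(np.vander(cloglog(xn), 4) @ beta, -30, 30))
    )

def ici(x, y):
    """Integrated calibration index: mean |observed - predicted| risk."""
    return np.mean(np.abs(fit_calibration_curve(x, y)(x) - x))

# Option 1: bootstrap the whole pipeline, re-fitting the curve each time.
B = 200
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b] = ici(p_pred[idx], y[idx])
ci_boot = np.quantile(boot, [0.025, 0.975])

# Option 2: single fit + normal approximation on the per-subject absolute
# differences. Note this treats the fitted curve as fixed, so it ignores
# the smoothing uncertainty that the bootstrap captures.
d = np.abs(fit_calibration_curve(p_pred, y)(p_pred) - p_pred)
est, se = d.mean(), d.std(ddof=1) / np.sqrt(n)
ci_norm = (est - 1.96 * se, est + 1.96 * se)

print("ICI:", est, "bootstrap CI:", ci_boot, "normal CI:", ci_norm)
```

The point of the comparison is that option 2 conditions on the single fitted curve, whereas option 1 also resamples the curve-fitting step, so the two intervals can differ noticeably when the smoother itself is uncertain.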