Confidence intervals for bootstrap-validated bias-corrected performance estimates


using the -rms- package to build and internally validate (by bootstrapping) a model, is there a way to get the confidence interval around those bias corrected performance estimates?


Hello Pavel, I had the same question quite a while back.

I’ve edited an earlier version as using dplyr is easier

The code requires capturing the output of each bootstrap from rms’s validate function:

v <- validate(f, B=300) #from rms package
t <- capture.output(validate(f, B=300, pr=TRUE))
validate_CI <- data.frame(output = t) >
separate(output, into = c(“metric”,“train”,“test”), sep = “[[:blank:]]+”) >
mutate(test = as.numeric(test)) >
filter(! >
select(metric, test) >
mutate(test = as.numeric(test)) >
group_by(metric) >
summarise(mn = mean(test, na.rm = TRUE),
lci = quantile(test, c(0.025), na.rm = TRUE),
uci = quantile(test, c(0.975), na.rm = TRUE)


Thank you so much for your help with this.

I think this only gives 95% CI for the “training” data.

Looking through the capture object t, I noticed only training and test performance characteristics but I think what I really need in order to get frank’s intended optimism corrected measures is a calculation of training - optimism = optimism-corrected.

The overfitting-corrected index is in the index.corrected vector in the result from validate. And I’m not understanding John’s code because I expected to see, say, 500 repetitions to form an outer bootstrap loop to compute bootstrap nonparametric percentile confidence intervals. And I would rather use the outer bootstrap to get standard errors and assume normality (rightly or wrongly) to reduce to 100 outer resamples. (validate is doing the inner resamples).

Interestingly, the capture.output function doesnt seem to capture the optimism or index.corrected vectors

All of this can be done easily with basic R. My old-fashioned R style makes it hard for me to read tidyverse code.

I think perhaps because they are calculated values (the overall optimism being the overall training mean - test mean). The code I wrote just gives the CI for the test metrics. I think this gives the optmism corrected for Dxy (only - similar code would be written for the other metrics). Note - I’ve not tried to take on board Frank’s preference wrt standard errors.

v <- validate(f, B = n.boot)
t <- capture.output(validate(f, B = n.boot, pr = TRUE))

validate_CI_Dxy <- data.frame(output = t) >
separate(output, into = c(“metric”,“train”,“test”), sep = “[[:blank:]]+”) >
filter(metric == “Dxy”) >
mutate(test = as.numeric(test)) >
mutate(train = as.numeric(train)) >
mutate(optimism = train - test) >
mutate(optimism.corrected = v[“Dxy”,“index.orig”] - optimism) >
select(metric, optimism.corrected) >
group_by(metric) >
summarise( mn_optimism.corrected = mean(optimism.corrected, na.rm = TRUE),
lci = quantile(optimism.corrected, c(0.025), na.rm = TRUE),
uci = quantile(optimism.corrected, c(0.975), na.rm = TRUE),

No, just access index.corrected directly.

I think this works:

95% CI for bias-corrected performance stats



for (i in 1:1000){

data_boot=data[sample(1:nrow(data), 15000 , replace=TRUE),]

model.boot <- lrm(outcome ~ predictor.vector
, data=data_boot, y=TRUE, x=TRUE, linear.predictors=T)

val.boot<-validate(model.boot, B=1000)

aucs.boot <- rbind(aucs.boot, res.boot[[1]][“AUC”, c(“index.orig”,“training”,“test”,“optimism”,“index.corrected”,“n”)])
slopes.boot <- rbind(slopes.boot, res.boot[[1]][“Slope”, c(“index.orig”,“training”,“test”,“optimism”,“index.corrected”,“n”)])
ints.boot <- rbind(ints.boot, res.boot[[1]][“Intercept”, c(“index.orig”,“training”,“test”,“optimism”,“index.corrected”,“n”)])

dxy.boot <- rbind(dxy.boot, res.boot[[1]][“Dxy”, c(“index.orig”,“training”,“test”,“optimism”,“index.corrected”,“n”)])
R2.boot <- rbind(R2.boot, res.boot[[1]][“R2”, c(“index.orig”,“training”,“test”,“optimism”,“index.corrected”,“n”)])
B.boot <- rbind(B.boot, res.boot[[1]][“B”, c(“index.orig”,“training”,“test”,“optimism”,“index.corrected”,“n”)])


quantile(aucs.boot[,“index.corrected”], c(0.025))
quantile(aucs.boot[,“index.corrected”], c(0.975))

Try this out - simple case of just getting a rough confidence interval for Dxy.

B <- 250; reps <- 500; dxy <- numeric(reps)  
n <- nrow(d)
f <- lrm(y ~ x1 + x2 + x3 + ..., data=d, x=TRUE, y=TRUE)
f    # show original model fit
validate(f, B=B)    # show overall validation
for(i in 1 : reps) {
   g <- update(f, subset=sample(1 : n, n, replace=TRUE))
   v <- validate(g, B=B)
   dxy[i] <- v['Dxy', 'index.corrected']
quantile(dxy, c(.025, .975))
1 Like

That works nicely too (and is more elegant).

To clarify for anyone who reads this thread, the “CalculateAucFromDxy()” function I used above is something I think I took from @Ewout_Steyerberg’s book.

CalculateAucFromDxy <- function(validate) {
  ## Test if the object is correct
  stopifnot(class(validate) == "validate")
  ## Calculate AUCs from Dxy's
  aucs <- (validate["Dxy", c("index.orig","training","test","optimism","index.corrected")])/2 + 0.5
  ## Get n
  n <- validate["Dxy", c("n")]
  ## Combine as result
  res <- rbind(validate, AUC = c(aucs, n))
  ## Fix optimism
  res["AUC","optimism"] <- res["AUC","optimism"] - 0.5
  ## Return results