Confidence intervals for bootstrap-validated bias-corrected performance estimates

Hello,

Using the rms package to build and internally validate (by bootstrapping) a model, is there a way to get a confidence interval around the bias-corrected performance estimates?

Thanks!

Hello Pavel, I had the same question quite a while back.

I've edited an earlier version of this, as using dplyr makes it easier.

The code requires capturing the output of each bootstrap from rms’s validate function:

library(rms)
library(dplyr)
library(tidyr)

v <- validate(f, B = 300)                              # overall validation from rms
t <- capture.output(validate(f, B = 300, pr = TRUE))   # pr=TRUE prints each bootstrap's indexes

validate_CI <- data.frame(output = t) %>%
  separate(output, into = c("metric", "train", "test"), sep = "[[:blank:]]+") %>%
  mutate(test = as.numeric(test)) %>%
  filter(!is.na(test)) %>%                             # drop header and non-numeric lines
  select(metric, test) %>%
  group_by(metric) %>%
  summarise(mn  = mean(test, na.rm = TRUE),
            lci = quantile(test, 0.025, na.rm = TRUE),
            uci = quantile(test, 0.975, na.rm = TRUE))

Thank you so much for your help with this.

I think this only gives a 95% CI for the "training" data, though.

Looking through the captured object t, I noticed only the training and test performance characteristics. I think what I really need, in order to get Frank's intended optimism-corrected measures, is the calculation index.orig - optimism = optimism-corrected.

The overfitting-corrected index is in the index.corrected vector in the result from validate. I'm not understanding John's code, because I expected to see, say, 500 repetitions forming an outer bootstrap loop to compute bootstrap nonparametric percentile confidence intervals. I would rather use the outer bootstrap to get standard errors and assume normality (rightly or wrongly), so that around 100 outer resamples would suffice (validate is doing the inner resamples).
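
For concreteness, a minimal sketch of that standard-error approach (not code from the thread), assuming a fitted lrm model f built with x=TRUE, y=TRUE from a data frame d; the resample counts are only illustrative:

require(rms)
reps <- 100; B <- 200      # outer resamples; inner bootstraps per validate()
n   <- nrow(d)
dxy <- numeric(reps)
for(i in 1 : reps) {
  g      <- update(f, subset = sample(1 : n, n, replace = TRUE))   # refit on an outer bootstrap sample
  dxy[i] <- validate(g, B = B)['Dxy', 'index.corrected']           # inner-loop bias-corrected Dxy
}
est <- validate(f, B = B)['Dxy', 'index.corrected']   # corrected Dxy from the original sample
se  <- sd(dxy)                                        # outer-bootstrap standard error
est + qnorm(c(.025, .975)) * se                       # Wald-type 95% CI assuming approximate normality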

Interestingly, the capture.output function doesn't seem to capture the optimism or index.corrected vectors.

All of this can be done easily with basic R. My old-fashioned R style makes it hard for me to read tidyverse code.

I think that is perhaps because they are calculated values (the overall optimism being the overall training mean minus the test mean). The code I wrote above just gives the CI for the test metrics. The code below gives the optimism-corrected CI for Dxy only; similar code would be written for the other metrics. Note: I've not tried to take on board Frank's preference regarding standard errors.

n.boot <- 300   # number of bootstrap repetitions (set as appropriate)

v <- validate(f, B = n.boot)
t <- capture.output(validate(f, B = n.boot, pr = TRUE))

validate_CI_Dxy <- data.frame(output = t) %>%
  separate(output, into = c("metric", "train", "test"), sep = "[[:blank:]]+") %>%
  filter(metric == "Dxy") %>%
  mutate(train    = as.numeric(train),
         test     = as.numeric(test),
         optimism = train - test,                                  # per-bootstrap optimism
         optimism.corrected = v["Dxy", "index.orig"] - optimism) %>%
  select(metric, optimism.corrected) %>%
  group_by(metric) %>%
  summarise(mn_optimism.corrected = mean(optimism.corrected, na.rm = TRUE),
            lci = quantile(optimism.corrected, 0.025, na.rm = TRUE),
            uci = quantile(optimism.corrected, 0.975, na.rm = TRUE))

No, just access index.corrected directly.
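
For illustration, a minimal sketch (reusing the fitted model f from earlier in the thread): the corrected indexes can be read directly from the matrix that validate returns, with no need to parse printed output:

v <- validate(f, B = 300)        # validate() returns a matrix-like object
v["Dxy", "index.corrected"]      # overfitting-corrected Dxy
v[, "index.corrected"]           # corrected values for every index at once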

I think this works:

95% CI for bias-corrected performance stats

library(rms)

cols <- c("index.orig", "training", "test", "optimism", "index.corrected", "n")

aucs.boot   <- NULL
slopes.boot <- NULL
ints.boot   <- NULL

dxy.boot <- NULL
R2.boot  <- NULL
B.boot   <- NULL

for (i in 1:1000) {

  # outer bootstrap resample (size hard-coded here; typically nrow(data))
  data_boot <- data[sample(1:nrow(data), 15000, replace = TRUE), ]

  model.boot <- lrm(outcome ~ predictor.vector,
                    data = data_boot, y = TRUE, x = TRUE, linear.predictors = TRUE)

  # inner bootstrap validation; CalculateAucFromDxy() is defined further down in this thread
  val.boot <- validate(model.boot, B = 1000)
  res.boot <- CalculateAucFromDxy(val.boot)

  aucs.boot   <- rbind(aucs.boot,   res.boot["AUC",       cols])
  slopes.boot <- rbind(slopes.boot, res.boot["Slope",     cols])
  ints.boot   <- rbind(ints.boot,   res.boot["Intercept", cols])

  dxy.boot <- rbind(dxy.boot, res.boot["Dxy", cols])
  R2.boot  <- rbind(R2.boot,  res.boot["R2",  cols])
  B.boot   <- rbind(B.boot,   res.boot["B",   cols])
}

quantile(aucs.boot[, "index.corrected"], 0.025)
quantile(aucs.boot[, "index.corrected"], 0.975)

Try this out: a simple case of just getting a rough confidence interval for Dxy.

require(rms)
B <- 250; reps <- 500; dxy <- numeric(reps)  
n <- nrow(d)
f <- lrm(y ~ x1 + x2 + x3 + ..., data=d, x=TRUE, y=TRUE)
f    # show original model fit
validate(f, B=B)    # show overall validation
for(i in 1 : reps) {
   g <- update(f, subset=sample(1 : n, n, replace=TRUE))
   v <- validate(g, B=B)
   dxy[i] <- v['Dxy', 'index.corrected']
   }
quantile(dxy, c(.025, .975))
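
As a side note, for a binary logistic model the c-index relates to Somers' Dxy by c = Dxy/2 + 0.5, so the same resamples give a rough interval on the AUC scale:

quantile(dxy / 2 + 0.5, c(.025, .975))   # approximate 95% CI for the bias-corrected c-index (AUC)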

That works nicely too (and is more elegant).

To clarify for anyone who reads this thread, the “CalculateAucFromDxy()” function I used above is something I think I took from @Ewout_Steyerberg’s book.

CalculateAucFromDxy <- function(validate) {
  ## Test if the object is correct
  stopifnot(inherits(validate, "validate"))
  
  ## Calculate AUCs from Dxy's
  aucs <- (validate["Dxy", c("index.orig","training","test","optimism","index.corrected")])/2 + 0.5
  
  ## Get n
  n <- validate["Dxy", c("n")]
  
  ## Combine as result
  res <- rbind(validate, AUC = c(aucs, n))
  
  ## Fix optimism
  res["AUC","optimism"] <- res["AUC","optimism"] - 0.5
  
  ## Return results
  res
}
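
A minimal usage sketch, assuming a binary-outcome lrm fit f created with x=TRUE, y=TRUE:

v   <- validate(f, B = 300)      # rms validation matrix
res <- CalculateAucFromDxy(v)    # appends an "AUC" row derived from the "Dxy" row
res["AUC", "index.corrected"]    # optimism-corrected AUC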

Not sure why validate doesn’t give confidence intervals by default …

Reasons: computation time, the method hasn't been nailed down, and programming time. Code contributions to the rms package are always welcome via its GitHub site, provided the new code is fully documented in the appropriate help file.

Hi,

Thanks to everyone involved here for the great discussion. Recently, I have been working on getting confidence intervals for bootstrap-validated, bias-corrected performance estimates.
Because of the high computational requirements of our model training, we could only calculate the confidence interval based on the outer-loop validate [I am referring to Frank's code above for the CI calculation with inner and outer loops].

Would that come close to the CI calculated with the inner loops?

I was inspired by Harrell’s recent post on bad statistical practice, particularly on using the bootstrap to visualize uncertainty in ranks of “winning” predictors from variable selection. I am not too well versed in statistical jargon, so please let me know if anything is unclear.

In my case, I have a variable selection pipeline with two univariable screenings, whereby a predictor X is discarded if it does not meet all pre-specified criteria (c1 and c2, both regressions with somewhat arbitrary cut-offs for selection), and I bootstrap the whole selection pipeline B times.

One metric I am after is, for each X, the proportion (out of B bootstraps) of bootstrap samples in which the predictor is included. This seems rather straightforward. However, it would also be of interest to extract model estimates from both c1 and c2 to get, e.g., percentile bootstrap confidence intervals for the ORs. Here, however, such CIs will be calculated from distributions of different sizes. If we consider criterion 2 (c2) and B = 1000, I may find that X1 is included in 90% of the bootstrapped samples, and hence there will be 900 ORs. For X2, however, I may find it to be included in 80% of the bootstrapped samples, and hence there will only be 800 ORs from which to calculate 95% CIs. Should the missing values here be imputed to 1.0, discarded, or should the method be conducted differently?

This becomes more complex still, since Xn may fulfill c1 but not c2, which in effect means that in many cases there will be more bootstrapped ORs from c1 than from c2.

Edit: For variable selection algorithms such as backward elimination, it appears that imputing coefficients to 0 (for linear models) for predictors excluded from a given bootstrap sample's model leads to lower bias than omitting them (DOI: 10.1002/sim.3104). I am not sure whether the same would apply to my strategy, which does not involve backward elimination but rather two distinct N = N(X) univariable regressions.

Yes, the Peter Austin procedure should apply here. Once you have a rule for a final estimate, you can insert 0 for the estimate when the criteria are not met.
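
A minimal sketch of that idea, using a hypothetical object beta.boot: a length-B numeric vector of bootstrapped log-ORs for one predictor, with NA wherever the predictor failed the selection criteria in that resample:

beta.boot[is.na(beta.boot)] <- 0          # predictor not selected in that resample -> log-OR set to 0
quantile(beta.boot, c(.025, .975))        # percentile 95% CI on the log-OR scale over all B resamples
exp(quantile(beta.boot, c(.025, .975)))   # same interval on the OR scale (not selected -> OR = 1)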
