Should I bootstrap spline term construction?

Dear Professor Harrell,

I wonder if we should bootstrap the construction of smoothing splines?

I am fitting a logistic model with a smooth term in brms (and mgcv). When bootstrapping, the usual resampling of the dataset with replacement can throw a fatal error during smooth construction when a resample contains too few distinct data points.

I am therefore considering a Bayesian-style bootstrap (mimicking the implementation in the bayesboot package) where, rather than resampling the dataset, I draw a weight for each observation from a Dirichlet distribution and feed the weights to brms (or to mgcv with a quasibinomial family). However, looking at the generated code makes me wonder. Since the smooth term is still constructed from the full training feature space rather than from a surrogate sample, would that cause the over-optimism to be underestimated? On the flip side, judging from the mgcv code, I think (!!) the construction of s(x) does not depend on the outcome y (the smooth coefficients do, but the basis functions don't). So maybe it does make more sense not to bootstrap it? But I'm not sure.
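For concreteness, this is roughly what I have in mind (a minimal sketch only; real_df, y and x are placeholders for my data):

library(mgcv)

n <- nrow(real_df)
# Dirichlet(1, ..., 1) weights for one Bayesian-bootstrap replicate,
# drawn as normalised Exp(1) variables (as bayesboot does), rescaled to average 1
w <- rexp(n)
w <- n * w / sum(w)

# Weight the likelihood instead of resampling rows; quasibinomial avoids the
# non-integer-weights warning, and s(x) is still constructed from all of real_df
fit_w <- gam(y ~ s(x), family = quasibinomial, weights = w, data = real_df)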

May I ask how this situation is handled in rms? Did you ever come across instability in the smooth term construction?

Splines are a means to an end, and the basis functions (the spline terms in the regression) are quite arbitrary. Coefficients of basis functions can fly around even though the predicted values don't. Consider looking at a set of, say, 10 predicted values to see how they vary over samples.
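For example, something along these lines (just a sketch, assuming an mgcv::gam logistic fit with a single smooth of x and a data frame real_df):

# Predicted probabilities at ~10 fixed x values, tracked across bootstrap refits
xs <- data.frame(x = quantile(real_df$x, probs = seq(0.05, 0.95, length.out = 10)))
B <- 200
preds <- replicate(B, {
  b <- real_df[sample(nrow(real_df), replace = TRUE), ]
  f <- mgcv::gam(y ~ s(x), family = binomial, data = b)
  predict(f, newdata = xs, type = "response")
})
apply(preds, 1, sd)  # predictions are usually stable even when basis coefficients jump around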

Thank you. If I understand correctly, the constructed spline terms do not translate into predictions on their own. Since they are just some "numbers", I guess it is not harmful for the splines (rcs or s) to be constructed before the bootstrap process, and even better given the extra stability?

FWIW, here’s a Bayesian GAM R package that should give you credible intervals for the smooth splines, and also does variable selection:


Thanks. Sorry I did not make that clear. I was using the bootstrap to estimate the optimism (bias) in the c-index and calibration. rms has its own function for this but mgcv::gam does not. I tried two strategies:

  1. Bootstrap → mgcv::gam on each bootstrap dataset. Neither the smooth construction nor the model fitting saw the left-out observations. The c-index bias was about 0.1.
  2. Construct the s() terms before the bootstrap. This is a bit trickier, done by sampling observation weights and feeding them into mgcv::gam. Something like:
library(dplyr)

# Resample row indices with replacement
i <- sample(nrow(real_df), size = nrow(real_df), replace = TRUE)
bootdf <- real_df[i, ]

# Count how many times each observation was drawn; unsampled ids get W = 0
weights <- bootdf |>
  summarise(W = n(), .by = id) |>
  tidyr::complete(id = real_df$id, fill = list(W = 0))
boot_df <- left_join(real_df, weights, by = "id")

boot_fit <- mgcv::gam(y ~ s(x), family = binomial, weights = W, data = boot_df)
# Zero-weight observations contribute nothing to the likelihood, but they remain
# in `data`, so the smooth construction (knot placement) still uses the full sample

This gets me a bias of about 0.01.

Not clear if this is a problem for you, but be sure to favor the Efron-Gong optimism bootstrap and not some out-of-bag variant.

Right, you can construct the spline basis functions using unsupervised learning as a one-time thing up front.

Spline terms always need to be treated as connected.
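A minimal sketch of that idea, using a restricted cubic spline basis built once on the full sample (nk = 5 and glm as the refitter are only illustrative choices):

# Build the basis from x alone (no outcome involved), then carry the columns
# into every bootstrap fit together, as one block of ordinary covariates
X <- Hmisc::rcspline.eval(real_df$x, nk = 5, inclx = TRUE)
colnames(X) <- paste0("x_rcs", seq_len(ncol(X)))
dat <- cbind(real_df["y"], as.data.frame(X))

i <- sample(nrow(dat), replace = TRUE)
fit_b <- glm(y ~ ., family = binomial, data = dat[i, ])  # all spline columns enter jointly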

Thank you for pointing that out. It made me realise that my number of parameters p might be a bit large relative to the effective sample size n, so the optimism bootstrap has started breaking down, leading to this unstable behaviour. If out-of-bag (I think you mean cross-validation) is not preferred, do you have any advice?

But this might be a good case study; the convenient implicit knot placement in mgcv caught me off guard.

Good observation. If "out of bag" applies to what you did, this is a common misunderstanding of the Efron-Gong optimism estimator, which does not involve any out-of-bag operations. It compares the performance of the bootstrap-sample model on the bootstrap sample to its performance on the whole sample.
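In sketch form (assuming a binary outcome y, a single smooth of x, and Hmisc::somers2 for the c-index):

cidx <- function(fit, dat)
  Hmisc::somers2(predict(fit, newdata = dat, type = "response"), dat$y)["C"]

fit_full <- mgcv::gam(y ~ s(x), family = binomial, data = real_df)
c_apparent <- cidx(fit_full, real_df)

B <- 200
optimism <- replicate(B, {
  b <- real_df[sample(nrow(real_df), replace = TRUE), ]
  fit_b <- mgcv::gam(y ~ s(x), family = binomial, data = b)
  cidx(fit_b, b) - cidx(fit_b, real_df)  # bootstrap-sample performance minus whole-sample performance
})
c_apparent - mean(optimism)  # optimism-corrected c-index; no out-of-bag step anywhere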

Ah no. I did not do anything strange. I honestly thought out-of-bag was related to cross-validation (let's pretend I did not come across the term out-of-bootstrap when researching this).

This is what I did, following your books: for any metric, the expected bias between the bootstrap sample and the whole (real) sample approximates the expected bias between the whole sample and the population. This method has worked well for most statistical models I have built. This is the first time it has failed silently.


The main occasion when I've seen the optimism bootstrap fail is the p >> N case, where the bootstrap tells you you have really bad overfitting and repeated cross-validation properly tells you you have tragic overfitting.