I am developing and internally validating a clinical prediction model. I want to combine LASSO and bootstrapping to get coefficients adjusted for both overfitting and optimism. However, I can’t find references that combine both processes in the same workflow.
During development, I used LASSO to select variables and adjust coefficients for overfitting, yielding a set of coefficients \beta_{LASSO}. After shrinkage, I recalibrated the intercept, giving \alpha_{LASSO}.
After development, during internal validation, I used the bootstrap to estimate optimism and the optimism-corrected calibration slope (uniform shrinkage factor, S). I then multiplied \beta_{LASSO} by S to obtain coefficients adjusted for both overfitting and optimism (\beta_{LASSO + Optimism}). Finally, I recalibrated the intercept once more to get a final recalibrated intercept (\alpha_{LASSO + Optimism}).
My final model formula could be represented by:
Y = \alpha_{LASSO + Optimism} + X\beta_{LASSO + Optimism}
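In code, the sequence I followed looks roughly like this (a minimal sketch with hypothetical X and y; scikit-learn and statsmodels stand in for the software I actually used, and the bootstrap loop that produces S is omitted here):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegressionCV

def recalibrate_intercept(lp, y):
    # Refit the intercept only, keeping the linear predictor lp as a fixed offset
    ones = np.ones((len(y), 1))
    fit = sm.GLM(y, ones, offset=lp, family=sm.families.Binomial()).fit()
    return fit.params[0]

# Development: LASSO with a CV-chosen penalty -> beta_{LASSO}
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=10)
lasso.fit(X, y)
beta_lasso = lasso.coef_.ravel()
alpha_lasso = recalibrate_intercept(X @ beta_lasso, y)   # alpha_{LASSO}

# Internal validation: S estimated from the optimism bootstrap (loop not shown)
beta_final = S * beta_lasso                              # beta_{LASSO + Optimism}
alpha_final = recalibrate_intercept(X @ beta_final, y)   # alpha_{LASSO + Optimism}
```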
Is this strategy statistically sound? Am I missing any detail?
Our group these days generally uses the steps outlined here. A manuscript with a practical application is forthcoming; I will post the link here once it is published so we can see where we align or diverge.
Your strategy appears generally consistent with these recommendations, particularly regarding penalization and internal validation. The main area of potential debate, as far as I can see, is whether applying a bootstrap-derived uniform shrinkage factor (S) on top of LASSO-shrunk coefficients constitutes “double shrinkage.” You want to use the bootstrap to get optimism-corrected performance metrics, but not to apply additional shrinkage to already-penalized coefficients, which can move you away from the sweet spot of the bias-variance trade-off.
I would avoid it, particularly in small datasets and particularly with such a post-hoc uniform shrinkage factor. Whereas the LASSO shrinks coefficients differentially, the post-hoc uniform shrinkage factor (if I understood your approach correctly) would treat all coefficients the same way; see the toy illustration below.
You should definitely perform an optimism evaluation; that is consistent with the steps linked above. The issue is correcting already-penalized coefficients post hoc by multiplying them by a further shrinkage factor.
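A toy illustration of the distinction, with made-up numbers:

```python
import numpy as np

beta_lasso = np.array([1.20, 0.35, 0.00])  # LASSO already shrank each effect differently
S = 0.85                                   # hypothetical bootstrap shrinkage factor
print(S * beta_lasso)                      # [1.02   0.2975 0.    ]: every nonzero
                                           # coefficient is cut by the same 15%
```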
Consistent with Pavlos’ reply, you would only use double shrinkage if you were forced to use lasso and hated lasso. You use lasso because it builds in shrinkage. If you don’t like the shrinkage it does, don’t use lasso.
Expert knowledge should dominate model specification. You used this approach but maybe didn’t take it far enough. Instead of feature selection, I would make more use of unsupervised learning, e.g., sparse PCA as exemplified in two chapters of RMS.
Specifying Bayesian priors on all effects (ridge regression is an empirical version of this) is perhaps a more cogent approach.
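To spell out that correspondence (a standard result; notation mine): the ridge estimate

\hat{\beta}_{ridge} = \arg\max_{\beta} \left[ \log L(\beta) - \lambda \sum_j \beta_j^2 \right]

is the posterior mode under independent priors \beta_j \sim N(0, \tau^2) with \lambda = 1/(2\tau^2). Choosing \lambda from the data (e.g., by cross-validation) is the “empirical” part; a fully Bayesian analysis would instead put a prior on \tau itself.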
Don’t trust lasso for feature selection, as discussed in links from here. It’s just a prediction tool (which I think is the way you are using it).
After reading this paper, I am questioning whether one should check for optimism with bootstrapping after LASSO.
They say, “Note that the magnitude of optimism in apparent model performance can only be checked for unpenalized logistic regression, as the LASSO and uniform shrinkage approaches already adjust for optimism.”
Any method can be put inside a bootstrap loop if it is fully specified and starts from scratch at each iteration, including solving for the optimum penalty. If lasso is done well, it should result in excellent calibration. The catch is that we don’t know the sample size needed to reliably choose the lasso penalty parameter.
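A minimal sketch of what “starts from scratch at each iteration” means (hypothetical X and y; scikit-learn’s LogisticRegressionCV stands in for whatever lasso software is used, and B = 200 resamples is arbitrary):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegressionCV

def fit_lasso(X, y):
    # The *whole* pipeline is refit, including the CV search for the penalty
    m = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=20, cv=10,
                             scoring="neg_log_loss")
    return m.fit(X, y)

def calibration_slope(lp, y):
    # Logistic regression of the outcome on the linear predictor; slope 1 is ideal
    fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
    return fit.params[1]

rng = np.random.default_rng(0)
slopes = []
for _ in range(200):                                  # B bootstrap resamples
    idx = rng.integers(0, len(y), len(y))             # sample with replacement
    boot = fit_lasso(X[idx], y[idx])                  # penalty re-chosen every time
    lp = X @ boot.coef_.ravel() + boot.intercept_[0]  # evaluate on the original data
    slopes.append(calibration_slope(lp, y))
S = np.mean(slopes)  # optimism-corrected calibration slope / uniform shrinkage factor
```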
It may be useful to contrast that with a Bayesian approach in which what we know about the likely effect of each predictor is encoded in priors, the model is fitted, and we go home. This encoding can be done in a way that allows for nonlinearity, unlike lasso. For example, you can spline every continuous variable and put a skeptical Gaussian prior on the inter-quartile-range effect of each predictor.
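As a sketch of that encoding (notation mine): with a spline expansion f_j(\cdot) for each continuous predictor x_j, fit

\text{logit}\, P(Y = 1 \mid X) = \alpha + \sum_j f_j(x_j), \qquad f_j(Q_{3j}) - f_j(Q_{1j}) \sim N(0, \sigma_j^2),

where Q_{1j} and Q_{3j} are the quartiles of x_j, so that a small \sigma_j encodes skepticism about the inter-quartile-range effect of that predictor.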
Overfitting = over-optimism (Efron’s term) = optimism. The lasso is designed to result in zero overfitting. Its success in achieving that goal depends on the accuracy of the choice of the penalty parameter \lambda, and frequently the data are not informative enough to choose \lambda correctly. That is one reason Bayes demands that you think more about each predictor’s potential effect, without using the current data, before fitting a Bayesian model.
If lasso doesn’t correct for optimism as it should, could bootstrapping be used as a complement to partially correct the residual optimism?
About Bayes, I can definitely see your points and totally agree. I will explore it instead of my current complex lasso + random intercept + multinomial frequentist framework. My main concern is the computational burden, especially when combining Bayes with bootstrapping.
Even when methods should theoretically prevent overfitting, it’s a good idea to use the Efron–Gong optimism bootstrap to check this, to hedge our bets. On the other hand, if you are using a shrinkage method that doesn’t solve the problem it was intended for, perhaps a different method should be chosen. Note that the bootstrap doesn’t correct a model; it just tells you how bad the model is.