I am developing and internally validating a clinical prediction model. I want to combine LASSO and bootstrapping to obtain coefficients adjusted for both overfitting and optimism. However, I can’t find references that apply both procedures in the same workflow.
During development, I used LASSO to select variables and adjust coefficients for overfitting, yielding a set of coefficients \beta_{LASSO}. After this shrinkage, I recalibrated the intercept (\alpha_{LASSO}).
After development, during internal validation, I used the bootstrap to estimate optimism and the optimism-corrected calibration slope (Universal Shrinkage Factor, S). I then multiplied \beta_{LASSO} by S to obtain coefficients adjusted for both overfitting and optimism (\beta_{LASSO + Optimism}). Finally, I recalibrated the intercept given these shrunken coefficients, yielding a final recalibrated intercept (\alpha_{LASSO + Optimism}).
My final model formula could be represented by:
Y = \alpha_{LASSO + Optimism} + X\beta_{LASSO + Optimism}
Is this strategy statistically sound? Am I missing any detail?
These days our group generally uses the steps outlined here. A manuscript with a practical application is forthcoming; I will post the link here once it is published so we can see where we align or diverge.
Your strategy appears generally consistent with these recommendations, particularly regarding penalization and internal validation. The main area of potential debate, as far as I can see, is whether applying a bootstrap “universal” shrinkage factor (S) on top of LASSO-shrunk coefficients constitutes “double shrinkage.” You want to use the bootstrap to obtain optimism-corrected performance metrics, but not to apply additional shrinkage to already-penalized coefficients, which can move you away from the sweet spot of the bias-variance trade-off.
I would avoid it, particularly in small datasets, and especially with such a post-hoc uniform shrinkage factor. While LASSO shrinks coefficients differentially, a post-hoc uniform shrinkage factor (if I have understood your approach correctly) would treat all coefficients the same way.
You should definitely evaluate optimism; that is consistent with the steps linked above. The issue is post-hoc correction of already-penalized coefficients by multiplying them by a further shrinkage factor.
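To make the distinction concrete: a uniform factor keeps the LASSO’s zeros at zero and rescales every retained coefficient by the same proportion, so it cannot undo or refine the differential shrinkage the penalty already applied. A toy illustration with hypothetical numbers:

```python
import numpy as np

# Hypothetical LASSO fit: some coefficients selected out (zero),
# the rest already differentially shrunk by the L1 penalty
beta_lasso = np.array([0.0, 0.8, -0.3, 0.0, 1.2])
S = 0.85  # hypothetical bootstrap-estimated uniform shrinkage factor

beta_double = S * beta_lasso  # the contested "double shrinkage" step

# The uniform factor keeps the zeros at zero...
assert np.all(beta_double[beta_lasso == 0] == 0)
# ...and rescales every retained coefficient by the same proportion
nonzero = beta_lasso != 0
assert np.allclose(beta_double[nonzero] / beta_lasso[nonzero], S)
```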
Consistent with Pavlos’ reply, you would only use double shrinkage if you were forced to use lasso and hated lasso. You use lasso because it builds in shrinkage. If you don’t like the shrinkage it does, don’t use lasso.
Expert knowledge should dominate model specification. You used this approach but maybe didn’t take it far enough. Instead of supervised feature selection, I would rely more on unsupervised learning, e.g., sparse PCA, as exemplified in two chapters of RMS.
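A rough sketch of that idea, using scikit-learn’s SparsePCA as a stand-in for the R tools used in RMS (the data and settings here are purely illustrative):

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
y = (X[:, 0] + X[:, 1] + rng.normal(size=200) > 0).astype(int)

# Unsupervised data reduction: sparse components are computed without
# looking at y, so the reduction step adds no optimism to the outcome model
spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)
Z = spca.fit_transform(X)

# Outcome model fitted on the reduced scores rather than raw features
model = LogisticRegression().fit(Z, y)
```

The key design point is that the dimension reduction never sees the outcome, unlike LASSO-based selection.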
Specifying Bayesian priors on all effects (ridge regression is an empirical version of this) is perhaps a more cogent approach.
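The standard correspondence: the ridge estimate is the posterior mode under independent Gaussian priors on the coefficients,

```latex
\hat{\beta}_{\text{ridge}}
  = \arg\min_{\beta}\,\bigl\{\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_2^2\bigr\}
  = \arg\max_{\beta}\,\bigl\{\log p(y \mid X, \beta) + \log p(\beta)\bigr\},
\qquad \beta_j \sim \mathcal{N}(0, \tau^2),\quad \lambda = \sigma^2/\tau^2 .
```

Estimating \lambda from the data (e.g., by cross-validation) is what makes this an empirical version of the Bayesian approach; the LASSO corresponds analogously to double-exponential (Laplace) priors.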
Don’t trust lasso for feature selection, as discussed in links from here. It’s just a prediction tool (which I think is the way you are using it).
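One way to see why the selection shouldn’t be trusted is to re-run the LASSO on bootstrap resamples and record which variables receive nonzero coefficients; the selected set typically varies from resample to resample. A sketch with synthetic data and illustrative settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=150, n_features=12, n_informative=3,
                           random_state=1)
rng = np.random.default_rng(1)

selected = []
for _ in range(20):
    # Refit the CV-tuned LASSO on a bootstrap resample of the data
    idx = rng.integers(0, len(y), size=len(y))
    fit = LogisticRegressionCV(Cs=5, penalty="l1", solver="liblinear",
                               cv=5).fit(X[idx], y[idx])
    # Record the set of variables with nonzero coefficients
    selected.append(frozenset(np.flatnonzero(fit.coef_.ravel())))

# Number of distinct "selected models" across the 20 resamples
n_distinct = len(set(selected))
```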