Optimism-correction for Lasso-Cox



Dear All,

I am trying to predict resistance to treatment with a Lasso-Cox model trained on 1435 patients followed for 10 years, with 181 events and 33 variables (both continuous and categorical, for a total of approximately 50 covariates in the model if we count the dummies).

I have a question about bootstrap optimism-correction for calibration with 1000 bootstrap samples for the developed Lasso-Cox model.

My Lasso Cox has an apparent C-index of 0.66 and an apparent calibration slope of 1.73.

The optimism-corrected estimates are:

C-index_corr = 0.66 - 0.15 = 0.51

Calibration slope_corr = 1.73 - 1.77 = -0.04
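
For readers following the arithmetic, here is a minimal sketch of the usual bootstrap optimism correction (corrected = apparent - mean optimism). It is not the code used for the results above. The names `bootstrap_optimism`, `fit_mean` and `neg_mse` are hypothetical, and the toy "model" (a sample mean scored by negative mean squared error) only stands in for a Lasso-Cox fit scored by the C-index or calibration slope.

```python
import random
from statistics import mean

def bootstrap_optimism(data, fit, score, n_boot=1000, seed=0):
    """Return (apparent, mean optimism, corrected) for a performance index.

    fit(sample) -> model; score(model, sample) -> float (higher is better).
    Both callables are placeholders for the real modelling pipeline.
    """
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    optimisms = []
    for _ in range(n_boot):
        # Resample the rows with replacement, same size as the original data.
        boot = [rng.choice(data) for _ in data]
        # Repeat ALL fitting steps (including any penalty tuning) on the resample.
        model = fit(boot)
        # Optimism: performance on the bootstrap sample minus performance
        # of that same model on the original data.
        optimisms.append(score(model, boot) - score(model, data))
    optimism = mean(optimisms)
    return apparent, optimism, apparent - optimism

# Toy stand-ins (assumptions, for illustration only).
def fit_mean(sample):
    return mean(sample)

def neg_mse(model, sample):
    return -mean((y - model) ** 2 for y in sample)

toy = [float(v) for v in range(20)]
apparent, optimism, corrected = bootstrap_optimism(toy, fit_mean, neg_mse)
print(apparent, optimism, corrected)
```

In a real analysis `fit` would wrap the whole Lasso-Cox procedure, including selection of the penalty, so that each bootstrap repetition reflects every data-driven step.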

I understand that the estimate of the calibration slope has such a large optimism because Lasso-Cox is a biased estimator. So, how would you interpret this result? Is it sensible to assess calibration in this case? Would you re-calibrate the model?

Furthermore, a larger optimism-correction was observed in a random survival forest for both performance measures using the same data. This led me to an additional question: is optimism-correction valid for machine learning methods?

Might it be that I am fitting machine learning methods to a sample that is small relative to the number of features, which produces unstable models (see for example van der Ploeg, T., Austin, P.C. & Steyerberg, E.W. BMC Med Res Methodol (2014) 14: 137)?

Deborah Agbedjro



Lasso and elastic net are penalized maximum likelihood methods and regression models, not machine learning. More about that picky point here. This relates directly to the Ploeg et al paper’s relevance here. That paper deals with unpenalized regression and non-regression machine learning algorithms.

A slightly extreme but correct position is that penalized regression is used so that you don’t have to worry about overfitting. If you penalize correctly, the model is underfitted by the amount you expect it to be overfitted. So the apparent calibration slope that is > 1 is not of much interest. The cross-validated slope (say 100 repeats of 10-fold cross-validation with the penalty parameter recomputed and coefficients estimated 1000 times) is what is important, and you expect that to come out to nearly 1.0. That being said, I have less experience with extreme cases where the predictive information in the data may be close to zero. But before going further I recommend checking that the bootstrap worked in your case, by running the above mentioned cross-validation procedure. See also this.
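
As a rough illustration of the cross-validation procedure described above, the sketch below runs 100 repeats of 10-fold cross-validation and re-selects the penalty inside every one of the 1000 fits. Everything here is an assumption made for illustration: `repeated_kfold`, `fit_ridge` and `neg_mse` are hypothetical names, and a one-predictor ridge regression on simulated data stands in for the penalized Cox model.

```python
import random
from statistics import mean

def repeated_kfold(data, fit, score, repeats=100, k=10, seed=0):
    """Mean out-of-fold score over `repeats` x `k` fits, plus the fit count."""
    rng = random.Random(seed)
    scores, n_fits = [], 0
    for _ in range(repeats):
        idx = list(range(len(data)))
        rng.shuffle(idx)                      # new random partition each repeat
        folds = [idx[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [data[i] for i in idx if i not in held]
            test = [data[i] for i in held_out]
            model = fit(train)                # tuning AND estimation happen here
            scores.append(score(model, test))
            n_fits += 1
    return mean(scores), n_fits

def neg_mse(slope, sample):
    return -mean((y - slope * x) ** 2 for x, y in sample)

def fit_ridge(train, grid=(0.0, 1.0, 10.0, 100.0)):
    """Choose a ridge penalty on an inner split, then refit on all of `train`."""
    half = len(train) // 2
    inner_fit, inner_val = train[:half], train[half:]

    def slope(sample, lam):
        sxy = sum(x * y for x, y in sample)
        sxx = sum(x * x for x, y in sample)
        return sxy / (sxx + lam)

    # Penalty chosen using training data only, never the held-out fold.
    best = max(grid, key=lambda lam: neg_mse(slope(inner_fit, lam), inner_val))
    return slope(train, best)

# Simulated toy data (assumption): y = 0.5 * x + noise.
rng = random.Random(1)
xs = [rng.uniform(-2, 2) for _ in range(40)]
toy = [(x, 0.5 * x + rng.gauss(0, 1)) for x in xs]

estimate, n_fits = repeated_kfold(toy, fit_ridge, neg_mse)
print(estimate, n_fits)
```

The point of the design is that `fit` is called on the training folds only, so the penalty is recomputed for each fit and the held-out fold never influences it.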

In parallel with all this I recommend testing whether there is a predictive signal present by computing the first 5 principal components of the predictors, putting them in an unpenalized model, and getting a chunk test with 5 d.f.
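
The final step of that check can be sketched as a likelihood ratio test on 5 d.f., comparing the null model with the unpenalized model containing the five principal components. This sketch assumes the two log partial likelihoods have already been obtained from whatever Cox software is in use. The function names and the log-likelihood values below are hypothetical, chosen only so the statistic lands near the 5% critical value.

```python
import math

def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square variable with 5 d.f.

    Uses the closed form available for odd degrees of freedom.
    """
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2)
            * (math.sqrt(x) + x ** 1.5 / 3))

def chunk_test(loglik_null, loglik_full):
    """Likelihood ratio chunk test for 5 jointly added terms."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf_5df(stat)

# Hypothetical log partial likelihoods, for illustration only.
stat, p = chunk_test(-1000.0, -994.46475)
print(round(stat, 4), round(p, 3))
```

A small p-value would suggest that the predictors carry some signal as a group; a large one would be consistent with the near-zero corrected C-index reported earlier in the thread.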

Another issue is that I expect lasso to be unreliable in terms of selecting the “right” predictors, and a more fruitful approach will be data reduction followed by either unpenalized Cox regression or Cox regression with a quadratic penalty (ridge regression). This will be more stable and interpretable and will operate within the confines of your data richness. With 181 events you might reduce the problem using data reduction (unsupervised learning) down to 10 dimensions/summary scores.


Thank you very much Frank,

Your reply is very helpful.

I have also done optimism correction through repeated cross-validation as you suggested; however, the correction was much smaller. In an extreme case, where you correct through bootstrap and through repeated cross-validation and obtain two corrected estimates for the C-index of 0.65 and 0.75 respectively, which estimate would you trust?
What do you think about nested cross-validation as an internal validation method? Would this method suit penalized regression and random forests best?

Furthermore, I realized that when the sample size is smaller, optimism-correction through bootstrap estimates a smaller optimism than repeated cross-validation, as I would expect. Would it be correct to prefer using bootstrap optimism correction when your sample size is small and to say that estimates of optimism should be equivalent if the sample size is large?

Finally, do you have any reference for the method you proposed which combines principal components analysis and ridge regression?

Thank you very much for your patience and advice!
Kind regards,


Hi Deborah - I have a double teaching load this month and unfortunately don’t have time to delve into those excellent questions. I would just add briefly that I’d trust 100 repeats of 10-fold CV the most; make sure that all supervised learning steps are repeated for each of the 1000 model fits. But the bootstrap is better for exposing the arbitrariness of any feature selection that is also being used.