I want to include restricted cubic splines while developing a clinical prediction model with a multinomial outcome. However, I also want to apply LASSO for shrinkage.
I know that one could fit a grouped LASSO to ensure that the set of basis terms parameterizing each spline is either included or excluded as a whole. Unfortunately there seems to be no R package that implements this approach for multinomial models.
Should I include restricted cubic spline basis functions in the model matrix before fitting LASSO?
I understand that a major drawback would be that standard LASSO penalizes basis functions independently. Thus it might, for example, retain the 2nd or 3rd basis function of a spline while dropping the 1st (linear) one.
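For concreteness, here is a minimal sketch of that "expand first, then penalize" idea, assuming a data frame d with continuous predictors x1 and x2 and a three-level factor outcome y (all names hypothetical):

```r
library(Hmisc)   # rcspline.eval() builds restricted cubic spline bases
library(glmnet)

## 4 knots -> 3 basis columns per predictor (linear term included via inclx)
x1b <- rcspline.eval(d$x1, nk = 4, inclx = TRUE)
x2b <- rcspline.eval(d$x2, nk = 4, inclx = TRUE)
X   <- cbind(x1b, x2b)

## Standard lasso penalizes each column of X independently, so it may keep
## a nonlinear basis column of a spline while zeroing out its linear term
fit <- cv.glmnet(X, d$y, family = "multinomial")
```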
I don’t think it will be valid to use any technique that doesn’t always consider all basis functions simultaneously.
Note that lasso has a very low probability of selecting the right features, as shown in a link inside Challenges of High-Dimensional Data Analysis. If you are just using lasso for shrinkage I would definitely use ridge regression instead. Are you doing feature selection? Why? Why not use unsupervised learning (data reduction), which is more stable and doesn’t try to separate hard-to-separate predictors?
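If shrinkage rather than selection is the goal, a sketch of multinomial ridge via glmnet, reusing the hypothetical X and d$y from the sketch above:

```r
library(glmnet)

## alpha = 0 gives the ridge end of the elastic net: every coefficient is
## shrunk toward zero, but none is forced to be exactly zero
ridge <- cv.glmnet(X, d$y, family = "multinomial", alpha = 0)
coef(ridge, s = "lambda.min")
```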
I was indeed planning to use LASSO for both variable selection (combined with prior expert input) and shrinkage, as mentioned in a previous post: Optimism Correction after LASSO in clinical prediction models - #12 by arthur_albuquerque
In that post you also mentioned unsupervised learning; I have to delve more into this subject. In general I only see backward stepwise selection or LASSO used for variable selection in high-quality articles about clinical prediction models. I am not sure I have seen an applied example of unsupervised learning in this context.
Unsupervised learning (e.g., variable clustering or nonlinear principal components) has been used mostly in high-dimensional situations and not so much with regular clinical variables. But it works well in your setting, typically better than variable selection. It is more interpretable in the sense of being more stable and not requiring arbitrary decisions about which collinear variables to exclude. Unsupervised learning usually doesn’t have to be accounted for in resampling-based validation, because it is masked to the outcome.
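As an illustration of what such data reduction might look like in R (a sketch only; the data frame predictors and variable names sbp and dbp are hypothetical, and the outcome is deliberately excluded so the step stays masked to it):

```r
library(Hmisc)

## Hierarchical clustering of candidate predictors (no outcome involved)
vc <- varclus(~ ., data = predictors)
plot(vc)   # inspect clusters to decide which variables to combine

## Summarize a block of related variables by its first principal component
pc1 <- prcomp(predictors[, c("sbp", "dbp")], scale. = TRUE)$x[, 1]
```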
If you want splines and shrinkage/penalization (and are using R) you could consider mgcv::multinom (see the help page “R: GAM multinomial logistic regression”).
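A minimal sketch of what fitting that looks like, assuming a data frame d with the outcome y coded 0, 1, 2 (so K = 2 linear predictors relative to reference category 0):

```r
library(mgcv)

## One formula per non-reference category; smooths are penalized by
## default, which is where the shrinkage comes from
fit <- gam(list(y ~ s(x1) + s(x2),   # linear predictor for category 1
                  ~ s(x1) + s(x2)),  # linear predictor for category 2
           family = multinom(K = 2), data = d)
summary(fit)
```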
The mgcv package is excellent. I’d be interested to learn the form of penalization it uses for multinomial logistic models. What we need is a penalty that pushes towards similar shapes of effects across outcome categories.
Yes, in lasso through glmnet we can use the argument type.multinomial = "grouped". It would be interesting to know whether mgcv provides a similar approach.
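For reference, a sketch of that glmnet call, reusing the hypothetical X and d$y from earlier. Note that type.multinomial = "grouped" ties each column of X together across outcome categories; it does not group the several spline basis columns of one predictor with each other:

```r
library(glmnet)

## Group-lasso penalty across outcome categories: for each column of X,
## the coefficients in all categories are zero or nonzero together
fit <- cv.glmnet(X, d$y, family = "multinomial",
                 type.multinomial = "grouped")
```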
It looks possible to use the same smooth across multiple outcome categories (see the last example in the documentation).
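A sketch of that idea in mgcv’s formula-list syntax (the 1 + 2 ~ s(x2) - 1 line says that linear predictors 1 and 2 share the smooth of x2; data assumptions as in the earlier mgcv sketch):

```r
library(mgcv)

fit <- gam(list(y ~ s(x1),           # terms specific to category 1
                  ~ s(x1),           # terms specific to category 2
                1 + 2 ~ s(x2) - 1),  # one smooth of x2 shared by both
           family = multinom(K = 2), data = d)
```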
Yes, thanks.
Now I wonder whether it matters to use the grouped instead of the ungrouped approach in multinomial models. In LASSO, “grouped” ensures that a covariate’s coefficients are all in or out together across the outcome categories. But is this important at all?
The end user of a CPM doesn’t really care what coefficients are present in each multinomial sub-model.
A semi-related side note: likelihood ratio and Wald tests for association need to have all coefficients present that were ever given a chance to be present. Chunk tests need well-defined pre-specified degrees of freedom.
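To illustrate the chunk-test idea (sketched on a binary outcome, since rms does not fit multinomial models; y_bin and the predictors are hypothetical):

```r
library(rms)

dd <- datadist(d); options(datadist = "dd")
fit <- lrm(y_bin ~ rcs(x1, 4) + rcs(x2, 4), data = d)

## anova() pools each predictor's spline terms into one chunk test with
## pre-specified df (here 3 per predictor), including a nonlinearity test
anova(fit)
```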