Penalized regression with imputation in `rms`

A clinician I am working with is interested in developing a clinical prediction model. Their data contain missing values for some of the covariates as well as for the outcome. I know I can use multiple imputation via `rms::aregImpute` to address the missing data.

One of my main concerns is the calibration of the model. This paper by Van Calster et al. recommends using penalized regression to prevent overfitting. I know `rms` implements penalization via the `pentrace` function, but `pentrace` does not seem to accept a model fit via `fit.mult.impute`, returning the following error:

```
Error in pentrace(f2) : fitter not valid
```
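For concreteness, here is a minimal sketch of the workflow that produces this error, using hypothetical variable names (outcome `y`, covariates `x1`–`x3`, in a data frame `d`):

```r
library(rms)

## Multiple imputation of covariates and outcome (hypothetical data frame d)
a <- aregImpute(~ y + x1 + x2 + x3, data = d, n.impute = 10)

## Pool logistic model fits over the imputations with Rubin's rule
f2 <- fit.mult.impute(y ~ x1 + x2 + x3, lrm, a, data = d,
                      x = TRUE, y = TRUE)

pentrace(f2)  # Error in pentrace(f2) : fitter not valid
```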

What are my options here to introduce some moderate penalization into the model while accommodating multiple imputation? Is this possible via `rms`, or is this a scenario where I would “roll my own” estimator?

Your input is appreciated and I’m willing to share relevant details where needed.

We need to move to full Bayesian modeling to be able to handle more complexities at one time. The problem you are running into is that penalized maximum likelihood estimation needs a likelihood, but multiple imputation with Rubin’s rule for obtaining the final covariance matrix only leads to Wald statistics, i.e., it relies on asymptotic normal distributions. You can try penalized estimation on each imputed dataset, hoping to show that the same penalty works for all imputed datasets, and then hope that Rubin’s rule works for the average of all the $\hat{\beta}$.
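One way to act on this suggestion, as a sketch only and not a validated procedure (same hypothetical variables as above): run `pentrace` separately on each completed dataset and check whether the optimal penalty is stable across imputations. `impute.transcan` extracts the i-th set of imputed values from the `aregImpute` object:

```r
best.pen <- numeric(a$n.impute)
for (i in seq_len(a$n.impute)) {
  ## Build the i-th completed dataset
  completed <- d
  imp <- impute.transcan(a, imputation = i, data = d,
                         list.out = TRUE, pr = FALSE, check = FALSE)
  completed[names(imp)] <- imp

  ## Trace a grid of penalties for this imputation
  f <- lrm(y ~ x1 + x2 + x3, data = completed, x = TRUE, y = TRUE)
  p <- pentrace(f, seq(0, 20, by = 0.5))
  ## p$penalty holds the best penalty found (returned as a one-row data frame)
  best.pen[i] <- unlist(p$penalty)
}
best.pen  # hope: roughly the same penalty wins in every imputation
```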

Hi Prof. Harrell,

As a follow-up to your comments: if I fit a penalized regression on each of the imputed datasets with a different penalty for each, without going for a full Bayesian model, can Rubin’s rule still be applied when I combine the estimated coefficients, given that the forms of the objective functions (i.e., the penalized likelihood functions) differ across datasets?

In addition, I would like to solicit your opinion on fixing the penalty for all imputed datasets (i.e., the same penalty across all of them) and applying Rubin’s rule to combine the estimated coefficients. For example, when fitting a penalized regression to each imputed dataset, the “optimized” penalties across the different imputed datasets are often similar. In this case, I could fix the penalty parameter in the regression for all imputed datasets and combine the estimated coefficients using Rubin’s rule, as in the sketch below.
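A rough sketch of that idea (again with the hypothetical variable names from above, and with the caveat from this thread that the justification for pooling penalized estimates is unresolved): since `fit.mult.impute` passes extra arguments through to the fitter, a fixed penalty chosen from the per-imputation traces can be handed to `lrm`, and Rubin’s rule is then applied to the penalized fits:

```r
## Fix one penalty for all imputations, e.g. the median of the
## per-imputation optima found above (an ad hoc choice)
pen <- median(best.pen)

## fit.mult.impute passes penalty= through to lrm, fits the penalized
## model on each completed dataset, and combines with Rubin's rule
f.pen <- fit.mult.impute(y ~ x1 + x2 + x3, lrm, a, data = d,
                         penalty = pen, x = TRUE, y = TRUE)
coef(f.pen)  # pooled penalized coefficients

## Caveat (per the thread): Rubin's rule presumes approximately unbiased,
## asymptotically normal estimators; penalized estimates are biased, so
## treat the pooled covariance matrix as heuristic.
```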

I don’t know of any research related to those good questions.