RMS Describing, Resampling, Validating, and Simplifying the Model

I’m a little stuck on how to average the calibration curves. Is there a way to do it in the rms package?

You would have to program that averaging.
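A rough sketch of one way to program it, assuming each calibrate() result is a matrix with its grid of predicted probabilities in a column named predy and the bias-corrected curve in calibrated.corrected (check colnames() in your rms version):

# Not an rms feature: hand-rolled average of several bias-corrected
# calibration curves, interpolated onto a common grid and then averaged
average_calibration <- function(cal_list, grid = seq(0.01, 0.99, by = 0.01)) {
  curves <- sapply(cal_list, function(cal)
    approx(x = cal[, 'predy'], y = cal[, 'calibrated.corrected'],
           xout = grid, rule = 2)$y)
  data.frame(predicted = grid, calibrated = rowMeans(curves))
}

# cal_list would be a list of calibrate(fit, B = 200) results, e.g. one per
# imputed dataset:
# avg <- average_calibration(cal_list)
# plot(avg$predicted, avg$calibrated, type = 'l')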

Hello! I have a question about the rms package and an lrm model. When I run validate(MyModel, bw=TRUE, B=400) I get the report “Frequencies of Numbers of Factors Retained”. How can I see exactly which factors were retained during the bootstrap procedure? The factors are marked with asterisks, but I cannot count them.

v <- validate(..., bw=TRUE)
attr(v, 'kept')

Program the needed processing of kept.
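For example, assuming kept is a logical matrix with one row per bootstrap repetition and one column per candidate factor (TRUE = retained in that resample):

v    <- validate(MyModel, bw = TRUE, B = 400)
kept <- attr(v, 'kept')
colSums(kept)                             # number of resamples retaining each factor
sort(colMeans(kept), decreasing = TRUE)   # proportion retained, most stable factors first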


Greetings! Do the parameters of the calibration plot (Slope, Dxy, and others) pertain to the nonparametric calibration curve or to the logistic calibration curve?

Somers’ D_{xy} is independent of calibration. The calibration slope and intercept come from assuming a linear (in the logits) calibration curve and fitting it with logistic regression. Indexes starting with E come from the nonparametric curve. There are also integrated calibration measures in the literature that are derived from the nonparametric curve.
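For intuition, the logistic (linear-in-the-logit) calibration could be estimated by hand roughly as follows; fit and test are placeholders for a fitted lrm model and validation data with a binary outcome y:

lp  <- predict(fit, newdata = test, type = 'lp')   # model's linear predictor
cal <- glm(test$y ~ lp, family = binomial)         # regress outcome on the linear predictor
coef(cal)   # calibration intercept and slope; ideal values are 0 and 1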


Nomogram with one predictor contributing very little
We developed a clinical prediction model with prespecified predictors, a full model, and enough effective sample size, with an optimism-corrected AUC of 0.95. In our nomogram, one predictor contributes only minimal points (2 or 3 points) and its beta is essentially zero. What should we do? We prespecified the full model and penalization has been done. I am tempted to remove that variable.
I would like to have your opinion.


It is most accurate to keep the weak predictor in the model and in its displayed nomogram. This will not affect the predictions so much but will affect what the nomogram doesn’t show—the precision (e.g., standard error) of estimates. An alternative is model approximation, also known as pre-conditioning. Use the full model as the basis for inference and standard error calculations but display the reduced model. For an example see this.
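A rough sketch of the model approximation idea (not the linked example; model and variable names are made up): fit the full model, then approximate its linear predictor with a least-squares fit on the predictors you want to display.

library(rms)
dd   <- datadist(d); options(datadist = 'dd')
full <- lrm(y ~ x1 + x2 + x3 + x4 + x5, data = d, x = TRUE, y = TRUE)
lp   <- predict(full, type = 'lp')               # full-model linear predictor
# approximate the full model's linear predictor, dropping the weak predictor x5
app  <- ols(lp ~ x1 + x2 + x3 + x4, data = d, sigma = 1)
plot(nomogram(app, fun = plogis, funlabel = 'Predicted risk'))

Inference (standard errors, contrasts) would still be based on full; only the display comes from the approximation.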


I have a model that I have validated using the optimism-adjusted bootstrap. However, we are interested in the performance of this model on subsets of the data (e.g., individual treatment arms). The reason for this inquiry is that treatment arm is the model’s strongest predictor and we want to know how well the model will perform within a given arm. Is there a way to apply the rms::validate function to selected subsets of the sample used to train the model?

Since we use resampling for the validation I’m not sure how to do subsetting with that.

Would I be able to just resample from the preferred subset?

I spoke too soon. The work of resampling validation is done by the rms predab.resample function, which accepts a subset= argument. Give that argument to validate or calibrate and it will be respected.

Thanks.
I just want to make sure I understand what’s happening. When I pass the subset argument to validate, e.g. validate(fit, method = "boot", subset = data$ARM == "A"), am I fitting the model and performing all resampling necessary for the bootstrap with ALL subjects, but only assessing the predictions made on subjects in ARM A? Or is this somehow only sampling from the subjects in ARM A (in which case it shouldn’t work if ARM was in the model fit)?

The fit is from everyone and the validation is only for the subset.
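For example (names illustrative), the model is fit and resampled on all subjects, and the accuracy indexes are computed only on the rows selected by subset=:

fit <- lrm(y ~ ARM + x1 + x2, data = d, x = TRUE, y = TRUE)
validate(fit, method = "boot", B = 400, subset = d$ARM == "A")
calibrate(fit, B = 400, subset = d$ARM == "A")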

I have another question, this time on model validation. I have several different datasets with hundreds to tens of thousands of samples, on which I need to train and estimate the performance of three different models that use at most two predictors, transformed by restricted cubic splines with 5 knots. For each dataset I want to fit and estimate a new model using all of the data. Is model validation needed in my case? I would appreciate any references on this topic, if available.

Assuming the two predictors are pre-specified there is little need for model validation unless the effective sample size is small, e.g., your outcome variable Y is binary and min(# events, # non-events) is small. There is just a need for having a good model fit. Depending on what distributional assumptions you need for Y the main issue may be interactions among Xs.
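For example, with a binary Y the pre-specified fit and an interaction check might look like this (data and variable names are illustrative):

library(rms)
dd  <- datadist(d); options(datadist = 'dd')
fit <- lrm(y ~ rcs(x1, 5) * rcs(x2, 5), data = d, x = TRUE, y = TRUE)
anova(fit)   # the interaction rows assess whether x1 and x2 act additively
fit          # overall fit statistics and coefficients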


Having read the excellent conversation in this thread, I have some questions myself. I’m developing a prediction model (binary outcome) in a dataset that has some missing values, and I stumbled upon the issue of combining multiple imputation with internal validation. I read the relevant material provided here on the options of MI-Val vs Val-MI, and so far what I have done is this:

  1. Multiple imputation using mice.
  2. Then, to get performance measures, I fit my model in each of the imputed datasets using the lrm function and averaged each performance measure (apparent, optimism, and index-corrected) obtained from the validate function in the rms package. So I chose to impute first, then bootstrap (a stripped-down sketch is below this list).
  3. I found a nice way to get CIs of the performance measures here: Confidence intervals for bootstrap-validated bias-corrected performance estimates - #9 by Pavel_Roshanov, from F. Harrell. In my case I can do this procedure in each of the imputed datasets, but I don’t know how to combine those confidence intervals into overall CIs across the imputed datasets.
  4. Having done that, is it now OK to use fit.mult.impute to get the pooled model coefficients with standard errors? I would also like to do some post-estimation shrinkage using the index-corrected calibration slope and present the final shrunken coefficients.
  5. I used predict in each of the imputed datasets to get my model’s predictions and then calculated the average predictions across the datasets. With these average predictions, I made a calibration curve with the val.prob.ci.2 function from the CalibrationCurves package.

To summarize: is it correct to use fit.mult.impute to get my final model, but manually fit my model with lrm in each imputed dataset to do my internal validation? How do I combine CIs across datasets? Is it correct to plot calibration curves with the average predictions across the imputed datasets? Sorry for the lengthy post. Thanks in advance!
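A stripped-down sketch of steps 1-2 (impute with mice, then run validate in each completed dataset and average the indexes; dataset and variable names are placeholders):

library(mice)
library(rms)
imp  <- mice(mydata, m = 10, seed = 1)
vals <- lapply(seq_len(imp$m), function(i) {
  d   <- complete(imp, i)
  fit <- lrm(outcome ~ x1 + rcs(x2, 4) + x3, data = d, x = TRUE, y = TRUE)
  validate(fit, method = 'boot', B = 300)
})
# average the apparent, optimism, and index-corrected columns over imputations
avg <- Reduce(`+`, lapply(vals, unclass)) / length(vals)
round(avg, 3)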

Hi everyone, I am using the rms package to develop and internally validate a risk prediction model. After using the validate function to run 500 bootstrap resamples, we used the calibration slope (0.976) to correct our regression coefficients, then re-estimated our intercepts (the outcome is ordinal). When using the predict function, the linear predictors for each ordinal outcome are exactly the same for the initial model and for the model with the optimism-corrected coefficients and adjusted intercepts. Any insight on why this may be? We aren’t shrinking the coefficients by much, but I don’t understand why the linear predictor is exactly the same. Thank you in advance.

Hard to know without seeing the relevant code.

This should help: Multiple imputation and bootstrapping for internal validation
