RMS Describing, Resampling, Validating, and Simplifying the Model

You can use plot(nomogram()), print(nomogram()) (which shows the points tables), predict, and Predict. But nomograms are meant for one-observation-at-a-time calculations.


Is it possible to upload reproducible code?

A nomogram is simply a visual tool you can use to make manual predictions based on your model. You can get predicted probabilities for n cases by using the “predict” function of your model. There is no need to extract probabilities from the nomogram.

library(rms)
model <- lrm(Y ~ X1 + Xn, data)
predict(model, data, type = "fitted")   # predicted probabilities for each row

Doctor Harrell, please help. How do I use the validate function with the cross-validation method in the rms package?

validate(MyModel, method = "boot", B=400) - it works
validate(MyModel, method = "crossvalidation", B=100) - it does not work
I get an error. I want to get 100 repeats of 10-fold cross-validation.

Use method='crossvalidation', B=10 but repeat the whole process 100 times and average the results. For code that does the averaging, see this, where you'll find the val function:

val <- function(fit, method, B, r) {
  # Map a descriptive method name onto validate()'s method argument
  contains <- function(m) length(grep(m, method)) > 0
  meth <- if(contains('Boot')) 'boot' else
          if(contains('fold')) 'crossvalidation' else
          if(contains('632')) '.632'
  # Run validate() r times and average the bias-corrected indexes
  z <- 0
  for(i in 1:r) z <- z + validate(fit, method=meth, B=B)[
          c("Dxy","Intercept","Slope","D","U","Q"),'index.corrected']
  z/r
}

For your case the B argument to val would be 10 and r=100. You’ll have to modify the vector of names of accuracy measures to suit your needs.
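For example, with the fit from your question (a minimal sketch; it assumes MyModel was fit with x=TRUE, y=TRUE so that validate can resample it):

# 100 repeats of 10-fold cross-validation, averaged over the repeats
val(MyModel, method = '10-fold', B = 10, r = 100)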


Although it was published just over 20 years ago, I think this paper on Bayesian model averaging complements the discussion and recommendations in RMS. Its introduction explains the problems with prediction based on single models (and, by implication, step-down methods that select the "best" model using the data).

Their discussion of frequentist solutions in the last section is an apt description of RMS and much has been done from this perspective since that paper was written.

It may have been mentioned in the many RMS references, but I was not able to find it despite looking this morning.

This discusses computational aspects that simplify the implementation of Bayesian Model Averaging.


Excellent papers. In general I favor using a single “super model” that has parameters for the same things that model averaging allows for, with shrinkage priors on the complex effects (e.g., unequal variance, interactions, non-proportional hazards or odds).


I have a general question about calibration that I can't find addressed here or in other relevant resources. What are the common sources of miscalibration in predictive modelling? Could you please refer me to the relevant papers?

Start with Ewout Steyerberg’s book Clinical Prediction Models. The number one culprit is overfitting, which is related to regression to the mean and causes low predictions to be too low and high predictions to be too high.

I have a new question related to this - is there a way to combine calibration plots over multiple imputations, to get one final “combined” calibration plot?

See the April 24 post above. One approach is to create 40 calibration curves for 40 completed datasets, and to average the 40 curves. Each calibration curve is estimated by resampling on a single completed dataset.


I’m a little stuck on how to average the calibration curves. Is there a way to do it in the rms package?

You would have to program that averaging.
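Here is a rough, untested sketch of one way to do it. It assumes the completed (imputed) datasets are in a list called completed, that Y ~ X1 + X2 stands in for your model formula, and that the calibrate() result exposes columns predy and calibrated.corrected (check str() of the object for your rms version):

library(rms)
grid   <- seq(0.01, 0.99, by = 0.01)     # common grid of predicted probabilities
curves <- sapply(completed, function(d) {
  f   <- lrm(Y ~ X1 + X2, data = d, x = TRUE, y = TRUE)
  cal <- calibrate(f, B = 200)           # bootstrap calibration on this completed dataset
  approx(cal[, 'predy'], cal[, 'calibrated.corrected'],
         xout = grid, rule = 2)$y        # interpolate onto the common grid
})
plot(grid, rowMeans(curves), type = 'l',
     xlab = 'Predicted probability', ylab = 'Averaged corrected estimate')
abline(0, 1, lty = 2)                    # line of perfect calibration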

Hello! I have a question about the rms package and an lrm model. When I run validate(MyModel, bw=TRUE, B=400) I get the report “Frequencies of Numbers of Factors Retained”. How can I see exactly which factors were retained during the bootstrap procedure? The factors are marked with asterisks, but I cannot count them.

v <- validate(..., bw=TRUE)
attr(v, 'kept')

Program the needed processing of kept.
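For instance (a small sketch, assuming attr(v, 'kept') is a logical matrix with one row per bootstrap resample and one column per candidate factor; check its structure in your rms version):

v    <- validate(MyModel, bw = TRUE, B = 400)
kept <- attr(v, 'kept')
colSums(kept)    # number of bootstrap samples in which each factor was retained
colMeans(kept)   # proportion of bootstrap samples retaining each factor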


Greetings! Do the parameters of the calibration plot (Slope, Dxy, and others) come from the nonparametric calibration curve or from the logistic calibration curve?

Somers’ D_{xy} is independent of calibration. The calibration slope and intercept come from assuming a linear (in the logits) calibration curve and fitting it with logistic regression. Indexes starting with E come from the nonparametric curve. There are also integrated calibration measures in the literature that are computed from the nonparametric curve.
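As a small illustration of the logistic (linear-in-the-logits) calibration model (a sketch only; p is assumed to be a vector of predicted probabilities and y the observed binary outcome):

lp  <- qlogis(p)                        # predicted log odds
cal <- glm(y ~ lp, family = binomial)   # linear-in-the-logits calibration model
coef(cal)                               # calibration intercept and slope
rms::val.prob(p, y)                     # reports these along with other calibration indexes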


Nomogram with one predictor contributing very little
We developed a clinical prediction model with prespecified predictors, a full model, and an adequate effective sample size, with an optimism-corrected AUC of 0.95. In our nomogram, one predictor contributes only minimal points (2 or 3) and its beta is close to zero. What should we do? We prespecified the full model and penalization has been done. I am tempted to remove that variable.
I would like to have your opinion.


It is most accurate to keep the weak predictor in the model and in its displayed nomogram. This will not affect the predictions so much but will affect what the nomogram doesn’t show—the precision (e.g., standard error) of estimates. An alternative is model approximation, also known as pre-conditioning. Use the full model as the basis for inference and standard error calculations but display the reduced model. For an example see this.
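A minimal sketch of the model approximation idea (not the code from the linked example; it assumes full is the prespecified lrm fit, d the analysis data frame, and X1–X5 stand in for your predictors):

lp <- predict(full)                     # linear predictor from the full model
a  <- ols(lp ~ X1 + X2 + X3 + X4 + X5,  # approximate the full model's predictions
          data = d, sigma = 1)
fastbw(a, aics = 10000)                 # order predictors by their contribution to the approximation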


I have a model that I validated using the optimism-adjusted bootstrap. However, we are interested in the performance of this model on subsets of the data (e.g., individual treatment arms). The reason for this inquiry is that treatment arm is the model’s strongest predictor, and we would like to know how well the model performs within a given arm. Is there a way to apply the rms::validate function to selected subsets of the sample used to train the model?