RMS Describing, Resampling, Validating, and Simplifying the Model

Regression Modeling Strategies: Describing, Resampling, Validating, and Simplifying the Model

This is the fifth of several connected topics organized around chapters in Regression Modeling Strategies. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.

Overview | Course Notes

This is a key chapter for really understanding RMS.

Section 5.1.1, p. 103

  • “Regression models are excellent tools for estimating and inferring associations between X and Y given that the ‘right’ variables are in the model.” [emph. dgl]
  • “[Credibility depends on] … design, completeness of the set of variables that are thought to measure confounding and are used for adjustment, … lack of important measurement error, and … the goodness of fit of the model.”
  • … “These problems [interpreting results of any complexity] can be solved by relying on predicted values and differences between predicted values.”

Other important concepts for the chapter:

  • Effects are manifest in differences in predicted values
  • The change in Y when Xj is changed by a subject-matter-relevant amount is the sensible way to consider and represent effects.
  • Section 5.1.2.1: error measures and prediction accuracy scores include (i) central tendency of prediction errors, (ii) discrimination measures, and (iii) calibration measures.
  • Calibration-in-the-large compares the average Y-hat with the average Y (the right result on average). Calibration-in-the-small (high-resolution calibration) assesses the absolute forecast accuracy of the predictions at individual levels of Y-hat. Calibration-in-the-tiny is patient-specific.
  • Predictive discrimination is the ability to rank-order subjects, but calibration should be considered first.
  • “Want to reward a model for giving more extreme predictions, if the predictions are correct”
  • The Brier score is good for comparing two models, but its absolute value is hard to interpret. The c-index is OK for evaluating one model, but not for comparing models (it is not sensitive enough).
  • “Run simulation study with no real associations and show that associations are easy to find.” The bootstrapping-ranks-of-predictors example (RMS course notes Section 5.4, Figure 5.3) is an elegant demonstration of “uncertainty”; it improves intuition and reveals our bias toward “over-certainty.” A minimal sketch of this idea appears after this list.
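The following is a minimal sketch of that idea, assuming simulated data with no real associations and using the rank of the absolute Wald Z statistic as an illustrative importance measure. This is not the exact code from the course notes, and all names and settings below are made up.

```r
## Simulated data: 12 candidate predictors, none truly associated with y
set.seed(1)
n <- 300; p <- 12
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
d <- data.frame(y = rbinom(n, 1, 0.5), X)

## "Importance" = rank of |Wald Z| from a logistic fit (12 = apparently most important)
rank_z <- function(data) {
  f <- glm(y ~ ., data = data, family = binomial)
  rank(abs(summary(f)$coefficients[-1, "z value"]))   # drop the intercept row
}

## Bootstrap the ranks by refitting on resampled rows
B <- 500
ranks <- t(replicate(B, rank_z(d[sample(nrow(d), replace = TRUE), ])))

## 0.95 bootstrap intervals for each predictor's rank are typically very wide,
## even though the apparent ranking in a single fit looks decisive
apply(ranks, 2, quantile, probs = c(0.025, 0.975))
```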

—Drew Levy

Additional links

RMS5

Q&A From May 2021 Course

  1. Is the scaled Brier score an alternative to the Brier score with an easier interpretation? Here is an extract from the Steyerberg (2010) manuscript (DOI: 10.1097/EDE.0b013e3181c30fb2):

The Brier score is a quadratic scoring rule, where the squared differences between actual binary outcomes Y and predictions p are calculated: (Y − p)^2. We can also write this in a way similar to the logarithmic score: Y × (1 − p)^2 + (1 − Y) × p^2. The Brier score for a model can range from 0 for a perfect model to 0.25 for a noninformative model with a 50% incidence of the outcome. When the outcome incidence is lower, the maximum score for a noninformative model is lower, e.g., for 10%: 0.1 × (1 − 0.1)^2 + (1 − 0.1) × 0.1^2 = 0.090. Similar to Nagelkerke’s approach to the LR statistic, we could scale Brier by its maximum score under a noninformative model: Brier_scaled = 1 − Brier / Brier_max, where Brier_max = mean(p) × (1 − mean(p)), to let it range between 0% and 100%. This scaled Brier score happens to be very similar to Pearson’s R² statistic.
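A minimal sketch in R of the scaled Brier score described in the extract, with hypothetical 0/1 outcomes y and predicted probabilities p:

```r
## Scaled Brier score: 1 - Brier / Brier_max, with
## Brier_max = mean(p) * (1 - mean(p)) as in the extract above
brier_scaled <- function(y, p) {
  brier     <- mean((y - p)^2)
  brier_max <- mean(p) * (1 - mean(p))
  c(brier = brier, brier_max = brier_max, scaled = 1 - brier / brier_max)
}

## Noninformative predictions at a 10% incidence give Brier ~ 0.09 and scaled Brier ~ 0
set.seed(1)
y <- rbinom(1e4, 1, 0.10)
brier_scaled(y, rep(0.10, length(y)))
```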

  2. Do you have a good source for interpreting R²-type measures in a nonlinear context, say for dose-response? See the papers by Maddala et al. in econometrics.

  3. Can we use the validate() function in rms to get an optimism-corrected c-index? Yes; back-compute it from Dxy (see the sketch below).
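A sketch of what that might look like, assuming a simulated dataset and a logistic model fit with lrm(); validate() reports Dxy, and for a binary outcome c = Dxy/2 + 0.5:

```r
library(rms)

## Illustrative data and model
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x1))

f <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)   # x=TRUE, y=TRUE are needed by validate()
v <- validate(f, B = 200)                             # bootstrap optimism correction
c_corrected <- v["Dxy", "index.corrected"] / 2 + 0.5  # back-compute c from corrected Dxy
c_corrected
```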

  4. I wonder why RMSE hasn’t been mentioned as a measure of model fit; it is often used in ML. It’s fine for continuous Y and is related to R² (see the sketch below).
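A quick illustration of the relationship: for an ordinary least squares fit evaluated on its own training sample, R² = 1 − MSE / Var(Y), so RMSE conveys the same information as R² once Var(Y) is fixed. The data below are made up.

```r
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)
f <- lm(y ~ x)

mse <- mean(residuals(f)^2)
c(RMSE         = sqrt(mse),
  R2_from_RMSE = 1 - mse / mean((y - mean(y))^2),  # matches summary(f)$r.squared
  R2_summary   = summary(f)$r.squared)
```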

  5. Should we add model-based description to the other three regression strategies? Yes.

  6. Does anyone have a reference for Steyerberg’s paper showing that eliminating variables with p > 0.5 is not so damaging to a model? I think the RMS book has the reference.

  7. For the losers-vs.-winners example, it seems you are comparing a model with 10 df for the winners with a model with 990 df for the losers. Would R² be a fair comparison between these two models? Yes, if the 990-df model is penalized or the R² is overfitting-corrected.

  8. Do you know of any literature that highlights the issues with the Charlson comorbidity index? We use it a lot and I’d like a reference to bring to my supervisor if possible. (Unsure where this question should be placed, but we brought it up in the validation section. :slight_smile:) See the Harrell and Steyerberg letter to the editor in J Clin Epi.

  9. :question: Would you recommend bootstrapping the ranks of predictors in practice instead of FDR-corrected univariate screening?

  10. :question: We mentioned today that the c-index performs well in settings with class imbalance, but I found a paper arguing that the AUROC is suboptimal with imbalanced classes (“The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets,” Saito & Rehmsmeier 2015). Am I missing something basic here, or is the c-index somehow different from the AUC?