Regression Modeling Strategies: General Aspects of Fitting Regression Models
This is the second of several connected topics organized around chapters in Regression Modeling Strategies. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.
Overview | Course Notes
While maybe not the sexiest part of RMS, a solid grasp of the notation can be especially important for accessing key RMS concepts and maneuvers, such as chunk tests involving interactions, and interpreting and expressing effects. If you are new-ish to modeling, skipping over notation might not actually save time and effort.
The regression function describes interesting properties of Y that may vary across individuals in the population. C’(Y|X) denotes some property of the distribution of Y conditional on X (possibly on a transformed scale, such as the logit), expressed as a weighted sum of the X's (the linear predictor).
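As a sketch of this notation, the linear predictor for p predictors can be written out as below. The count p and the two special cases in the comments are added here for illustration; they are meant as familiar instances, not as the only choices.

```latex
C'(Y \mid X) = X\beta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p
% Ordinary linear regression: C'(Y \mid X) = E(Y \mid X)
% Binary logistic regression: C'(Y \mid X) = \operatorname{logit} \Pr(Y = 1 \mid X)
```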
Interaction (aka, “effect modification”) indicates that the effects of two variables cannot be separated: i.e., the effect of X_1 on Y depends on the level of X_2, and vice versa. This makes it difficult to interpret \beta_1 and \beta_2 in isolation. Importantly, rather than just a difference in slopes, interaction may take the form of a difference in shape (or distribution) of a covariate-response relationship conditional on another variable.
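For the simplest case, two predictors that are each linear plus a product term, the dependence can be made explicit. This is a minimal sketch that assumes linearity in both variables; \beta_3 is the coefficient of the product term.

```latex
C'(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
% Effect of a one-unit increase in X_1, holding X_2 fixed: \beta_1 + \beta_3 X_2
% Effect of a one-unit increase in X_2, holding X_1 fixed: \beta_2 + \beta_3 X_1
```

Under this form \beta_1 alone is the effect of X_1 only when X_2 = 0, which is why it is hard to interpret in isolation.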
Almost every statistical test is designed to detect one specific pattern, so inference from any statistical test should be qualified by that pattern.
One of the most valuable maneuvers for general practice is the use of regression splines to relax linearity for continuous covariates. Its low prevalence in general practice does not seem proportionate to its merit. For folks new to rms::rcs() (or now, rms::gTrans()), etc., don’t get intimidated or frustrated by Sections 2.4.2 through 2.4.6; just pragmatically emulate FH’s practice in the examples (and see links below), and let full understanding follow in good time. ‘Trust in the Force’ of these magical functions: use splines to avoid categorizing continuous variables or imposing naive linearity.
Categorizing continuous variables leads to loss and distortion of information. It also does violence to statistical operating characteristics and is unnecessary. (Don’t do it!)
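To take some of the magic out of the spline functions, here is a minimal sketch of a restricted cubic spline basis in plain Python. It is an illustration, not the rms implementation: the function name `rcs_basis` and the example knots are hypothetical, and the scaling by the squared distance between the outer knots is an assumption about a common normalization.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)]: a restricted cubic spline basis at x.

    Illustrative only; rcs_basis is a hypothetical helper, not an rms function.
    """
    t = sorted(knots)
    if len(t) < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    gap = t[-1] - t[-2]            # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2    # assumed normalization, keeps terms on the scale of x
    terms = [x]                    # the linear term comes first
    for tj in t[:-2]:
        s = (pos3(x - tj)
             - pos3(x - t[-2]) * (t[-1] - tj) / gap
             + pos3(x - t[-1]) * (t[-2] - tj) / gap)
        terms.append(s / scale)
    return terms


# Four knots give one linear and two nonlinear terms, i.e. 3 coefficients.
example_knots = [1.0, 3.0, 5.0, 7.0]
print(rcs_basis(4.0, example_knots))
```

With k knots the covariate costs k - 1 regression coefficients, and the nonlinear terms are built so the fitted curve is linear before the first knot and after the last one.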
Likewise for chunk tests (Section 2.6): their low prevalence in practice does not seem proportionate to their merit. Emulating FH’s practice in evaluating complexity of effects (interactions and spline terms) or redundancy (groups of covariates, etc.) has mitigated much statistical complication.
A reasonable approach for modeling and evaluating complex interactions is, for each predictor separately, to test simultaneously the joint importance of all interactions involving that predictor.
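The per-predictor pooled interaction test can be sketched as a partial F test in ordinary least squares. This is only one possible form of a chunk test and not the rms implementation; the simulated data, the helper names `solve` and `sse`, and the choice of three linear predictors with all pairwise products are assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Residual sum of squares from a least squares fit of y on the given columns."""
    p = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][i] for j in range(p)) for i in range(len(y))]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


# Simulated data (hypothetical): only the x1:x2 interaction is truly present.
rng = random.Random(0)
n = 200
x = {name: [rng.uniform(-1, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [1 + a + 0.5 * b + 2 * a * b + rng.gauss(0, 1)
     for a, b in zip(x["x1"], x["x2"])]

main = {"intercept": [1.0] * n, **x}
pairs = [("x1", "x2"), ("x1", "x3"), ("x2", "x3")]
inter = {pq: [a * b for a, b in zip(x[pq[0]], x[pq[1]])] for pq in pairs}

# Full model: intercept, main effects, and all pairwise interactions.
full_cols = list(main.values()) + list(inter.values())
sse_full = sse(full_cols, y)
df_resid = n - len(full_cols)

# For each predictor, drop every interaction involving it and compare fits.
chunk_f = {}
for name in ("x1", "x2", "x3"):
    kept = [col for pq, col in inter.items() if name not in pq]
    dropped = len(inter) - len(kept)  # 2 interaction terms per predictor here
    sse_reduced = sse(list(main.values()) + kept, y)
    chunk_f[name] = ((sse_reduced - sse_full) / dropped) / (sse_full / df_resid)

for name, f in chunk_f.items():
    print(f"{name}: F = {f:.2f} on 2 and {df_resid} df")
```

Each F statistic answers one question per predictor ("does this predictor interact with anything?") with a single multiple degree of freedom test, instead of inspecting each product term on its own.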
Additional links
- General Multi-parameter Transformations in rms
- Restricted cubic splines, A flexible method for fitting regression lines, by Peter Flom
- An Introduction to the Harrell“verse”: Predictive Modeling using the Hmisc and rms Packages, by Nicholas Ollberding
- An exercise in non-linear modeling, by Max Gordon