This is a place for questions and discussions about the R `rms` package, for archived discussions arising from Frank Harrell’s Regression Modeling Strategies full or short course, and for regression modeling topics from the MSCI Biostatistics II course. New non-software questions and discussions about regression modeling strategies should be posted in the appropriate `datamethods.org` topic that has been created for each chapter in RMS. Links to these topics are found below. You can also go to `datamethods.org/rmsN`, where `N` is the RMS book chapter number.

## Other Information

## R `rms` Package Frequently Asked Questions

### Implementing and interpreting chunk tests / contrasts

Contrasts are differences in some transformation of predicted values: either a single difference or a series of differences whose confidence intervals may or may not be adjusted for multiplicity. They are obtained with the `contrast.rms` function (`contrast` for short), and detailed examples can be found by typing `?contrast.rms` at the console. Chunk tests are multiple-parameter (multiple degree-of-freedom) tests. There are two main ways to obtain these composite tests:

- Fitting a full model and a submodel and getting a likelihood ratio test (e.g., using the `lrtest` function) of the difference between the models, which tests the joint importance of all the variables omitted from the smaller model.
- Using the `anova.rms` function (short name `anova`), which by default computes Wald tests from the single full fit.

There are two main types of “chunks” used by `anova.rms`:

- Testing the combined impact of all the components of a predictor that requires more than one parameter to appear in the model. Examples: the combined effect of the $k-1$ indicator variables for a $k$-category predictor; the combined effect of the linear and nonlinear pieces of a splined continuous predictor. This is done automatically by `anova(fit)`.
- Testing the combined impact of multiple predictors. This is done by running, for example, `anova(fit, age, sex)` to get a combined test of $H_0$: neither age nor sex is associated with Y. Any interaction terms involving age or sex are automatically included in the test, as are any nonlinear terms for age.
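As a rough illustration, the sketch below fits a binary logistic model to simulated data (the variables `age`, `sex`, `bp`, and `y` are made up for this example) and obtains both kinds of `anova` chunk tests, a likelihood ratio test of the full model against a smaller one with `lrtest`, and a pair of contrasts. It assumes the `rms` package is installed; the help pages `?anova.rms`, `?lrtest`, and `?contrast.rms` remain the authoritative references for the arguments.

```r
library(rms)

# Simulated data; variable names and the true model are hypothetical
set.seed(1)
n <- 300
d <- data.frame(
  age = rnorm(n, 50, 10),
  sex = factor(sample(c("female", "male"), n, replace = TRUE)),
  bp  = rnorm(n, 120, 15)
)
lp  <- -0.5 + 0.04 * (d$age - 50) + 0.5 * (d$sex == "male") + 0.01 * (d$bp - 120)
d$y <- rbinom(n, 1, plogis(lp))

dd <- datadist(d); options(datadist = "dd")

full  <- lrm(y ~ rcs(age, 4) * sex + bp, data = d)
small <- lrm(y ~ bp, data = d)

# Chunk type 1: anova() pools all parameters belonging to each predictor
# (spline pieces, interaction terms) automatically
a <- anova(full)
print(a)

# Chunk type 2: one pooled Wald test of everything involving age or sex
a_as <- anova(full, age, sex)
print(a_as)

# Likelihood ratio alternative: full model vs. a submodel omitting age and sex
lr <- lrtest(small, full)
print(lr)

# Contrasts: male vs. female log odds at ages 40 and 60 (bp set by datadist)
k <- contrast(full,
              list(sex = "male",   age = c(40, 60)),
              list(sex = "female", age = c(40, 60)))
print(k)
```

The `lrtest` version requires two fits, while `anova` works from the single full fit, which is usually more convenient.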

### Difference between standard R modeling functions and the corresponding `rms` functions: when to use one or the other?

`rms` fitting functions exist for many of the most commonly used models. Some of them, such as `lrm` and `orm`, are more efficient than their standard R counterparts. The `rms` functions give rise to automatic tests of linearity from `anova` and make model validation using resampling methods much easier. `rms` functions also make graphics of predicted values, effects, and nomograms easier to produce, and they allow you to use `latex(fit)` to state the fitted model algebraically for easier interpretation.

Some of the fitting functions, such as `lrm` and especially `orm`, run faster than built-in R functions. `rms` functions also have `print` methods that report statistical indexes tailored to the particular model, such as $R^2_\text{adj}$ and rank measures of predictive discrimination.

Unlike standard R, which uses `summary` methods to get standard errors and statistical tests for model parameters, `rms` provides these in its `print` methods. The `rms` `summary` method (`summary.rms`) instead computes quantities such as inter-quartile-range effects and, for categorical predictors, comparisons against the largest (most frequent) category as the reference cell.
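A minimal sketch on simulated data (the variables `x` and `y` are hypothetical) showing that `lrm` and `glm` fit the same binary logistic model, and where each family reports standard errors and effects. The exact printed layout depends on the installed `rms` version.

```r
library(rms)

set.seed(2)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.3 + 0.8 * d$x))

dd <- datadist(d); options(datadist = "dd")

g <- glm(y ~ x, family = binomial, data = d)  # standard R
f <- lrm(y ~ x, data = d)                     # rms counterpart

# Both maximize the same binomial likelihood, so the estimates agree
print(cbind(glm = coef(g), lrm = coef(f)))

print(summary(g))  # standard R: SEs and Wald tests come from summary()
print(f)           # rms: SEs and tests, plus LR test, R2, C, Dxy, Brier, ...

s <- summary(f)    # rms: odds ratio for an inter-quartile-range change in x
print(s)
```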

### The `datadist` function: what is its purpose and how is it used?

`datadist` computes descriptive summaries of a series of variables (typically all the variables in the analysis dataset) so that predictions, graphs, and effects are easier to get. For example, when a predictor value is not given to `Predict()`, the predictor is set to its median (the mode for categorical predictors). When a range is not given for a predictor plotted on the x-axis against predicted values on the y-axis, the range defaults to the 10th smallest through the 10th largest predictor value in the dataset. The default range for continuous predictors when computing effect measures such as odds ratios or differences in means is the inter-quartile range. All the summary statistics needed for these are computed by `datadist` and stored in the object you assign the `datadist` result to. You have to run `options(datadist=...)` to let `rms` know which object holds the `datadist` results.
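The workflow above can be sketched as follows, assuming simulated data with a hypothetical continuous `age` and a three-level factor `race`. Note that `options(datadist = "dd")` is given the *name* of the object as a character string. Comments describing defaults paraphrase the text above; `?datadist` documents the exact rules.

```r
library(rms)

set.seed(3)
n <- 250
d <- data.frame(
  age  = round(runif(n, 20, 80)),
  race = factor(sample(c("a", "b", "c"), n, replace = TRUE,
                       prob = c(0.6, 0.3, 0.1)))
)
d$y <- 0.05 * d$age + (d$race == "b") + rnorm(n)

dd <- datadist(d)          # summaries of every variable in d, stored in dd
options(datadist = "dd")   # tell rms which object holds those summaries

print(dd$limits)           # includes the "Adjust to" values (median / mode)
                           # and the effect and prediction ranges

f <- ols(y ~ rcs(age, 4) + race, data = d)

p <- Predict(f, age)       # race held at its most frequent level; age spans
                           # roughly the 10th smallest to 10th largest value
s <- summary(f)            # age effect: 75th vs. 25th percentile (IQR)
print(s)
```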

### Arguments and details of functions for missing data: `transcan` and `aregImpute`

`transcan` is used for nonlinear principal component analysis and for single imputation. `aregImpute` is used for multiple imputation (preferred). Both functions have many arguments, and many examples of their use appear in their help files and in case studies in the RMS text and course notes.
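A minimal multiple imputation sketch, assuming simulated data with missing values in one hypothetical predictor `x1`. It runs `aregImpute` and then `fit.mult.impute` (both from `Hmisc`, which `rms` attaches), with `ols` as the fitter; `fit.mult.impute` refits the model on each completed dataset and combines the results, correcting the variances for imputation. The choices of `n.impute` and the imputation formula here are illustrative only, not recommendations.

```r
library(rms)   # attaches Hmisc, which provides aregImpute and fit.mult.impute

set.seed(4)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = runif(n))
d$y <- 1 + d$x1 + 2 * d$x2 + rnorm(n)
d$x1[sample(n, 40)] <- NA    # make some predictor values missing

# Imputation model: includes the outcome as well as the predictors
a <- aregImpute(~ y + x1 + x2, data = d, n.impute = 5, pr = FALSE)

# Fit the analysis model on each completed dataset and pool the results
f <- fit.mult.impute(y ~ x1 + x2, ols, a, data = d)
print(f)
```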

### When to use data.table vs standard R data.frames?

`data.table` is highly recommended for a variety of reasons related to the ease of manipulating data tables in simple and complex ways. All data tables are data frames, so data tables are valid inputs to `rms` functions.
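A short sketch of this point, assuming the `data.table` and `rms` packages are installed: a `data.table` (here with a hypothetical column `x` and a column `y` added by reference) inherits from `data.frame` and can be passed directly as the `data` argument of an `rms` fitter.

```r
library(rms)
library(data.table)

set.seed(5)
dt <- data.table(x = rnorm(100))
dt[, y := 2 + 0.5 * x + rnorm(.N)]   # add a column by reference

print(class(dt))                     # "data.table" "data.frame"

f <- ols(y ~ x, data = dt)           # data.table passed straight to ols
print(f)
```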

### Working with labels and how they interact with `rms` functions

Variable labels are typically specified with the `Hmisc` `upData` function or with `label(x) <- 'some label'`. Labels are also created automatically by various dataset import functions. Labels are used by several `rms` functions, primarily for axis labels. When a variable has a `units` attribute in addition to a `label`, the `units` appear in a smaller font at the end of the label when forming an axis label. For some functions there is an option to use variable names in place of labels in the output.
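A minimal sketch, assuming simulated data with hypothetical variables `age`, `sbp`, and `y`: labels and units are attached to several variables at once with `upData` and to a single variable with `label<-`, and a plot of predicted values then draws its x-axis title from the label and units of `sbp`.

```r
library(rms)   # attaches Hmisc, which provides upData, label, and units

set.seed(6)
d <- data.frame(age = runif(200, 20, 80), sbp = rnorm(200, 130, 15))
d$y <- 0.5 * d$sbp + 0.2 * d$age + rnorm(200, sd = 5)

# Several labels and units at once
d <- upData(d,
            labels = c(age = "Age", sbp = "Systolic Blood Pressure"),
            units  = c(age = "years", sbp = "mmHg"),
            print  = FALSE)

# One variable at a time
label(d$y) <- "Outcome score"

dd <- datadist(d); options(datadist = "dd")
f <- ols(y ~ age + sbp, data = d)

p <- Predict(f, sbp)
plot(p)   # x-axis title built from the label of sbp plus its units
```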