Is there a way to display the reference category(ies) in a regression output in rms?

Andrzej_Andrzej · February 1, 2024, 5:29pm

Hi,
this is a fairly basic question, based on this thread:
https://stackoverflow.com/questions/70459515/is-there-a-way-to-display-the-reference-category-in-a-regression-output-in-r

Is it possible to have the reference level(s) included in the ols() output table?
Is it possible to get output somewhat like that of SPSS?
In R regression output, the reference level of each categorical variable is hidden/omitted; in SPSS it is included and visible.

kind regards, and thank you

MSchwartz · February 2, 2024, 4:28pm

Hi,

I would say that first, comparing R output to SPSS output, in terms of the aesthetics, can be problematic. In many cases, there are philosophical differences in how the two applications have been implemented, which are reflected in the default output, and that would apply to other statistical applications (e.g. SAS) as well.

Even within R, as an example, the ‘significance stars’ that you have in the right-hand table, as indications of the magnitude of statistical significance, are rarely used in my experience, and in R, there is a way to disable that output globally, which I did over two decades ago by adding:

options(show.signif.stars = FALSE)

to my .Rprofile file, which is read by R at startup. Those stars, in the end, are a distraction to the reader, and a historical remnant.

That being said, including the reference levels in the tables seems to be an inefficient use of space. As was noted in the SO thread that you reference, the coefficients for the other levels are relative to the reference level, when using the default treatment contrasts, and the intercept for the model embeds additional information.

If I am reading the output of the SPSS table correctly, it would also appear that SPSS uses the last level of the factor variable as the reference level, rather than the first as done in R. As a result, you have different outputs in the two tables relative to the model coefficients and other values.
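As a minimal sketch of that difference, the reference level in R can be moved to the last level either with relevel() or with contr.SAS contrasts, both from base R. The built-in PlantGrowth data and the object names are assumptions for illustration only, not the data from this thread.

```r
# Default treatment contrasts: the first level ("ctrl") is the reference
fit_first <- lm(weight ~ group, data = PlantGrowth)

# Option 1: relevel the factor so that the last level ("trt2") is the reference
pg <- PlantGrowth
pg$group <- relevel(pg$group, ref = "trt2")
fit_last <- lm(weight ~ group, data = pg)

# Option 2: keep the data as-is and request SAS-style contrasts,
# which use the last level as the reference
fit_sas <- lm(weight ~ group, data = PlantGrowth,
              contrasts = list(group = "contr.SAS"))

# Options 1 and 2 give the same coefficients; both differ from fit_first
print(coef(fit_first))
print(coef(fit_last))
print(coef(fit_sas))
```

The fitted values are identical in all three models; only the parameterization of the coefficients changes.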

When I am presenting such a table in a report, I will add footnotes below the table to indicate the reference levels for the categorical (factor) variables, so that the reader is aware of these, without taking up space in the main body of the table.

Andrzej_Andrzej · February 2, 2024, 4:57pm

Hi and thank you very much for your reply.

That can be changed in the options; the default is the highest level, but it is not a problem to change it.
What I have in mind is how to include the reference levels in the rms ols() output table, as SPSS does, meaning rows such as Gender_N=1 or group=4. These are hidden in R. SPSS just adds an almost empty row, so I know what the reference is. Instead of the 0 in the B column (SPSS), which could be a bit misleading, NAs could be shown in R. How can I add this to the ols() output?
I have read this:
https://stackoverflow.com/questions/50781110/extracting-reference-level-from-glm-coefficients
There is a function there, tidy_coefs_with_ref, but I do not know how to modify it to work with ols() models/output.
I know that I can state the reference levels in captions, but this is not what I want here.

MSchwartz · February 2, 2024, 6:59pm

Hi,

I don’t use any of the “tidyverse” functionality, so I can’t help you with the particular function, or perhaps relevant modifications to it. That function appears to have been coded to work with glm() derived objects in R, not lm() or ols() objects, and there are subtle and not so subtle structural differences in them, depending upon the complexity of the model being created.

One step to try, at least initially, is to see if that function will work with your model having been created using the standard R function lm() as opposed to Frank’s ols() function. That might give you a hint as to a direction to take here.

In general, to get the reference levels for any factors in the model, you would need to find out where those values are stored in the resultant model object, which in the case of the standard lm() and glm() functions in R, is in the “xlevels” part of the returned model object and that is referenced in the code for the function in the SO thread.

That is MOD$xlevels, where MOD is the returned lm() or glm() model object, and then process that accordingly to insert those values into the rows of the standard coefficient matrix as desired.

Using the standard R model functions, you would get the matrix of coefficients and associated statistics using coef(summary(MOD)), again where MOD is the returned model object.

You would then want to use the relevant print function to output that modified matrix, which in the case of standard lm() output is going to be the printCoefmat() function. Otherwise, you would potentially need to write your own print function to output the modified matrix in a formatted manner, which just adds to the amount of work that you would need to do here.
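The steps above can be sketched for a standard lm() fit as follows. This is only an illustration under stated assumptions: the helper add_ref_rows() is hypothetical (it is not part of R or of the SO thread), the built-in PlantGrowth data stands in for real data, and the code assumes default treatment contrasts, an intercept, and no interactions.

```r
# Fit a model with one factor predictor (illustrative data)
fit <- lm(weight ~ group, data = PlantGrowth)

# Hypothetical helper: insert an all-NA row for each factor's reference level
add_ref_rows <- function(mod) {
  cf <- coef(summary(mod))              # estimates, SEs, t values, p values
  for (v in names(mod$xlevels)) {
    lev <- mod$xlevels[[v]]             # first level is the reference
    first <- match(paste0(v, lev[2]), rownames(cf))
    if (is.na(first)) next              # naming does not match; skip
    ref_row <- matrix(NA_real_, nrow = 1, ncol = ncol(cf),
                      dimnames = list(paste0(v, lev[1]), colnames(cf)))
    # Place the reference row just before the factor's first coefficient
    cf <- rbind(cf[seq_len(first - 1), , drop = FALSE],
                ref_row,
                cf[first:nrow(cf), , drop = FALSE])
  }
  cf
}

tab <- add_ref_rows(fit)

# Print with the standard coefficient-matrix printer; NAs shown as blanks
printCoefmat(tab, na.print = "")
```

Because the result is still an ordinary numeric matrix with the usual column names, printCoefmat() can be reused instead of writing a custom print function.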

From a quick test using ols() on one of the model examples in lm(), where there is a factor used, it would suggest that the relevant factor level information is stored in MOD$Design$parms of the returned ols() model object, where there are sub-elements for each factor in the model formula.
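A rough sketch of the same idea for an ols() fit is given below. It assumes the rms package is installed, that the factor levels are stored as character vectors in MOD$Design$parms as described above, and that ols() names factor coefficients as "variable=level". The table layout, the object names and the PlantGrowth data are illustrative assumptions, and only estimates and standard errors are shown.

```r
# Runs only if rms is available; otherwise the block is skipped
if (requireNamespace("rms", quietly = TRUE)) {
  f <- rms::ols(weight ~ group, data = PlantGrowth)

  est <- coef(f)
  se  <- sqrt(diag(vcov(f)))
  tab <- data.frame(term = names(est), estimate = unname(est),
                    std.error = unname(se), stringsAsFactors = FALSE)

  parms <- f$Design$parms               # one sub-element per predictor
  for (v in names(parms)) {
    lev <- parms[[v]]
    if (!is.character(lev)) next        # keep only factor level vectors
    first <- match(paste0(v, "=", lev[2]), tab$term)
    if (is.na(first)) next              # naming assumption not met; skip
    ref <- data.frame(term = paste0(v, "=", lev[1]), estimate = NA_real_,
                      std.error = NA_real_, stringsAsFactors = FALSE)
    # Insert the reference row just before the factor's first coefficient
    tab <- rbind(tab[seq_len(first - 1), ], ref, tab[first:nrow(tab), ])
  }
  rownames(tab) <- NULL
  print(tab)
}
```

If the coefficient naming or the structure of the Design element differs in a given rms version, the match() calls return NA and the table is left unchanged, so the output should be checked against print(f).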

If that is correct, it would explain why that function in the SO thread does not work, since the model object structure returned from ols() differs materially from lm(), and I would envision that is the case to support other functionality in Frank’s rms package, such as penalties and so forth.

The above might give you enough hints as to how to modify that SO function, but you may have to dig a bit deeper there, if you wish to pursue that path.

Andrzej_Andrzej · February 2, 2024, 9:39pm

That was helpful, thank you. Unfortunately, I am not an experienced enough user to modify functions, but I will try to experiment with the code anyway.

MSchwartz · February 2, 2024, 10:23pm

Andrzej,

If you need general R programming assistance, I would recommend that you post back to SO, likely as a new thread, given the age of the older threads that you referenced.

You would reach a wider audience there, where there is a higher level of traffic regarding general R programming queries.

What you are asking for is arguably atypical, and would require extracting and modifying parts of standard R objects in a manner that can automate the formatting and display of the reference levels.

Ideally, that would be done such that it is compatible with existing R output (print method) functions, so that you can use existing functionality. Otherwise, you would also have to create custom functions that would format and output your modified objects.

f2harrell · February 2, 2024, 10:32pm

In future, this topic belongs under RMS Discussions - modeling strategy - Datamethods Discussion Forum