Alternative to logistic regression in case of non-linearity

WBeaubien · August 15, 2018, 1:13am

Hello all

I am trying to assess the potential relationship between a continuous predictor and a binary outcome. However, after drawing a scatter plot with a LOESS regression line and performing the Box-Tidwell test, I clearly see that the increase in risk is not linear. It is "U"-shaped.

What would be the next step from here?

Cheers!

f2harrell · August 15, 2018, 2:45am

This is going to require detailed study, as there are many issues underneath your question. See my RMS book and course notes. Here are a few issues to get you started in the process.

Logistic regression assumes, and only in its default formulation, that predictor effects are linear on the logit scale, not on the risk scale.
Informal pre-modeling analyses distort later statistical inference: standard errors come out too small, confidence intervals too narrow, and p-values too small.
The best strategy is often to decide in advance how many parameters you can afford to estimate for a predictor, fit a smooth nonlinear function (e.g., a regression spline) with that many parameters, and stick with it no matter what the fitted function looks like. That way the degrees of freedom in the model are completely honest and all statistical inference is preserved.
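
A minimal sketch of that last strategy in Python, using only NumPy. Everything specific here is an assumption made for illustration and is not from the thread: the simulated U-shaped data, the choice of 4 knots (3 parameters) placed at fixed quantiles, and the hand-rolled helpers `rcs_basis` and `fit_logistic`. In practice a statistical package would do the fitting; the point is only that the knot count is fixed before looking at the fit.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: cubic between knots, linear in the tails.

    With k knots this returns k - 1 columns (the linear term plus k - 2
    nonlinear terms), so the number of parameters is set by the knot count.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2  # puts nonlinear columns on a scale similar to x

    def cube(u):
        return np.clip(u, 0.0, None) ** 3  # truncated power (u)_+^3

    cols = [x]
    for j in range(k - 2):
        term = (cube(x - t[j])
                - cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                + cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
        cols.append(term / scale)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (assumption): the true logit is U-shaped in x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-3, 3, n)
true_logit = -2 + 0.5 * x**2
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Knot count and locations are fixed up front, not tuned to the fitted shape.
knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])
beta = fit_logistic(rcs_basis(x, knots), y)

# Estimated risk at a low, middle, and high value of x.
grid = np.array([-2.5, 0.0, 2.5])
logit_hat = beta[0] + rcs_basis(grid, knots) @ beta[1:]
risk_hat = 1 / (1 + np.exp(-logit_hat))
print(np.round(risk_hat, 2))
```

The model is still an ordinary logistic regression; only the design matrix changed. Linearity is assumed in the spline basis columns on the logit scale, not in `x` itself.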

WBeaubien · August 15, 2018, 11:52am

Thank you for the fast reply. This provides a good starting point.

Best,

mwebb · August 22, 2018, 8:14pm

Does this technique (and splines in general) play a role in causal modeling or are splines only useful for predictive models?

f2harrell · August 22, 2018, 8:34pm

Splines are useful in any modeling context, to relax linearity assumptions in a way that reflects the shapes of effects we tend to see in practice.

Pavel_Roshanov · August 23, 2018, 5:15am

Indeed very useful for confounder adjustment when the confounder is continuous. Categorizing the continuous confounder leads to worse confounder control than can be achieved by keeping it continuous and modelling it with splines or fractional polynomials.
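
A small simulation can illustrate the residual confounding left by categorization. This is a sketch under stated assumptions, none of which come from the thread: a continuous outcome fitted by least squares (to keep the code short), a confounder `z` whose true effect is linear, and an exposure `a` with a true effect of exactly zero. Because the true confounder effect is linear here, a single linear term stands in for the continuous model; with a nonlinear confounder effect, spline basis columns would take its place.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

z = rng.normal(size=n)                            # continuous confounder
a = rng.binomial(1, 1 / (1 + np.exp(-1.5 * z)))   # exposure depends on z
y = 2.0 * z + rng.normal(size=n)                  # outcome: true effect of a is 0

def exposure_coef(confounder_cols):
    """Least-squares coefficient of the exposure, adjusting for the given columns."""
    X = np.column_stack([np.ones(n), a] + confounder_cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Adjust for the confounder dichotomized at its median.
est_split = exposure_coef([(z > np.median(z)).astype(float)])

# Adjust for the confounder kept continuous.
est_cont = exposure_coef([z])

print("median split adjustment:", round(est_split, 3))
print("continuous adjustment:  ", round(est_cont, 3))
```

Within each half of the median split, exposed subjects still have higher `z` on average than unexposed ones, so part of the confounder's effect is attributed to the exposure. Keeping `z` continuous removes that leftover association.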

EpiLearneR · June 29, 2021, 2:29am

Dear Prof. Harrell,
When I was discussing this, our stat friends said it would be difficult to interpret the results if I use splines. I am at a loss as to how to answer that.

f2harrell · June 29, 2021, 12:36pm

Much of what’s in Regression Modeling Strategies is aimed at fighting that false perception. Such discussions often get off on the wrong foot when an analyst mentions coefficients. The coefficients are not to be interpreted. The function as a whole is to be interpreted, and this is easily done with a partial effect plot. This is a line plot that happens to be nonlinear, and everyone is familiar with interpreting line plots. There are also many examples of the use of nomograms to represent such relationships, including 1-axis nomograms that show x vs. f(x). If you think of particular problems in interpreting such graphs please respond here and we can keep working on it.
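
One way to write down what "interpreting the function as a whole" means, as a sketch in notation that is an assumption here and does not appear in the post: let $B_j(x)$ be the spline basis functions for the predictor and $\beta_j$ their coefficients, in a model where the predictor enters additively on the logit scale.

```latex
\operatorname{logit} P(Y = 1 \mid x) = \beta_0 + f(x),
\qquad
f(x) = \sum_{j=1}^{p} \beta_j B_j(x)

\log \mathrm{OR}(x_a \text{ vs. } x_b) = f(x_a) - f(x_b)
```

A partial effect plot draws $\beta_0 + f(x)$, or its transformation to the risk scale, against $x$. Any comparison of interest, such as the odds ratio between two chosen values $x_a$ and $x_b$, is a difference in heights on that curve. No individual $\beta_j$ is needed to read it.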