Alternative to logistic regression in case of non-linearity



Hello all

I am trying to assess the potential relationship between a continuous predictor and a binary outcome. However, after making a scatter plot with a LOESS regression line and performing the Box-Tidwell test, I clearly see that the increase in risk is not linear: it is U-shaped.

What would be the next step from here?



This is going to require detailed study, as there are many issues underneath your question. See my RMS book and course notes. Here are a few issues to get you started in the process.

  • Only in its default formulation does logistic regression assume that predictor effects are linear, and even then the linearity is on the logit (log odds) scale, not on the risk scale.
  • Doing informal pre-modeling analyses (such as choosing the model form from a scatterplot or a Box-Tidwell test) will distort statistical inference later: standard errors become too small, confidence intervals too narrow, and p-values too small.
  • The best strategy is often to decide how many parameters you can estimate for a predictor, fit a smooth nonlinear function (e.g., a regression spline) with that many parameters, and stick with it no matter what the fitted function looks like. That way the degrees of freedom in the model are completely honest and all statistical inference is preserved.
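A minimal sketch of the prespecified-spline strategy, assuming Python with NumPy only. The data are simulated with a hypothetical U-shaped effect on the logit scale, and `rcs_basis` and `fit_logistic` are illustrative helpers, not a library API. The basis follows the restricted cubic spline form (linear beyond the outer knots); 4 knots at the 0.05, 0.35, 0.65 and 0.95 quantiles is an assumed, commonly used default, giving 3 parameters for x that are fixed before any fit is inspected. Fitted values are shown on the logit scale and back-transformed to risk.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: k knots -> k-1 columns, linear beyond the outer knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    pos3 = lambda u: np.maximum(u, 0.0) ** 3          # truncated cubic (u)_+^3
    cols = [x]                                         # linear term
    for j in range(k - 2):                             # k-2 nonlinear terms
        cols.append((pos3(x - t[j])
                     - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                     + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
                    / (t[k - 1] - t[0]) ** 2)          # scaling for numerical convenience only
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Unpenalized maximum likelihood logistic regression by Newton-Raphson (X includes intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data: U-shaped truth on the logit scale
rng = np.random.default_rng(2024)
n = 2_000
x = rng.uniform(0, 10, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.15 * (x - 5) ** 2))))

# Prespecified: 4 knots -> 3 parameters for x, kept whatever the fitted shape turns out to be
knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])
beta = fit_logistic(np.column_stack([np.ones(n), rcs_basis(x, knots)]), y)

grid = np.linspace(0.5, 9.5, 7)
logit_hat = np.column_stack([np.ones(grid.size), rcs_basis(grid, knots)]) @ beta
risk_hat = 1 / (1 + np.exp(-logit_hat))               # back-transform logit to risk
for g, lg, r in zip(grid, logit_hat, risk_hat):
    print(f"x={g:4.1f}  logit={lg:6.2f}  risk={r:5.2f}")
```

Nothing from a scatterplot or Box-Tidwell test feeds into the knot choice, so the 3 degrees of freedom for x are honest. In R, the rms package fits this kind of model directly with `lrm(y ~ rcs(x, 4))`.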


Thank you for the fast reply. This provides a good starting point.



Does this technique (and splines in general) play a role in causal modeling or are splines only useful for predictive models?


Splines are useful in any modeling context, to relax linearity assumptions in a way that reflects the shapes of effects we tend to see in practice.


Indeed, splines are very useful for confounder adjustment when the confounder is continuous. Categorizing a continuous confounder leads to worse confounder control than keeping it continuous and modelling it with splines or fractional polynomials.
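A small simulation sketch of this point, assuming NumPy only. The illustrative helpers `rcs_basis` and `fit_logistic` are redefined so the block runs on its own. All data-generating values are hypothetical: confounder C uniform on 0 to 10, exposure probability rising with C, a quadratic effect of C on the outcome logit, and a true exposure log odds ratio of 0.5. Adjusting for quartiles of C leaves residual confounding inside each quartile, so its estimate should be pulled away from 0.5, while the 4-knot spline should track the curved effect of C closely.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: k knots -> k-1 columns, linear beyond the outer knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    pos3 = lambda u: np.maximum(u, 0.0) ** 3
    cols = [x]
    for j in range(k - 2):
        cols.append((pos3(x - t[j])
                     - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
                     + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2]))
                    / (t[k - 1] - t[0]) ** 2)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Unpenalized maximum likelihood logistic regression by Newton-Raphson (X includes intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data: continuous confounder c affects both exposure e and outcome y
rng = np.random.default_rng(7)
n = 40_000
c = rng.uniform(0, 10, n)
e = rng.binomial(1, 1 / (1 + np.exp(-(-4 + 0.8 * c))))    # exposure more likely at high c
true_log_or = 0.5
y = rng.binomial(1, 1 / (1 + np.exp(-(-5 + 0.1 * c**2 + true_log_or * e))))
ones = np.ones(n)

# (a) Adjust for c by quartile categories (3 indicator variables)
grp = np.digitize(c, np.quantile(c, [0.25, 0.5, 0.75]))    # group labels 0..3
dummies = np.column_stack([(grp == g).astype(float) for g in (1, 2, 3)])
b_cat = fit_logistic(np.column_stack([ones, e, dummies]), y)[1]

# (b) Adjust for c with a 4-knot restricted cubic spline (also 3 parameters)
knots = np.quantile(c, [0.05, 0.35, 0.65, 0.95])
b_spline = fit_logistic(np.column_stack([ones, e, rcs_basis(c, knots)]), y)[1]

print(f"true {true_log_or:.2f}  categorized {b_cat:.2f}  spline {b_spline:.2f}")
```

Both adjustments spend 3 parameters on the confounder, so the comparison is about the shape of the adjustment rather than its size.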