Categorizing Continuous Variables

In this paper, which predicts the risk of cruciate ligament rupture based on time of neutering in Labrador Retrievers, the following result is presented:

Risk of CR was increased in dogs neutered before 12 months of age (OR = 11.38; P = .01). Neutering before 6 months of age was not a significant factor (P = .17), nor was neutering between 6 and 12 months of age (OR = 3.11; P = .23). Overall neutering was also not a risk factor (OR = 1.8; P = .27).

It makes no sense that there is an increased risk in dogs neutered before 12 months of age but not in dogs neutered before 6 months or between 6 and 12 months. Is this an issue with dichotomizing the age at neutering? Unfortunately, the paper provides very little data to evaluate.
cruciaterupture.pdf (648.8 KB)

This is a good example of misleading subgroup statistics after arbitrary categorization. Every analysis should start with (1) a high-resolution histogram of the data (here, age) to check regions of support, and (2) a smooth, non-overfitted estimate of the relationship with uncertainty bands (using splines, nonparametric smoothers, fractional polynomials, etc.). To safeguard interpretations, the uncertainty bands should be simultaneous compatibility (confidence) intervals.
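To make those two steps concrete, here is a minimal sketch in Python using only NumPy. Everything in it is an assumption for illustration: the data are simulated (the paper's data are not available), the names `age`, `event`, `rcs_basis`, and `fit_logistic` are hypothetical, and a restricted cubic spline with four knots is just one of the smoothers mentioned above. Note that the bands computed here are pointwise Wald bands; simultaneous bands, as recommended above, would be wider.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (hypothetical) data: age at neutering in months, binary outcome.
n = 5000
age = np.clip(2 + rng.gamma(shape=3.0, scale=4.0, size=n), 2, 48)
true_logit = -2.5 + 2.5 * np.exp(-age / 10)  # assumed smooth, decreasing risk
event = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# (1) High-resolution histogram: one-month bins show where the data have support.
counts, edges = np.histogram(age, bins=np.arange(2, 49))

def rcs_basis(x, knots):
    """Design matrix (intercept, linear term, restricted cubic spline terms).

    The spline is cubic between knots and linear beyond the outer knots.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(knots, dtype=float)
    scale = (k[-1] - k[0]) ** 2
    pos3 = lambda u: np.maximum(u, 0.0) ** 3
    cols = [np.ones_like(x), x]
    for j in range(len(k) - 2):
        term = (pos3(x - k[j])
                - pos3(x - k[-2]) * (k[-1] - k[j]) / (k[-1] - k[-2])
                + pos3(x - k[-1]) * (k[-2] - k[j]) / (k[-1] - k[-2]))
        cols.append(term / scale)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=50):
    """Logistic regression by Newton-Raphson; returns coefficients and covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(X @ beta)))
        H = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
        step = np.linalg.solve(H, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, np.linalg.inv(H)

# (2) Smooth fit: four knots at fixed quantiles of age, no cut points chosen by eye.
knots = np.quantile(age, [0.05, 0.35, 0.65, 0.95])
beta, cov = fit_logistic(rcs_basis(age, knots), event)

# Fitted probability with pointwise 95% Wald bands over the supported age range.
grid = np.linspace(knots[0], knots[-1], 50)
G = rcs_basis(grid, knots)
eta = G @ beta
se = np.sqrt(np.einsum("ij,jk,ik->i", G, cov, G))
expit = lambda v: 1 / (1 + np.exp(-v))
fit, lower, upper = expit(eta), expit(eta - 1.96 * se), expit(eta + 1.96 * se)

for a, f, lo, hi in zip(grid[::10], fit[::10], lower[::10], upper[::10]):
    print(f"age {a:5.1f} mo: risk {f:.3f} (band {lo:.3f} to {hi:.3f})")
```

Plotting `counts` against `edges` and `fit` with its bands against `grid` gives the two pictures described above, with age kept continuous throughout.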


Can categorizing a continuous variable create Simpson’s Paradox?

Not sure, but I think it’s possible that a form of the paradox could happen. The issue behind the so-called paradox is a failure to condition on other relevant variables, and forcing a relationship to be linear is similar to omitting variables.
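As a rough illustration of how this could play out, the sketch below simulates a case where dichotomizing a continuous confounder leaves enough residual confounding to flip the sign of an association. This is an assumed toy setup, not anything from the paper: `z` is a hypothetical continuous confounder, `x` an exposure that depends on it, and `y` an outcome whose true slope on `x` is -0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data-generating process (all coefficients are assumptions).
z = rng.normal(size=n)                       # continuous confounder
x = z + rng.normal(scale=0.5, size=n)        # exposure, strongly tied to z
y = -0.5 * x + 2.0 * z + rng.normal(size=n)  # true slope of y on x is -0.5

def ols(columns, y):
    """Least-squares coefficients; an intercept is added as the first column."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Adjusting for z as a continuous variable recovers the negative slope.
slope_continuous = ols([x, z], y)[1]

# Adjusting only for "z above zero" leaves confounding within each half.
z_high = (z > 0).astype(float)
slope_dichotomized = ols([x, z_high], y)[1]

# No adjustment at all, for comparison.
slope_unadjusted = ols([x], y)[1]

print(f"slope of y on x, z continuous:   {slope_continuous:+.2f}")
print(f"slope of y on x, z dichotomized: {slope_dichotomized:+.2f}")
print(f"slope of y on x, unadjusted:     {slope_unadjusted:+.2f}")
```

Under these assumptions the slope on `x` is negative when `z` is kept continuous and positive when `z` is cut at zero, because `x` still tracks `z` within each half. Whether that counts as Simpson's paradox proper or just residual confounding is partly a matter of definition.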
