Categorizing Continuous Variables

In this paper, which predicts the risk of cruciate ligament rupture (CR) based on time of neutering in Labrador Retrievers, the following result is presented:

Risk of CR was increased in dogs neutered before 12 months of age (OR = 11.38; P = .01). Neutering before 6 months of age was not a significant factor (P = .17), nor was neutering between 6 and 12 months of age (OR = 3.11; P = .23). Overall neutering was also not a risk factor (OR = 1.8; P = .27).

It makes no sense that there is an increased risk in dogs neutered before 12 months of age but not in those neutered before 6 months or between 6 and 12 months. Is this an issue with dichotomizing the age at neutering? Unfortunately, the paper provides very little data with which to evaluate this.
cruciaterupture.pdf (648.8 KB)


This is a good example of misleading subgroup statistics after arbitrary categorization. Every analysis should start with (1) a high-resolution histogram of the data (here, age) to check regions of support, and (2) a smooth, non-overfitted relationship with uncertainty bands (using splines, nonparametric smoothers, fractional polynomials, etc.). To safeguard interpretations the uncertainty bands should be simultaneous compatibility (confidence) intervals.
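As a rough illustration of step (1), here is a minimal sketch of a one-month-resolution text histogram. The ages are simulated purely for illustration and are not from the paper; the function name `monthly_histogram` and the 36-month range are assumptions made for this example. The point is only that fine bins show where the data actually are before any cutpoint is considered.

```python
import random
from collections import Counter


def monthly_histogram(ages_months, max_month):
    """Count observations in one-month-wide bins [0, 1), [1, 2), ..."""
    # Ages at or above max_month are folded into the last bin.
    counts = Counter(min(int(a), max_month - 1) for a in ages_months)
    return [counts.get(m, 0) for m in range(max_month)]


random.seed(0)

# Simulated (NOT real) ages at neutering, in months: a cluster near
# 6 months plus a sparser group spread between 12 and 36 months.
ages = [random.gauss(6, 1.5) for _ in range(80)]
ages += [random.uniform(12, 36) for _ in range(20)]
ages = [max(0.0, a) for a in ages]  # guard against negative draws

hist = monthly_histogram(ages, max_month=36)

# One row per month; the bar length is the number of dogs in that bin.
for month, n in enumerate(hist):
    print(f"{month:2d}-{month + 1:2d} mo | {'#' * n}")
```

With real data, months that show few or no observations are regions where any estimated effect is poorly supported, whichever cutpoints are chosen.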


Can categorizing a continuous variable create Simpson’s Paradox?

I'm not sure, but I think a form of the paradox could occur. The so-called paradox arises from failing to condition on other relevant variables, and forcing a relationship to be linear is similar to omitting variables.


That is pretty weird. I don’t see a Data Availability statement in the paper, but what’s the ethos in the veterinary research community? Can you reach out to them to request data?

I think I might, but it doesn't seem to be routine.

I would encourage that! Thinking about this in terms of dummy variables, it doesn't seem possible that the OR for the sum of two dummies (≤6 mos + 6–12 mos = ≤12 mos) wouldn't be some kind of weighted average of the ORs for the individual dummies.
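To make the weighted-average intuition concrete, here is a small sketch using made-up 2x2 counts (hypothetical, not from the paper). It assumes crude, unadjusted odds ratios and, importantly, that all three are computed against the same reference group. Under those assumptions the merged odds ratio is exactly a weighted average of the two subgroup odds ratios, with weights proportional to each subgroup's number of non-cases. If the reported ORs each used a different comparison group, or came from adjusted models, this identity would not have to hold.

```python
import math


def odds_ratio(cases, controls, ref_cases, ref_controls):
    """Crude odds ratio of a group versus a reference group."""
    return (cases / controls) / (ref_cases / ref_controls)


# Hypothetical (cases, controls) counts, NOT taken from the paper.
reference = (10, 100)  # common comparison group for all three ORs
early = (6, 10)        # neutered before 6 months
mid = (3, 10)          # neutered between 6 and 12 months

or_early = odds_ratio(*early, *reference)
or_mid = odds_ratio(*mid, *reference)

# Merge the two age groups into a single "before 12 months" group.
or_merged = odds_ratio(early[0] + mid[0], early[1] + mid[1], *reference)

# Weights are each subgroup's share of the merged non-cases (controls).
w_early = early[1] / (early[1] + mid[1])
weighted = w_early * or_early + (1 - w_early) * or_mid

print(or_early, or_mid, or_merged, weighted)
```

With a shared reference group, the merged OR always falls between the two subgroup ORs, so a combined OR of 11.38 alongside a subgroup OR of 3.11 would require the other subgroup's OR to be larger than 11.38.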