Survival curves - categorization vs continuous in Cox model

Hey there,
I have very fresh data from my research project, where I have encountered a very exciting phenomenon.
In the following, I have plotted the survival curves (Kaplan-Meier curves), dividing my target variable values into tertiles.

As you can easily see, the top tertile clearly separates from the other two, while the bottom and middle tertiles hardly differ.

If I now fit a multivariable Cox regression with my target variable entered as continuous, I get no visible (or statistically significant) effect. I hope I am right in explaining this by the fact that the lower and middle tertiles do not differ, so there is no consistent effect in one direction.

Now I have thought about including my target variable in the multivariable Cox model as a tertile-categorized variable. This way I would see exactly the effect shown in the survival curves.

BUT so far my guiding principle, which I have read extremely often, has been that you should on no account categorize continuous variables.

So what else can I do here to demonstrate the effect, beyond just showing survival curves?
Or am I mistaken here and the effect may well not be “detectable”? (my event number should actually be enough)

Many thanks in advance for any help and support.

Did your Cox regression allow for nonlinearity in your continuous predictor of interest?

At the moment I have just added the predictor of interest (PI) as log(PI) because of its heavily skewed distribution. Nothing more. So you are thinking of modelling it with a spline or something similar? (I don’t have any experience with that yet, I have just read about it in F. Harrell’s book.)

Yes, you need to relax the linearity assumption.
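To make that concrete, here is a minimal sketch of a restricted cubic spline basis (linear in both tails, cubic between the knots), written in plain Python. The function name `rcs_basis` and the knot values are hypothetical and only for illustration; in practice the knots are usually placed at quantiles of the predictor, and the resulting columns would replace the single log(PI) term in the Cox model.

```python
def rcs_basis(x, knots):
    """Return the restricted cubic spline basis row for one value x.

    With k knots the row has k - 1 entries: the linear term x followed by
    k - 2 nonlinear terms that are constrained to be linear beyond the
    outer knots. The nonlinear terms are divided by (t_k - t_1)^2 so they
    are on roughly the same scale as x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos_cube(u):
        # Truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2
    row = [x]
    for j in range(k - 2):
        term = (
            pos_cube(x - t[j])
            - pos_cube(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + pos_cube(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
        row.append(term / scale)
    return row


# Hypothetical knots on the log(PI) scale, for illustration only.
example_knots = [1.0, 2.0, 3.0, 4.0]
design_rows = [rcs_basis(v, example_knots) for v in (0.5, 2.5, 5.0, 6.0, 7.0)]
```

Each row of `design_rows` would be one subject's covariate values for the predictor of interest. Testing all nonlinear columns jointly then gives a test of linearity, and testing all columns together gives the overall association test. In R, the `rms` package provides this through `rcs()` inside `cph()`.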

Tertiles are demographic ideas and not biologic or physiologic. Once you have stratified by tertiles it is hard to unsee the graph. It should have never been drawn. It is misleading, arbitrary, and unreliable.
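A small sketch of why the cutpoints are arbitrary: tertile boundaries are computed from whatever sample happens to be at hand, so two samples drawn from the same population get different boundaries. The data below are simulated from a log-normal distribution purely as an assumption for illustration; they are not the poster's data, and the helper names are hypothetical.

```python
import random
import statistics


def tertile_cutpoints(values):
    """Return the two sample cutpoints that split values into tertiles."""
    return statistics.quantiles(values, n=3)


def tertile_of(x, cuts):
    """Return 0, 1 or 2 for the bottom, middle or top tertile."""
    return sum(x > c for c in cuts)


rng = random.Random(0)
# Two hypothetical samples from the same skewed population.
sample_a = [rng.lognormvariate(0.0, 1.0) for _ in range(300)]
sample_b = [rng.lognormvariate(0.0, 1.0) for _ in range(300)]

cuts_a = tertile_cutpoints(sample_a)
cuts_b = tertile_cutpoints(sample_b)

print("cutpoints, sample A:", cuts_a)
print("cutpoints, sample B:", cuts_b)
# Everyone above the upper cutpoint is treated as having the same risk,
# however wide the range of values in the top tertile is.
print("range of the top tertile in A:", cuts_a[1], "to", max(sample_a))
```

A subject whose value lies between the two upper cutpoints would be in the top tertile under one sample and the middle tertile under the other, although nothing about that subject has changed.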


Thanks for your help! After using splines, I ended up with a quadratic term, which fits very well and gives good model fit results.

How did you arrive at a quadratic term? Did you start with a (seldom used) quadratic spline?