Finding a cut-off for a skewed biomarker in predicting death/recurrence

Dear all,

I am currently investigating the effect of a novel biomarker on recurrence/death following a malignant disease. The biomarker in this case is the titre of autoantibodies (continuous and heavily skewed).

First of all, to avoid bias, I wanted to identify patients positive for the autoantibodies mathematically. I investigated the biomarker as a dichotomous variable, applying the outlier criterion (P75 + 1.5 × IQR) to identify those “positive” for the autoantibodies. I also modelled it as a continuous variable using restricted cubic spline (RCS) regression.
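
The outlier criterion described above can be sketched in a few lines. This is only a minimal illustration, assuming NumPy and simulated log-normal titres; the function name `outlier_cutoff` and the data are hypothetical and do not come from the study.

```python
import numpy as np

def outlier_cutoff(titres, k=1.5):
    """Upper outlier fence: P75 + k * IQR."""
    q25, q75 = np.percentile(titres, [25, 75])
    return q75 + k * (q75 - q25)

# Simulated, heavily right-skewed titres (hypothetical data)
rng = np.random.default_rng(0)
titres = rng.lognormal(mean=0.0, sigma=1.0, size=500)

cutoff = outlier_cutoff(titres)
positive = titres > cutoff  # boolean flag per patient
print(f"cut-off = {cutoff:.2f}, share positive = {positive.mean():.1%}")
```

Note that this fence is computed from the marker distribution alone; the outcome (recurrence/death) plays no part in where it lands.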

I would also like to find a cut-off above which the antibodies “start having an effect on prognosis” (I know it is not the most statistically correct approach). I know ROC analysis can be used for that, but it does not take censoring into account. Would time-dependent ROC analysis, as proposed by Heagerty, work here? Or do I simply look at the restricted cubic spline analysis and take the point where an HR of 1 is crossed?

Many thanks in advance.

Check out chapters 18 and 19 of Biostatistics for Biomedical Research for the risk-modelling alternative to your problem. So-called “cut points” will not reproduce, and they lose information contained in the data collected.

I concur with the “treat a continuous variable as a continuous variable first” approach. If you know of other variables associated with the outcome after the disease, then you need to find out whether the biomarker adds prognostic value beyond them. You may then want to create a multivariable prediction model; the predictions then become your new “biomarker”.

However, I understand the need for cut-offs when clinical decisions need to be made. I first talk to the clinicians about what rate of false negatives and/or false positives they would accept. This, of course, depends on the consequences & side effects of treating or not treating. This may yield something like a minimum sensitivity, which can be used to identify thresholds. You will need to use bootstrapping or the like to estimate a threshold with confidence intervals. I do this by finding, in each bootstrapped sample, the threshold at which there is, say, 98.5% sensitivity. This would mean that the lower bound of any confidence interval for the threshold will have >98.5% sensitivity.
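
The bootstrap procedure described above might look roughly like the following. This is a sketch under stated assumptions, not the poster's actual code: it assumes NumPy, a binary event indicator at a fixed time horizon (so censoring is ignored), higher marker values counting as “positive”, and simulated data. The function names `threshold_for_sensitivity` and `bootstrap_threshold` are hypothetical.

```python
import numpy as np

def threshold_for_sensitivity(marker_events, min_sens=0.985):
    """Largest cut-off c such that the share of events with marker >= c
    is at least min_sens (higher marker value = test positive)."""
    x = np.sort(marker_events)
    n = len(x)
    # At most m events may fall below the cut-off; the small tolerance
    # guards against floating-point error in n * (1 - min_sens).
    m = int(np.floor(n * (1 - min_sens) + 1e-9))
    return x[m]

def bootstrap_threshold(marker, event, min_sens=0.985, n_boot=2000, seed=0):
    """Point estimate and 95% percentile bootstrap interval for the cut-off."""
    rng = np.random.default_rng(seed)
    marker = np.asarray(marker, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(marker)
    thresholds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        m_b, e_b = marker[idx], event[idx]
        if e_b.sum() == 0:  # skip resamples without any events
            continue
        thresholds.append(threshold_for_sensitivity(m_b[e_b], min_sens))
    point = threshold_for_sensitivity(marker[event], min_sens)
    lo, hi = np.percentile(thresholds, [2.5, 97.5])
    return point, lo, hi

# Simulated data (hypothetical): events tend to have higher marker values
rng = np.random.default_rng(1)
n = 1000
event = rng.random(n) < 0.3
marker = rng.lognormal(mean=np.where(event, 1.0, 0.0), sigma=1.0)

point, lo, hi = bootstrap_threshold(marker, event, n_boot=500)
print(f"threshold = {point:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With a time-to-event outcome and censoring, the event indicator here would have to be replaced by something that accounts for incomplete follow-up; this sketch does not attempt that.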

I think it’s dangerous to speak of cutoffs at the clinical decision point. Clinicians are used to looking at compromises when it comes to blood pressure, cholesterol, etc. Just because a marker is new doesn’t mean it should be treated differently than blood pressure. One way to understand that a cutoff should not be sought until possibly the very last second before the decision is made is that the cutoff mathematically must be a function of the continuous values of all the other risk factors. Another way to think about this is that if you push a cutoff on a biomarker you’ll need to measure more biomarkers to make up for the information loss.
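
To make concrete the point that a cutoff must be a function of the other risk factors, here is a minimal sketch under a hypothetical logistic risk model. The coefficients, the single extra covariate (age), and the 20% risk threshold are all made up for illustration; none of them come from this thread or from any fitted model.

```python
import math

# Hypothetical model: logit(risk) = B0 + B_MARKER * log(titre) + B_AGE * age
B0, B_MARKER, B_AGE = -6.0, 0.8, 0.06  # illustrative values only

def risk(titre, age):
    """Predicted risk under the hypothetical logistic model."""
    lp = B0 + B_MARKER * math.log(titre) + B_AGE * age
    return 1.0 / (1.0 + math.exp(-lp))

def implied_titre_cutoff(age, risk_threshold=0.20):
    """Titre at which predicted risk equals risk_threshold for a given age."""
    logit = math.log(risk_threshold / (1.0 - risk_threshold))
    return math.exp((logit - B0 - B_AGE * age) / B_MARKER)

# The titre "cutoff" implied by one fixed risk threshold shifts with age
for age in (40, 60, 80):
    print(age, round(implied_titre_cutoff(age), 2))
```

Under these assumptions a single decision threshold on risk translates into a different titre value for every age, so no single titre cutoff matches the risk-based decision for all patients.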


R_cubed, Kiwiski and Prof. Harrell, thank you all for answering!

Agree with this notion. As an example, in this design we minimized information loss by using continuous utility functions representing risk-benefit trade-offs used for decision-making. It is generally a good strategy to avoid dichotomization at the statistical estimation step, when calculating clinically relevant probabilities, and when accounting for trade-offs using utilities.
