In veterinary medicine I frequently see papers that use biomarkers such as the neutrophil:lymphocyte ratio for diagnostic or prognostic purposes. I recall reading that there are statistical problems with this approach. One I can see is that it loses information by collapsing two continuous variables into a single variable that is then often dichotomized. Are there any other concerns that I can point out to my colleagues?
The papers I’ve seen (e.g. the classic Richard Kronmal paper) relate to specific statistical modeling issues and assumption violations. In terms of pure measurement issues, ratios can work if you can show that the ultimate outcome can be modeled as y = f(log(r)) + g(d) + h(n), where r is the ratio, d the denominator, and n the numerator, with every coefficient in g and h equal to zero. This takes a lot of luck. Quite often the numerator or denominator needs to be transformed before the ratio is computed. Sometimes the transformation is just a change of origin (subtraction) in the denominator.
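One way to make that check concrete is to compare a model that uses only the log ratio against one that lets log numerator and log denominator have their own coefficients. The sketch below is only an illustration under assumptions that are not from the thread: the data are simulated, the outcome is continuous, f, g and h are taken to be linear, and the true coefficients (1.0 and -0.2) are made up so that the ratio is not sufficient. It uses plain least squares written with the standard library; in practice a regression package would be used instead.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(X, y):
    """Least-squares fit; returns (coefficients, residual sum of squares)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return beta, rss

# Simulated (hypothetical) data: the outcome depends on log numerator and
# log denominator with coefficients that are NOT equal and opposite.
rng = random.Random(1)
n_obs = 500
log_n = [rng.gauss(0, 0.5) for _ in range(n_obs)]
log_d = [rng.gauss(0, 0.5) for _ in range(n_obs)]
y = [1.0 * a - 0.2 * b + rng.gauss(0, 0.5) for a, b in zip(log_n, log_d)]

# Restricted model: intercept + log ratio only.
X_ratio = [[1.0, a - b] for a, b in zip(log_n, log_d)]
# Unrestricted model: intercept + separate log numerator and log denominator.
X_free = [[1.0, a, b] for a, b in zip(log_n, log_d)]

beta_ratio, rss_ratio = ols(X_ratio, y)
beta_free, rss_free = ols(X_free, y)

# F statistic for the single restriction beta_num = -beta_den.
f_stat = (rss_ratio - rss_free) / (rss_free / (n_obs - 3))

print("ratio-only slope:", round(beta_ratio[1], 3))
print("separate slopes :", [round(b, 3) for b in beta_free[1:]])
print("F statistic for the ratio restriction:", round(f_stat, 1))
```

A large F statistic says the ratio alone discards information that the separate terms carry. When the true coefficients really are equal and opposite, the two fits should be nearly indistinguishable.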
If the two biomarkers are well correlated then it makes little sense to use ratios (or multiplying them together as I’ve seen).
I would love to see discussion on this. As a clinician, ratios of biomarkers make intuitive sense to me in specific contexts. For instance, the neutrophil to lymphocyte ratio for bacterial lower respiratory tract infection (for which, at least theoretically, higher neutrophils tend to correlate with a bacterial cause and lower lymphocytes with viral illness), and the urea to creatinine ratio, because in renal causes of uraemia the two tend to rise together, whereas in upper gastrointestinal bleeding urea tends to rise much more steeply than creatinine.
I have often wondered what the statistical downsides are, above and beyond the theoretical issues. One consideration, whose relevance I am unsure of, is that the ratio presumably compounds the measurement and analytical error of both biomarkers. Unfortunately, I don’t think I understand Harrell’s point, but I will read up on it and discuss it here if and when I do.
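On the measurement-error question, a quick simulation can show how the errors of the two biomarkers carry over into the ratio. This is a rough sketch under assumptions that are not from the thread: the true counts (8 and 2) and the coefficients of variation (5% and 10%) are invented, and the errors are taken to be independent and multiplicative. Under those assumptions the usual first-order approximation is that the relative errors combine in quadrature, so the ratio is noisier than either component.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical true values and assay coefficients of variation (CV).
true_neut, true_lymph = 8.0, 2.0
cv_neut, cv_lymph = 0.05, 0.10

reps = 20000
ratios = []
for _ in range(reps):
    # Independent multiplicative measurement error on each biomarker.
    neut = true_neut * (1 + rng.gauss(0, cv_neut))
    lymph = true_lymph * (1 + rng.gauss(0, cv_lymph))
    ratios.append(neut / lymph)

cv_ratio = statistics.stdev(ratios) / statistics.mean(ratios)

# First-order (delta-method) approximation for independent errors.
cv_approx = math.sqrt(cv_neut ** 2 + cv_lymph ** 2)

print("simulated CV of the ratio :", round(cv_ratio, 4))
print("quadrature approximation  :", round(cv_approx, 4))
```

The approximation only holds while the denominator's error is small relative to its value. It degrades as the denominator approaches zero, where the ratio becomes unstable.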
One example where ratios badly capture the information: polymorphonuclear leukocyte counts in predicting bacterial meningitis. It was “known” clinically that the % of leukocytes that were polymorph was the important feature. It’s not at all; the absolute polymorph count is the thing.
For whatever reason, that example of the use of a ratio isn’t at all clinically convincing for me a priori. Of course that could be because I am young and was never exposed to the incorrect dogma.
Other ratios still intrigue me and I would love links or any and all papers or discussions that review their downsides.
From these references, some potential problems:
- Ratios only “normalize” when the relationship passes through the origin. A ratio Y/X controls for X only if Y vs. X is a straight line through zero. For NLR and most biomarker ratios, this assumption is biologically implausible and rarely tested, so the ratio fails at its stated purpose.
- Mathematical coupling creates spurious associations. Even when numerator and denominator are independent, the ratio is correlated with the denominator by construction. Any “association” between NLR and an outcome may be partly an artifact of this coupling rather than real biology.
- Ratios conflate distinct underlying realities. A group difference in NLR is consistent with at least three different scenarios: different intercepts, different slopes, or genuine ratio differences in the N–L relationship. The ratio cannot distinguish among them; ANCOVA or unconstrained regression can.
- The ratio imposes an untested functional-form constraint. Using log(N/L) forces β_neutrophil = −β_lymphocyte. Fitting log(N) and log(L) as separate predictors lets the data show whether that constraint holds, and a likelihood ratio test can assess it formally.
- Measurement error and dichotomization compound the damage. Noisy lymphocyte counts (especially at low values) destabilize the ratio precisely in the lymphopenic patients where high NLR is considered diagnostic. Subsequent dichotomization at data-derived cutpoints loses additional information and biases effect estimates.
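The mathematical coupling point in the list above is easy to see in a small simulation. This sketch uses made-up data: neutrophil and lymphocyte counts are drawn independently from log-normal distributions with arbitrary parameters chosen only for illustration. Working on the log scale, two independent components with equal variance should give a correlation between log(N/L) and log(L) of about -0.71, even though N and L carry no information about each other.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
n_obs = 5000

# Hypothetical, independently generated log counts (no biology involved).
log_neut = [rng.gauss(1.5, 0.4) for _ in range(n_obs)]
log_lymph = [rng.gauss(0.5, 0.4) for _ in range(n_obs)]

# log(N/L) = log(N) - log(L)
log_nlr = [a - b for a, b in zip(log_neut, log_lymph)]

r_independent = pearson(log_neut, log_lymph)  # should be near zero
r_coupled = pearson(log_nlr, log_lymph)       # negative by construction

print("corr(log N, log L)    :", round(r_independent, 3))
print("corr(log NLR, log L)  :", round(r_coupled, 3))
```

Anything that is itself related to the lymphocyte count will therefore tend to show an association with NLR, whether or not neutrophils play any role.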
References
Aitchison J. The Statistical Analysis of Compositional Data. Chapman & Hall, 1986.
Allison DB, Paultre F, Goran MI, Poehlman ET, Heymsfield SB. Statistical considerations regarding the use of ratios to adjust data. Int J Obes. 1995;19:644–652.
Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. Chapman & Hall, 1995.
Curran-Everett D. Explorations in statistics: the analysis of ratios and normalized data. Adv Physiol Educ. 2013;37(3):213–219.
Kim JS. Spurious correlation between ratios with a common divisor. Stat Probab Lett. 1999;44:383–386.
Kronmal RA. Spurious correlation and the fallacy of the ratio standard revisited. J R Stat Soc A. 1993;156(3):379–392.
Pearson K. Mathematical contributions to the theory of evolution — on a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc R Soc Lond. 1897;60:489–498.
Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006;25(1):127–141.
Tanner JM. Fallacy of per-weight and per-surface area standards, and their relation to spurious correlation. J Appl Physiol. 1949;2(1):1–15.
NLRpaper.pdf (3.2 MB)
In this paper they look at the NLR but not at lymphocytes and neutrophils separately, which seems incomplete.