Examples of clinical prediction rules which perform better without dichotomisations



Sounds like a great topic for a formal decision analysis. Have you ever seen one done?


Michael Rosenberg makes great points. As I said in my initial reply, there are many scenarios where we have to think in broad categories for the sake of practicality. Quite frankly, there is so much imprecision in medicine, and people’s values and preferences are very vague.

As for a commonly used model based on continuous variables - the kidney failure risk equation is used routinely.

But look where it is used: in an outpatient clinic where you have time to stay and play and discuss the implications of dialysis and so on.

Oncology also has numerous such models.


Just on that point, categorization only makes measurement error worse, because errors around thresholds get turned into major errors.
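
To make the threshold point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the logistic risk curve, its coefficients, the cut-off at 0, the two risk levels of the dichotomized rule, and the measurement error of 0.1. None of it is a real clinical model. It compares the worst-case change in reported risk caused by the same small measurement error under a smooth model and under a two-level rule.

```python
import math


def continuous_risk(x: float) -> float:
    """Hypothetical smooth risk model: logistic in the measured value x."""
    return 1.0 / (1.0 + math.exp(-(-2.0 + 1.0 * x)))


THRESHOLD = 0.0  # illustrative cut-off, not a clinical value


def dichotomized_risk(x: float) -> float:
    """Hypothetical two-level rule: one risk below the cut-off, one at or above it."""
    return 0.25 if x >= THRESHOLD else 0.05


def worst_case_shift(risk_fn, error: float, lo: float = -3.0,
                     hi: float = 3.0, steps: int = 6001) -> float:
    """Largest change in reported risk caused by a measurement error of size `error`,
    scanned over a grid of true values between lo and hi."""
    worst = 0.0
    for i in range(steps):
        true_x = lo + (hi - lo) * i / (steps - 1)
        shift = abs(risk_fn(true_x + error) - risk_fn(true_x))
        worst = max(worst, shift)
    return worst


ERROR = 0.1  # assumed size of the measurement error
continuous_shift = worst_case_shift(continuous_risk, ERROR)
dichotomized_shift = worst_case_shift(dichotomized_risk, ERROR)

print(f"worst-case shift, continuous model: {continuous_shift:.3f}")
print(f"worst-case shift, two-level rule:   {dichotomized_shift:.3f}")
```

Under these assumptions the continuous model can move by at most 0.025 (the logistic slope never exceeds 0.25), while a patient sitting just below the cut-off jumps by the full 0.20 between categories for the same error.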


Hi all, new to the forum. I came across this discussion via a tweet from Cecile Janssens and have really enjoyed reading along.

Full disclosure: I’m a clinical geneticist and seriously out of my depth in terms of statistics, so I will likely learn much more from this discussion than I can contribute.

However, perhaps one reason for dichotomization in medicine is how we communicate with patients. I think there is a huge challenge with communicating quantitatively, and patients often expect answers that are black and white (“positive/negative,” “at risk/not at risk,” “sick/not sick,” “normal/abnormal”) even though those simplifications lose a great deal of important information. Some of that might be learned: if physicians are educated to think about dichotomous states, and talk to patients that way, patients come to expect our answers to be yes/no instead of something more nuanced. We come across this all the time in Clinical Genetics, and patients often have a difficult time understanding the nuances of relatively straightforward genetic diagnostic testing (for Mendelian/monogenic disorders), let alone the more probabilistic information associated with complex diseases.


I’m glad you’ve joined this discussion Jonathan. While I don’t have any real data on this, my experience is that this varies greatly by physician and patient. Some variables such as BMI, blood pressure, and especially height and weight are very commonly treated as continuous most of the time. Even LDL cholesterol is commonly dealt with optimally in the context where the physician knows that somebody’s arbitrary LDL threshold has been given, and the physician is smart enough to ignore this threshold when the patient has a favorable profile on other risk factors and his LDL is just over the threshold. It would be nice if someone’s done a formal review of clinical practice on this point.


That is certainly right, and I think most people “get it” that high blood pressure is bad for them, and higher blood pressure is worse (or cholesterol, or BMI, etc.). Maybe the difficulty is in communicating exactly how much a given increase or decrease in those values would increase or decrease their risk for a particular health outcome, which is the important thing after all. I completely agree that we would ideally want to be able to provide an accurate risk assessment based on the various inputs, contextualize which of those inputs are modifiable, and predict what would be the net effect of changing any one or more of those inputs.


Nicely put again. The best decisions come from knowing the utility function and having the best estimate of likely future outcomes (e.g., risk estimates under the various treatment scenarios). Risk estimates need to extract as much information from patient characteristics as is feasible, without any loss of information through categorization. Full Bayesian decisions take this even further by formally taking into account uncertainties about the risk estimates. And especially if the uncertainties are asymmetric (e.g., more uncertainty on the high-risk end than on the low-risk end), the optimum decision can be different from the one based on the risk point estimate.
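
A minimal sketch of how asymmetric uncertainty can change the decision, under loudly stated assumptions: the posterior draws, the relative risk reduction, and the treatment cost are all made-up numbers, the utility is taken to be linear in risk, and the "point estimate" is taken to be the posterior median. This is an illustration of the idea, not anyone's actual decision model.

```python
from statistics import mean, median

# Hypothetical posterior draws of one patient's untreated event risk.
# The distribution is right-skewed: mostly low, with a tail of high risk.
posterior_risk_draws = [0.04] * 6 + [0.06] * 2 + [0.30] * 2

RELATIVE_RISK_REDUCTION = 0.30  # assumed treatment effect
TREATMENT_COST = 0.02           # assumed harm/cost, on the same scale as event risk


def net_benefit_of_treating(risk: float) -> float:
    """Risk removed by treatment minus the cost of treating (assumed linear utility)."""
    return RELATIVE_RISK_REDUCTION * risk - TREATMENT_COST


# Plug-in decision: evaluate the utility at a single point estimate (here, the median).
point_estimate = median(posterior_risk_draws)
plug_in_decision = net_benefit_of_treating(point_estimate) > 0

# Bayesian decision: average the utility over the whole posterior.
expected_net_benefit = mean([net_benefit_of_treating(r) for r in posterior_risk_draws])
bayes_decision = expected_net_benefit > 0

print(f"plug-in (median) says treat: {plug_in_decision}")
print(f"expected utility says treat: {bayes_decision}")
```

With these numbers the median risk is 0.04, which gives a net benefit of -0.008 (do not treat), whereas averaging over the draws gives a mean risk of 0.096 and an expected net benefit of 0.0088 (treat). Because the utility here is linear, the discrepancy comes entirely from the skewed tail pulling the mean above the median; with a nonlinear utility, even a plug-in at the posterior mean could disagree with the full average.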


The ASCVD calculator is probably the most common example in primary care. I remember the good old days when I used simple LDL cut-offs. Now I have to think a bit more (a good thing).

I do think there are places in clinical practice where dichotomy may have value. But there is rarely ever a reason for clinical trials to incorporate dichotomies by design. They should always have higher standards, so we can make more informed clinical choices.


I’d like to see an example. And note that even if there is an example that is compelling on its face, you can show mathematically that most of the time the dichotomization required inclusion of an additional variable to make up for information lost - a predictor that would not have added diagnostic or prognostic value had the first variable been maximally utilized continuously.


I used to think that many of my clinical decisions were dichotomous, i.e. give antibiotic or not give antibiotic. But on reflection, I found that reality is more nuanced. Maybe I wait a day before writing the script. Maybe I order an extra test first to help my decision, and so on. At that point, it seems like I am making a single yes/no decision, but take a step back and many more variables are in play.

The area where I sometimes use dichotomies is during patient communication. As an example, when deciding when to prescribe statins, many patients do better with “probably will-help”/“probably won’t-help” recommendations as opposed to probability of potential benefit (which I also give anyway). Taking into account patients’ knowledge base, my ultimate goal is informed consent to allow patients to make autonomous decisions.

I suspect @f2harrell will have some disagreements, and I welcome others’ thoughts on the topic. Always great to discuss and learn more.


That resonates with me. I think you are dichotomizing only when you really need to, at the post-data-assimilation stage, so you are dichotomizing not the inputs but only the summation of inputs.
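
The difference between dichotomizing inputs and dichotomizing only the summation of inputs can be sketched as follows. This is a toy example under assumptions: the two standardized risk factors `z1` and `z2`, the logistic coefficients, the per-input "abnormal" cut-off of 1.0, and the 0.15 action threshold on risk are all hypothetical.

```python
import math

INPUT_CUTOFF = 1.0     # hypothetical "abnormal" cut-off applied to each input
RISK_THRESHOLD = 0.15  # hypothetical action threshold applied to the final risk


def combined_risk(z1: float, z2: float) -> float:
    """Hypothetical logistic model that uses both inputs continuously."""
    return 1.0 / (1.0 + math.exp(-(-2.5 + 0.8 * z1 + 0.8 * z2)))


def flag_by_dichotomized_inputs(z1: float, z2: float) -> bool:
    """Early dichotomization: act if any single input crosses its own cut-off."""
    return z1 > INPUT_CUTOFF or z2 > INPUT_CUTOFF


def flag_by_dichotomized_output(z1: float, z2: float) -> bool:
    """Late dichotomization: combine inputs first, then threshold the risk."""
    return combined_risk(z1, z2) >= RISK_THRESHOLD


# Patient A: one input barely "abnormal", the other very favorable.
patient_a = (1.05, -1.5)
# Patient B: both inputs just under their cut-offs.
patient_b = (0.9, 0.9)

for name, patient in (("A", patient_a), ("B", patient_b)):
    print(
        f"patient {name}: risk={combined_risk(*patient):.3f}, "
        f"input rule={flag_by_dichotomized_inputs(*patient)}, "
        f"output rule={flag_by_dichotomized_output(*patient)}"
    )
```

In this toy setup the input rule flags patient A (risk about 0.05) and misses patient B (risk about 0.26), while thresholding the combined risk does the opposite. That is the same situation as the LDL example earlier in the thread: a value just over its cut-off in a patient with an otherwise favorable profile.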


You don’t carry a mobile phone? There is a plethora of apps/websites for various risk scores that take minimal time to use.

Paracetamol overdose. It is a time-sensitive and common ED presentation that uses continuous variables to guide treatment decisions in the form of a nomogram, and it has been in use for decades: https://www.rcem.ac.uk/RCEM/Quality-Policy/Clinical_Standards_Guidance/RCEM_Guidance.aspx?WebsiteKey=b3d6bb2a-abba-44ed-b758-467776a958cd&hkey=862bd964-0363-4f7f-bdab-89e4a68c9de4&RCEM_Guidance=6

Whilst drug dosing is not overtly a prediction algorithm/decision tool, in that the decision to treat has already been made, most pediatric drug dosing, even (especially) in emergency cases, requires calculating drug doses based on weight at time of use. The same goes for adult oncology drugs and things like low molecular weight heparin, EPO, and immunosuppressants. Even an antibiotic like vancomycin is dosed based on weight and then subsequent concentration measurements (https://metrosouth.health.qld.gov.au/sites/default/files/msh-vancomycin-guidelines.pdf), which, while a heuristic, is based on pharmacokinetic calculations, i.e. continuous variables made into a heuristic. (Indeed, look at that link: it’s a complex heuristic. Plugging values into an app could be easier and take a lot less time than reading the heuristic!)

So physicians can and do perform time of use calculations based on continuous variables - even in emergent situations, all of the time, day in, day out. The question is then, why do they do it for some things and not for others? Probably a combination of necessity (i.e. in pediatrics you simply cannot ignore body weight when writing a prescription and the adult drugs I mentioned all have narrow therapeutic indexes), matching previous trial designs for comparability, tradition/habit and perhaps a lack of awareness of alternatives in some cases.


If anyone has good nutrition/anthropometry examples, there is a special issue of the British Journal of Nutrition that seems to be looking for relevant examples. Trust me when I say this is a world that could really use some input on this topic.

“The use of cut-offs to define nutritional status has many historical and programmatic purposes. For example, using height for age or weight for height cutoffs can be used to assess the prevalence of a nutritional deficiencies or evaluate program effectiveness through changes in trends of various conditions. However, defining nutritional status, such as stunted, underweight, or overweight using programmatic cut-offs has become a norm for research in fetal and early childhood growth and may not always be the most rigorous means of studying factors or interventions that influence growth. In some instances, using standard cutoffs limits conclusions that may be of great scientific or public health importance, but are missed due to small sample sizes or effects. Thus, it is of interest to both academics and policy professionals to discuss potential alternatives to standard cutoffs, such as varying cutoffs or using continuous variables, in growth and nutrition research. This special issue of BJN will invite papers from leading scientists and program professionals who develop and/or use cutoffs for nutritional status in their work to present some of the rationale for continuing to use traditional cutoffs or innovative alternatives that better inform the fields of growth, nutrition, and public health.”