What should MDs-in-training know about medical prediction?

Hello, and thank you for your time. As part of a medical education pathway program at my university, I will lecture my peers (medical students) on medical prediction. I have been studying this topic, although I am not an expert. I have read “Clinicians’ Misunderstanding of Probabilities Makes Them Like Backwards Probabilities Such As Sensitivity, Specificity, and Type I Error” (http://www.fharrell.com/post/backwards-probs/). This seems concerning and in need of fixing, and I might have an opportunity to fix it, in a tiny way. Over some five years, I will lecture 500 students. Some of the things that I hope to cover, mostly from the blog post mentioned above, deviate from what’s emphasized on board exams, but I think they’re too important to ignore (I would appreciate some reinforcement, if available, since I might feel somewhat unorthodox up there saying some of these things). If you are amenable, I would also be interested to hear about anything else YOU might want MDs-in-training to know about medical prediction so they can better interact with and critically evaluate the tools that you build. Thank you again.

-Probability, conditional probability
-Decision theory (classification vs. prediction)
  Careful of tools that embed decision thresholds; they might disempower patients and physicians
  Ergo, for prediction, pay attention to calibration in place of metrics requiring thresholds
-Briefly, how coefficients are estimated
  They depend on data: is the patient you’re treating similar to the patients in the data used to develop the risk score?
-Uncertainty of coefficient estimates, predictions
  What happens with small sample sizes? What if papers don’t report this?
-Benefits of risk scores
  Done right, they augment decisions for one patient with data from many others
  Continually updating, more granular tools on the horizon
-Derive “AI” model from linear regression
  Important to show students, I think, that AI of today is not really “intelligent”
  Nonlinear methods can capture interactions, but are rarely needed
-Black box vs. explainable


A few extra things, some of which are basic but important:

  1. Principles of data entry. I’ve seen too many docs doing research who gather data without assigning unique IDs, without removing personal data such as names and addresses, and using Excel sheets with colour coding and multiple data fields per cell. No no no!!
  2. Reproducible research workflow. Even if they just read this commentary, it would be an improvement on the status quo: https://www.nature.com/news/statistics-p-values-are-just-the-tip-of-the-iceberg-1.17412
  3. If they are planning a research project or grant application: speak to an epidemiologist or statistician at the planning stage, and not after they have already done the work of collecting the data!
  4. Dichotomisation/categorisation of continuous variables throws away information and should almost never be done. Physicians love to do this because it lends itself to easy-to-use bedside heuristic rules for evaluating risk. But now that we all have smartphones, this approach is out of date.
  5. Cluster analysis will find clusters because that is what it does … not necessarily because there are clusters. See commentary on recent diabetes cluster analysis here:

A number of Prof Frank Harrell’s recommendations surprised me, e.g. Improving the Practice of Multivariable Prediction, section 3.11: “models need to be complex to capture uncertainty about the relations . . . an honest uncertainty assessment requires parameters for all effects that we know may be present. This advice is implicit in an antiparsimony principle often attributed to L. J. Savage ‘All models should be as big as an elephant’ (see Draper, 1995)” http://hbiostat.org/doc/rms1.pdf

Edit: and to say that Frank Harrell doesn’t like categorisation is an understatement. Fair enough. But how then to employ such a prediction model in a clinical setting, i.e. when hurriedly trying to make the calculation?

I think the argument will be that dichotomizing helps physicians get to the wrong answer quickly. For what it’s worth, I agree with you: I’m not sure that the type of individual probabilities Frank Harrell advocates for are possible in situations that require truly near-instantaneous decision making (resuscitation?). But in my field most of the biggest treatment decisions happen over a time period that allows for substantial discussion and involvement of family, so a pre-made calculator would easily slide into those situations. Some that come to mind:

  • Does this baby have sepsis/do I need to start antibiotics?
  • When do I wean this baby off parenteral nutrition?
  • Is it better to give this baby steroids to get them off the ventilator now (increasing the risk of CP), or do I give them a chance to get there on their own (risking chronic lung disease)?
  • If this baby is born today, what is their probability of intact survival given active management?

Many people I know already use simpler calculators or regression models in these situations to provide parents with estimated probabilities, but these tools usually include a smattering of dichotomized continuous variables.

Come to think of it, neonatology would probably be a great field to look at how parent/physician decisions would change based on models built using continuous vs dichotomized variables and/or thresholds.


A large number of terrific questions have been raised here and I hope we get a lot of participation in this discussion from the clinician, statistics, and clinical epidemiology communities!

Just to address two of the important points: first, I think that with modern computing, clinical informatics, and web computing tools, we very seldom need to worry about model complexity when getting predicted values. But even if predictions need to be done “by hand”, nomograms can go a long way. Nomograms show how easy it is to handle continuous predictors and nonlinear predictor effects.

Secondly, Sam raises a very tricky and important point: do you teach methods that are outmoded just because they are going to be tested on the boards? This issue has troubled me for a long time. Here is a case in point. The way we teach probabilistic diagnosis, most MDs come to believe that you need to use Bayes’ rule even if the data come from a prospective cohort study, and they are led to believe that sensitivity and specificity are constant ‘test properties’. Neither of these is close to being true. We don’t teach probability to clinicians, and we really don’t teach this simple fact: if you know the patient characteristics that predict disease and you know the (even continuous) test result, you can easily compute the probability of disease in a direct fashion without taking the retrospective bypass just to be able to illustrate Bayes’ rule. How much more natural it is to just show probabilities such as these:

  • P(disease | test +, male)
  • P(disease | test -, female)
  • P(disease severity worse than 2 | ST change of x mm at a heart rate of 120)

Once a clinician understands conditional probabilities, and embraces these as being main tools in precision medicine, decision making improves and inaccurate shortcuts (e.g., assuming sensitivity is a constant, test results are binary, and disease is binary) can be dropped.
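To make the contrast concrete, here is a minimal Python sketch of reading P(disease | test result, sex) directly off prospective cohort data as a simple fraction. The counts, the `cohort` table, and the `prob_disease` function are all invented for illustration and do not come from any study.

```python
# Hypothetical prospective cohort: counts are invented for illustration only.
# Key: (test_result, sex) -> (number with disease, number without disease)
cohort = {
    ("+", "male"): (40, 60),
    ("+", "female"): (25, 75),
    ("-", "male"): (10, 190),
    ("-", "female"): (5, 295),
}

def prob_disease(test_result, sex):
    """P(disease | test result, sex), estimated directly as a fraction."""
    diseased, healthy = cohort[(test_result, sex)]
    return diseased / (diseased + healthy)

for (test_result, sex) in cohort:
    p = prob_disease(test_result, sex)
    print(f"P(disease | test {test_result}, {sex}) = {p:.3f}")
```

No sensitivity, specificity, or prevalence is needed here. With a continuous test result or many patient characteristics, the same quantity would presumably come from a regression model (e.g., logistic regression) rather than from a table of fractions.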

I think that Bayes’ rule is taught at this point in the medical curriculum because of a false perception that it simplifies things. It only simplifies things if you assume (1) disease prevalence is well-defined and known, (2) sensitivity and specificity are constant, and (3) both the test result and the disease status are binary. Prevalence is actually ill-defined in a futile attempt at getting an unconditional probability so a single number can be used. Every probability is conditional on something. Prevalence is made from an unknown mixture across some hypothetical “population.” And one can easily show that sensitivity increases with any patient characteristic that is correlated with underlying disease severity, as more severe disease is easier to detect. “Better an inconvenient truth than an easy lie” should instead be the mantra of medical education and board certification. In my opinion, the guiding principle should be optimizing medical decision making.


This is supposedly a course for Medical Students (had they wanted to become statisticians they wouldn’t be sitting there…).

So in my mind, clinical applicability is key, starting with clear examples of where incorrect statistical methods caused errors in medical decision making.
And then, solutions. Knowing that something is of limited use is NO USE if you cannot replace it with something better. In the end, medicine is always about decision-making; you cannot send your patient home just because you haven’t figured out the right stats tool… Medics don’t love categories because they are categories, but because they speed up decision-making. As someone already commented, you are not going to start a patient on a serious therapy for a slight deviation, and values are often discussed in series, so it’s about looking at trends, not just absolute values.
So in the sense of evidence-based medicine :-), I think you would have to find examples where e.g. categorisation (as it happens in real life in the clinic, not in theory) gives you a considerably worse result than a different model, and compare them on all possible effects: outcomes, time to get a decision (a good enough decision today is better than a perfect one next month), ease of use (as that will influence error rates) and resources needed.

Something that I see a lot (I am a medic and researcher by training but founded a patient network, so we see a lot of what patients are concerned about) is that medics are terrible at communicating probability to their patients.
In the end, as a patient, you are ONE point on any given curve, so your options are binary: you progress or you don’t. You live or you die. And yes, the probabilities say you are more likely to be here than there, but in the end it’s just that: a probability. Especially in rare conditions, telling patients that something isn’t going to happen… doesn’t work well; they have already been hit by something rare once before.

Good luck!

This is a decent paper with an illustrative example: “Dichotomizing continuous predictors in multiple regression: a bad idea.” [1]. The authors say: “there is a real point in creating risk groups from such a model—not least, as an aid to making clinical decisions about therapy. Accordingly, we prefer first to derive a continuous risk score from a model in which all relevant covariates are kept continuous, and then to apply categorization at the final step. Patients are divided into several groups for clinical application by applying cutpoints to the risk score.” Perhaps this is the way to go, although I think Frank Harrell has said the cut-points are often not reproducible.


Thank you and will have to read again when less tired ;-)!

The interesting point for me is the junction between model and decision-making, as in the end medicine is applied.

We will always have the ‘yes, sure’ and the ‘no, sure’ situations; it’s the area in between that is tricky. And now, with decision-making that is increasingly driven by financial considerations… important to get it right.

The simplest thing to know about decision making is that it is suboptimal if any categorization is done before the decision point. See the diagnosis chapter in BBR.


Thank you!! Perfect, and as a matter of fact just what I need to learn, so bear with me while I get my head around it. If I understand you correctly, you take all (known) relevant risk factors without categorisation of continuous variables, calculate a risk score, and only categorise at the decision point.
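The workflow described above can be sketched in a few lines of Python. The logistic model, its coefficients, the predictors (age and systolic blood pressure), and the 20% decision threshold are all invented purely for illustration; they do not come from any fitted model or guideline.

```python
import math

# Hypothetical logistic model coefficients (assumed, not fitted to real data).
INTERCEPT = -8.5
COEF_AGE = 0.06   # per year of age
COEF_SBP = 0.02   # per mmHg of systolic blood pressure

def predicted_risk(age, sbp):
    """Risk from continuous predictors, with no categorisation of inputs."""
    linear_predictor = INTERCEPT + COEF_AGE * age + COEF_SBP * sbp
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def decide(risk, threshold=0.20):
    """Categorise only here, at the decision point (threshold is assumed)."""
    return "treat" if risk >= threshold else "do not treat"

# Two patients who would fall in the same "age >= 65" bin have different risks.
for age, sbp in [(66, 130), (88, 130)]:
    risk = predicted_risk(age, sbp)
    print(f"age={age}, sbp={sbp}: risk={risk:.2f} -> {decide(risk)}")
```

Had age been dichotomised at 65 before modelling, these two hypothetical patients would have received the same predicted risk and therefore the same decision.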

How do you deal with a series of decisions then, please? And how do you go ‘back and forward’ in there? One of the situations I am worried about is the following:
In melanoma, some countries try to get the ‘false negatives’ of excisions of suspected skin lesions as close to zero as possible. That, however, means that only skin lesions that are OBVIOUSLY malignant are excised, often after patients have been going back and forth for ages asking for removal, resulting in higher stages at diagnosis and worse prognosis. The entire thing is obviously short-term cost-driven, so decision-making based on a risk profile rather than ‘this looks horrendous’ plus ‘I have already exhausted my number of allowed excisions’ would be helpful. And how could I then go ‘backwards’, in the sense of ‘when all your pathologies come back positive, you are likely to have missed x lesions’? Or is this just the thing that you don’t want…?
After lesion removal, there is then inconsistency in who gets offered an SLNB (another decision point), and now treatment (adjuvant, neo-adjuvant), so one ends up with a series of decisions, and I can see that wrong categorisation has downstream consequences. Is there a way to mitigate that? For example, arguing for ‘yes, no, and more intense follow-up for the intermediate category’ instead of simply ‘yes/no’?

Apologies if this is confused; I’m just starting to think about it.

p.s. I shall borrow that- though I’ll add ‘a lack of strategy of identifying questions of real interest and relevance’

‘We have more data than ever, more good data than ever, a lower proportion of
data that are good, a lack of strategic thinking about what data are needed to
answer questions of interest, sub-optimal analysis of data, and an occasional
tendency to do research that should not be done.’

Great questions. I don’t have experience in the “long sequence of decisions with updated patient condition and changing patient preferences” setting. It’s worth deciphering the simple single-time-point decision first though. I have some relevant references in the BBR chapter on diagnosis. Work by Andrew Vickers and Ewout Steyerberg especially comes to mind.

The optimal Bayes decision, which maximizes expected utility (minimizes expected loss), is the gold standard approach. This takes into account all uncertainties in the risk estimates. Once you define the loss function, there is a way to optimize the decision process. Without a loss function, we are stabbing in the dark. Ideally the loss function is defined by the patient, but it can be a medical cost function.

The approximation to the full Bayes solution is to consider the risk estimates to be made without uncertainty. Then, as a paper referenced in BBR shows, you can solve for a risk threshold for treatment. If every patient has the same loss function, this threshold will be the same for every patient.
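As a rough illustration of how a loss function turns into a risk threshold, here is a minimal Python sketch. It assumes the simplest possible setup, which is my own simplification and not necessarily what the paper referenced in BBR uses: two actions (treat or not), two states (disease or not), zero loss for a correct decision, and a risk estimate taken as known without error. The function names and the loss values are hypothetical.

```python
def expected_loss(risk, treat, loss_unneeded_treatment, loss_missed_disease):
    """Expected loss of a decision, treating `risk` as known without error."""
    if treat:
        # Loss is incurred only if the patient does not have the disease.
        return (1.0 - risk) * loss_unneeded_treatment
    # Loss is incurred only if the patient does have the disease.
    return risk * loss_missed_disease

def risk_threshold(loss_unneeded_treatment, loss_missed_disease):
    """Risk at which treating and not treating have equal expected loss."""
    return loss_unneeded_treatment / (loss_unneeded_treatment + loss_missed_disease)

def best_decision(risk, loss_unneeded_treatment, loss_missed_disease):
    """Pick the action with the smaller expected loss."""
    loss_treat = expected_loss(risk, True, loss_unneeded_treatment, loss_missed_disease)
    loss_wait = expected_loss(risk, False, loss_unneeded_treatment, loss_missed_disease)
    return "treat" if loss_treat < loss_wait else "do not treat"

# Hypothetical losses: missing the disease is judged 9 times worse than
# treating unnecessarily, which gives a threshold of 1 / (1 + 9) = 0.1.
threshold = risk_threshold(1.0, 9.0)
print(threshold, best_decision(0.15, 1.0, 9.0), best_decision(0.05, 1.0, 9.0))
```

Under these assumptions, a patient who weighs the two losses differently gets a different threshold from the same predicted risk, which is one argument for reporting the risk itself rather than a pre-made classification.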

I hope someone can point us in the right direction for more complex multi-stage decisions.

@f2harrell Thanks, I am glad for the confirmation that it is appropriate to teach the posterior probability, P(dz+| characteristics). My hope was to start by explaining P(dz+) (however hypothetical*) and then move to P(dz+|characteristics), involving first fractions and then logistic regression.

My original intention was to take students from the (known) concepts of sens/spec to PPV using Bayes rule, and then finally to P(dz+|anything), but I thought I would lose many of them in the process. That journey, I think, is an undertaking. Perhaps for this reason, materials on these topics for med students kind of skimp. Maybe I am wrong, but it seems that PPV and NPV are usually taught as rote formulas, void of probabilistic context.

Hopefully, if conditional probability is taught, the awkwardness of P(test+|dz +) will be immediately clear, and everyone will notice that a quarter of a doctor’s job is to estimate P(dz+|test+), which is really P(dz+|test+, male, age=51), etc… Also, we can extend: given that P(dz+|test+, male, age=51, Treatment+) = 0.4 or P(side effect+|test+, male, age=51, Treatment+) = 0.8, what does the patient want to do? (Another quarter of a doctor’s job.) This might also increase the general enthusiasm for (unaltered) probability estimates.

*perhaps if we perfectly randomly sampled 10000 people and saw how many had the disease?

@bryll How to better train MDs to communicate probabilities?

@jroon Thank you for these suggestions; I will try to incorporate as much as possible. About point 1: I agree that data entry is important for research, but many MDs might not perform research at all. However, pretty much all of them will enter data into the EHR. The EHR is known to present difficulties in terms of data quality. Can MDs reduce these quality issues somehow, making it a better research resource for all?

Interesting question. I don’t have a lot of experience with EHRs. Most hospitals here are still on paper charts, which have their own set of problems (hence the dodgy Excel files I mentioned). EHR data entry problems are presumably tied to the user interface of each different EHR, of which I have little experience. I think others can maybe answer this better!

I’d like to learn how to optimally teach probability. One authority is David Spiegelhalter. I’d also like recommendations on small books concentrating on probability.


I would like to learn how to optimally communicate probabilities. We tend to communicate risk in a heavily value-laden manner that often boils down to “this should be done” or “this shouldn’t”.


To me the main question about that is whether it is productive to take the retrospective sens/spec detour or go right to conditional probabilities (after a brief intro to an unconditional probability). One good example might be to show the probability that a man with an enlarged prostate will be biopsy-positive for prostate cancer, as a continuous function of his age.

I think risk communication needs its own topic. I’ll start one.


Thank you, this was my question as well. Sens/spec etc. will be taught in an earlier lecture. I am not involved in this directly, but might be able to advocate for a more probabilistic treatment there, including Bayes’ rule.

In my lecture on prediction, teaching Bayes’ rule as it relates to sens/spec, which requires explaining how, e.g., PPV = f(sens, spec) * prior, would take time. It might be an opportunity to teach probability and reinforce the concepts of sens/spec, but P(dz | …) seems most relevant to prediction (where … is something more than test +). If the predictive system provides P(dz | …), it should be sufficient.

I worry about “predictive” systems that provide classifications. These must be flipped:
P(dz | classifier +) = P(classifier + | dz) P(dz) / P(classifier +). This is probably not something the physician has time to do (especially specifying a prior). Ironically, classifiers, which are usually intended to speed up workflow, thus make it drastically more inefficient. Drawing attention to the fact that P(dz | test +) is the same as P(dz | classifier +) might help students deal with classifiers, but this seems defensive. Will classifiers even make their way into clinical practice enough to necessitate this?
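The flip above can be written out in a short Python sketch. It assumes the classifier's sensitivity and specificity are fixed constants (an assumption questioned earlier in this thread), expands P(classifier +) with the law of total probability, and uses made-up values of 0.90 for both; the function name is hypothetical.

```python
def prob_disease_given_positive(sensitivity, specificity, prior):
    """P(dz | classifier +) via Bayes' rule, assuming fixed sens/spec."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    # Denominator is P(classifier +), by the law of total probability.
    return true_pos / (true_pos + false_pos)

# Same (assumed) classifier, different priors, very different answers.
for prior in (0.01, 0.10, 0.50):
    p = prob_disease_given_positive(0.90, 0.90, prior)
    print(f"prior={prior:.2f} -> P(dz | classifier +) = {p:.3f}")
```

The result depends heavily on the prior the physician supplies, which is the step the post above suggests there is rarely time for.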

Rethinking this… apologies. I am not too familiar with risk adjustments based on treatment, but wanted to include it as a motivator. Assume here that disease is an event such as stroke that we would like to prevent with treatment. We would like to know how much treatment will affect risk before administering treatment, so it does not make sense to condition on a hypothetical treatment, I think. More explicitly, if we have a treatment effect (TE) from an RCT, we want something like Risk If Treated = P(stroke | male, age) – TE, I believe? Or more precisely, TE(male, age)…
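One way to explore this question is to compare two ways of applying a treatment effect to a baseline risk. The Python sketch below is only an illustration: the first function is the subtraction proposed above, the second assumes instead that treatment multiplies the odds by a constant, and both the 0.05 risk difference and the 0.7 odds ratio are made-up numbers. Which scale a treatment effect is roughly constant on is an empirical question that this sketch does not settle.

```python
def risk_if_treated_absolute(baseline_risk, risk_difference):
    """Formula from the post: subtract a constant absolute treatment effect."""
    return baseline_risk - risk_difference

def risk_if_treated_odds_ratio(baseline_risk, odds_ratio):
    """Alternative assumption: treatment multiplies the odds by a constant."""
    odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)

# Compare the two versions across low, moderate, and high baseline risks.
for baseline in (0.02, 0.20, 0.60):
    print(baseline,
          round(risk_if_treated_absolute(baseline, 0.05), 3),
          round(risk_if_treated_odds_ratio(baseline, 0.7), 3))
```

With a constant subtraction, a low-risk patient can end up with a negative "risk", which hints that TE may indeed need to depend on the patient's characteristics, as the TE(male, age) notation suggests.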