So you've built a validated clinical prediction model, now what?



I wrote to Dr. Harrell directly and he suggested I post this question here. Please be gentle.

I’m starting to get a grasp on why ROC curve cutoffs (and just about any cutoff) are inadequate and I’m sold on predictive regression modeling.

So, what I’m curious about is . . . what I’m calling the intersection of prediction and causality. After you build a model and make a prediction, what guidance can the model give a clinician in terms of actionable information?

Say a patient comes in, and you’ve trained a valid and robust predictive model with the help of rms (both the package and book!). You, or the EMR system, enter the patient’s information into the model and they have a bad prognosis of disease X. You look at their information and you notice that their lab count Z is well outside of “normal range” (let’s say, the ones provided on a common lab sheet). For simplicity let’s assume lab count Z can be directly manipulated by drugs and is not dependent, say, on a failing organ.

Okay, so here we are.

Can you:

  1. Use the model to see how much this lab value is affecting the prediction for the individual patient?

  2. Use the model to look at alternative scenarios for that individual patient (given the assumptions of the model, would getting their lab value back to a certain range, or increasing/decreasing it by an interval, affect their model-driven prognosis)?

  3. Use the model to define a new data-driven “normal range” for lab value Z for the “small world” of this sample? I could see the problems in this as perhaps no one in the sample (depending on the sample) has a “good” lab value for Z. Also, it’s a problematic overall average, but perhaps the clinician is nuanced and just considers it as a piece of evidence among many.
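To make questions 1 and 2 concrete, here is a minimal sketch of what "looking at alternative scenarios" could mean mechanically. Everything in it is an assumption for illustration: the logistic form, the coefficient values, the predictor names `age` and `lab_z`, and the helper names `predicted_risk` and `what_if` are all hypothetical and do not come from rms or from any fitted model. The output describes how the model's prediction changes when the input changes, not what would happen to the patient if the lab value were changed.

```python
import math

# Hypothetical coefficients on the log-odds scale (not from a real fit).
INTERCEPT = -4.0
COEF = {"age": 0.03, "lab_z": 0.8}  # lab_z enters linearly only for simplicity


def predicted_risk(patient):
    """Predicted probability from the assumed logistic model."""
    linear_predictor = INTERCEPT + sum(COEF[k] * patient[k] for k in COEF)
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def what_if(patient, name, values):
    """Re-predict with one input swapped out, holding the others fixed.

    This is a statement about the model, not about the effect of an
    intervention on the patient.
    """
    return {v: predicted_risk({**patient, name: v}) for v in values}


patient = {"age": 70, "lab_z": 3.0}
baseline = predicted_risk(patient)
scenarios = what_if(patient, "lab_z", [1.0, 2.0, 3.0])

for value, risk in scenarios.items():
    print(f"lab_z={value}: predicted risk {risk:.2f} (change {risk - baseline:+.2f})")
```

Whether any of those differences would be realized by actually moving the lab value depends on causal assumptions that sit outside the model.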

So, I love this quote from Dr. Harrell on his blog about his journey to Bayes. “Null hypothesis testing is simple because it kicks down the road the gymnastics needed to subjectively convert observations about data to evidence about parameters.”

So, here we are again, we know the patient has a bad prognosis and we now need to go from data observations to actionable information and we know we can’t rely on cutoffs/p-values for easy answers. In other words, we need to gamble (thank you Nate Silver) on a course of action with the patient. Can any of the numbered steps above provide valid intel/evidence to help the clinician make a good bet? Or should it all be instinct, training, gestalt from this point forward?

What would you do? Thanks!

Update - I see Dr. Harrell has written about this here a bit under “What is a good global strategy for making optimum decisions for individual patients?” I would love to see more extensive explanations, opinions, sources, and optimally a well-written book on this! Do such things exist?


Judea Pearl’s causal-statistical distinction gives an essentially negative answer to (what I take to be) your hopes here. See §§11.1.1 and 11.3.5 of his Causality 2nd ed. (2009); links here. This distinction is perhaps best crystallized in Pearl’s “Golden Rule of Causal Analysis” from p. 350:

No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design.


That’s a really good point, I should have used the phrase “might affect” a lot more in my question. I’m going to read over this Pearl passage, thank you very much!

I am likely expressing myself imprecisely here, but I’m hoping the clinician’s intuition and the statistically informed gambling intel somehow inform each other. At the very least, perhaps we can use the model to identify associated factors (and, yes, that means my example of the independent lab value was simplistic) and the clinician can make the determination/bet of causality. I am perhaps asking, “What is the best statistical evidence we can give the doctor (outside of just the prediction itself) that could help guide the doctor in making an informed gamble?” I know statistics can’t give us a probability that lab value Z is causal.

Nonetheless, how dramatically prognosis shifts across different values of lab value Z might suggest that whatever is at the end of the causal chain could be worth finding. Maybe the clinician can decide whether they think a statistically associated variable might be causal, or whether they have a hunch about where to find the real causal factor that determines the associated factor. You’d probably want the clinician to find out why platelets are low instead of just injecting some platelets directly. Could the statistician provide the clinician some evidence that suggests “hey, you might want to look into this or what is causing this”?

I’m trying to live in the world articulated by Richard McElreath in Statistical Rethinking:

“The large world is the broader context in which one deploys a model. In the large world, there may be events that were not imagined in the small world. Moreover, the model is always an incomplete representation of the large world, and so will make mistakes, even if all kinds of events have been properly nominated. The logical consistency of a model in the small world is no guarantee that it will be optimal in the large world. But it is certainly a warm comfort.” (p. 19)

So, here is where I’m confused: if this is categorically impossible, why are people trying? See Causal Inference in the Age of Decision Medicine (it seems to reference Pearl and uses structural equation modeling).

And The Fundamental Difficulty With Evaluating the Accuracy of Biomarkers for Guiding Treatment:

“Therefore, guidance documents should not be asking for assessments of accuracy for predictive markers. In our view, instead they should be asking for assessment of the clinical impact of the markers on patient outcomes.”

I know that Dr. Harrell and Dr. Senn have written about how to supplement predictive models with RCT results to gauge the potential impact of Treatment A vs. Treatment B for the individual patient. I guess here, since we’ve introduced an external intervention, we know the difference in effect would be causal.
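As a rough sketch of that idea (my own simplification, not necessarily the exact procedure either author describes): take the patient’s baseline risk from the predictive model, apply a relative treatment effect estimated in an RCT on the odds scale, and convert back to an absolute risk. The odds ratio of 0.7 and the function names below are hypothetical, and the sketch assumes the odds ratio is constant across patients (no treatment-by-covariate interaction).

```python
def risk_on_treatment(baseline_risk, odds_ratio):
    """Risk under treatment, given untreated risk and an RCT odds ratio.

    Assumes the odds ratio applies equally to every patient.
    """
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_reduction(baseline_risk, odds_ratio):
    """Difference between untreated and treated risk for one patient."""
    return baseline_risk - risk_on_treatment(baseline_risk, odds_ratio)


ASSUMED_OR = 0.7  # hypothetical RCT estimate, for illustration only

low = absolute_risk_reduction(0.05, ASSUMED_OR)   # low-risk patient
high = absolute_risk_reduction(0.40, ASSUMED_OR)  # high-risk patient
print(f"ARR at 5% baseline risk:  {low:.3f}")
print(f"ARR at 40% baseline risk: {high:.3f}")
```

Under these assumptions the same relative effect gives a larger absolute benefit to the higher-risk patient, which is where the predictive model earns its keep: it supplies the baseline risk, while the causal part comes from the randomized comparison.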

If it’s true that all a predictive model can do is predict/flag a patient and then the clinician takes over from there using training/instinct/gestalt, we’re being sold a lot of hype re: personalized medicine. Is there a compromise position in here somewhere? If there are 2500 patients in a hospital-based data system that have had a similar constellation of clinical indicators (and let’s say the clinician has only seen 3 of these patients over their career) can anything at all be learned from the 2500 that represents decent-to-solid intel?


What about instrumental variables/Mendelian randomization? Isn’t that a “purely statistical method”?


Nope. Consider what Pearl has to say on the 2nd page of §11.1.1 linked above, and how this applies to the examples you cite:

Take the concept of randomization – why is it not statistical? Assume we are given a bivariate density function f(x,y), and we are told that one of the variables is randomized; can we tell which one it is by just examining f(x, y)? Of course not; therefore, following our definition, randomization is a causal, not a statistical concept.
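A small simulation can illustrate the point in that passage. This is my own sketch with hypothetical names and parameters (`simulate`, `summary`, a correlation of 0.6, standard normal variables): in one dataset x is randomized and y responds to it, in the other y is randomized and x responds to it, yet both are draws from the same bivariate normal distribution, so their summary statistics agree up to sampling noise.

```python
import math
import random


def simulate(cause, rho=0.6, n=20000, seed=0):
    """Draw n (x, y) pairs in which `cause` ('x' or 'y') is the randomized variable."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - rho ** 2)
    pairs = []
    for _ in range(n):
        randomized = rng.gauss(0.0, 1.0)
        response = rho * randomized + noise_sd * rng.gauss(0.0, 1.0)
        pairs.append((randomized, response) if cause == "x" else (response, randomized))
    return pairs


def summary(pairs):
    """Variances and correlation, i.e. what the joint distribution reveals here."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    return {"var_x": var_x, "var_y": var_y, "corr": cov / math.sqrt(var_x * var_y)}


x_randomized = summary(simulate("x", seed=1))
y_randomized = summary(simulate("y", seed=2))
print("x randomized:", x_randomized)
print("y randomized:", y_randomized)
```

Nothing in either summary says which variable was randomized; that information lives in how the data were generated, not in f(x, y).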


I think this rabbit hole has perhaps just helped me better understand this article on the 12th read:

The best explanatory/causal model isn’t necessarily (probably isn’t) the best predictive model and vice versa. Maybe the question is, how can we conduct analyses of associations present in the rms-approved predictive model to best inform a subsequent causal analysis (or consulting with a clinician, or both).


I am a clinician, and I will give what I think is a common clinical answer. Viewed in one way, these are reasonable questions. Viewed in another, they are trumped by the practice of clinical medicine. Here are two examples of how that might work, one good one and another one.

Suppose the risk estimator is for coronary artery disease, and the high risk estimate is driven by a very high LDL cholesterol level. There are excellent, evidence-based authoritative guidelines based on large RCTs for survival that state that the recommendation is for statin therapy. It really does not matter what else the model might say. The hand of the physician is forced.

Suppose, on the other hand, the risk estimator is for sepsis, and the high risk estimate is driven by clinical suspicion for infection and sufficient risk factors as defined by the Sepsis-2 or Sepsis-3 criteria. Even though the authoritative guidelines are not evidence based, they say to treat with large-volume infusion and with antibiotics. Here, it might be very interesting to explore the probability space afforded by the model because there might be higher- or lower-risk zones accessible by, say, giving Tylenol and reducing the fever (a Sepsis-2, not -3, thing). Even so, and I am not defending this, I do not think the busy clinician will have the least interest.

I think your alternative (everything now depends on instinct, training, gestalt) is always going to be what really happens. But I look forward to seeing what others think.


Thank you for the insightful (and honest re: practicality) answer! I think lurking beneath my question is “can we do this in a way that might be digestible for said busy clinician (and statistically defensible at the same time)?” but I hadn’t really considered that part of it. I’m in a part of Canada where folks aren’t yet experiencing EMR-related burnout to the same extent so I might have been naive there. 🙂