I think there are a few interesting things that emerge here, as they relate to calibration.
We have the difference between a model for diagnosis and a model for prognosis, the difference between threshold-based and non-threshold-based decision making, the difference between randomized and observational data, and the question of how statistical inference relates to all of it. Overall, it seems that the question of calibration involves all of these things, which makes it particularly difficult to answer.
In the original example, we have a sort of treatment decision (whether or not to discharge) in the setting of a diagnostic problem (whether there is a heart attack). This can be conceptualized, though, with a prognostic model. Suppose we have, for example, information corresponding to an EKG. The treatment decision D is whether to discharge. Let S be a binary random variable for 10-year survival. We want to maximize its expectation, E(S|ekg) = P(S=1|ekg) = \sum_d P(S=1|D=d,ekg)\pi(D=d|ekg), with respect to the policy \pi. Note that we do not need to estimate the diagnostic model, P(HEART ATTACK | EKG). Further, the estimated prognostic model, P_n(S=1|D=d,EKG), has to be well calibrated, as in the orange juice example, within some region that is defined by the data.
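To make the maximization over \pi concrete, here is a minimal sketch. For a given EKG, the optimal policy is deterministic: pick the decision d with the largest estimated P(S=1|D=d,ekg). The survival estimates and decision labels below are hypothetical, not from any real model.

```python
# Hypothetical sketch: maximizing E(S | ekg) = sum_d P(S=1 | D=d, ekg) * pi(D=d | ekg)
# over the policy pi. Since the objective is linear in pi, an optimal policy
# puts all mass on the decision with the highest estimated survival.

def optimal_policy(p_survival_given_d):
    """Given {d: estimated P(S=1 | D=d, ekg)} for one patient's EKG,
    return the decision d that maximizes expected 10-year survival."""
    return max(p_survival_given_d, key=p_survival_given_d.get)

# Illustrative (made-up) prognostic-model outputs for one patient:
p_hat = {"discharge": 0.82, "admit": 0.90}
best = optimal_policy(p_hat)  # -> "admit"
```

Note that the diagnostic probability P(HEART ATTACK | EKG) never appears: only the estimated survival under each candidate decision enters the choice, which is the point of working with the prognostic model directly.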
One could focus not on discharge but instead on catheterization, though, and then it is more like the diagnostic problem, where the action is whether or not to do a test (which aligns with whether or not to do a prostate biopsy).
I preface the next paragraph: I wrote it entirely based on this discussion here, before reading Against Diagnosis (maybe I read part of it many years ago).
We almost always assume that if we can make the diagnosis, heart attack, we are done: then, we just have to treat. It varies by problem,* but especially with medical conditions that don’t fit into neat diagnostic boxes, this is incorrect. Rather than worry about diagnosis, we are sometimes better off spending our energy thinking about what to do given the information that we have. Then, we can focus on the uncertainty about what will happen after we do it. This is what really matters.
The treatment problem is, I think, similar to (maybe the same as) what is called the prediction problem in Against Diagnosis. However, I would propose to solve it by finding a policy that maximizes expected utility. In my mind, in this way, as with the prognostic orange juice example and the EKG example above, we make a decision without a threshold.
In my experience, discussing risk estimates can be challenging. For example, with statins, instead of finding a policy that maximizes utility, providers often estimate 10-year ASCVD risk and, if it is greater than or equal to 10%, recommend a statin. This is the official guideline, and it is problematic, in my opinion. It assumes, among other things, that any two patients experience cardiovascular events and statin side effects in the same way. It also assumes that the information in the ASCVD covariates is sufficient to make the decision. This is not what is recommended in Against Diagnosis, I realize. Reading it, I also realize that the ASCVD-based approach is an improvement over how things would otherwise be.
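The contrast between the two approaches can be sketched in a few lines. Everything numeric here is a hypothetical placeholder (the relative risk reduction, the disutilities, the patients' risks); the point is only that two patients with identical ASCVD risk can reach different decisions once patient-specific side-effect disutility enters an expected-utility calculation, whereas the threshold rule treats them identically.

```python
# Hedged sketch: guideline threshold rule vs. an expected-utility rule for
# the statin decision. All parameter values are made up for illustration.

def threshold_rule(ascvd_risk, threshold=0.10):
    """Guideline-style rule: recommend a statin iff risk >= threshold."""
    return ascvd_risk >= threshold

def utility_rule(ascvd_risk, relative_risk_reduction, disutility_side_effects,
                 disutility_event=1.0):
    """Recommend a statin iff expected utility of treating exceeds not treating,
    under a toy model: treatment scales event risk down by a relative factor
    but incurs a patient-specific side-effect cost."""
    eu_treat = (-ascvd_risk * (1 - relative_risk_reduction) * disutility_event
                - disutility_side_effects)
    eu_no_treat = -ascvd_risk * disutility_event
    return eu_treat > eu_no_treat

# Two patients, same 12% risk: the threshold rule says "treat" for both,
# but the utility rule can disagree once side-effect disutility differs.
same_risk = 0.12
threshold_rule(same_risk)            # True for both patients
utility_rule(same_risk, 0.25, 0.01)  # True: low side-effect burden, treat
utility_rule(same_risk, 0.25, 0.05)  # False: high side-effect burden, do not treat
```

This is exactly the sense in which the threshold assumes any two patients experience events and side effects in the same way: those quantities never appear in the threshold rule at all.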
With respect to inference, I agree. For example, the variance of the posterior over risk might increase as we approach a risk of 1, especially if patients at such high risk are rare in the data. Ultimately there is an inner expectation over utility and then an outer expectation over the posterior of the risk model. Not an easy problem.
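A tiny sketch of the inference point, under an assumed Beta-Binomial model (not something from the original discussion): with a Beta(1, 1) prior, the posterior over a stratum's risk given k events in n patients is Beta(1 + k, 1 + n - k). Strata near the extremes of risk are often sparsely populated, so the posterior variance there stays large, and the outer expectation over the risk model cannot be ignored.

```python
# Hypothetical Beta-Binomial sketch: posterior uncertainty about a stratum's
# risk depends on how many patients fall in that stratum. Counts are made up.

def beta_posterior_var(k, n, a=1.0, b=1.0):
    """Variance of the Beta(a + k, b + n - k) posterior over a risk p,
    given k events observed among n patients (Beta(a, b) prior)."""
    alpha, beta = a + k, b + n - k
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))

# A well-populated stratum vs. a sparse high-risk stratum:
var_common = beta_posterior_var(k=50, n=1000)  # ~5% risk, lots of data
var_rare = beta_posterior_var(k=4, n=5)        # ~80% risk, almost no data
assert var_rare > var_common
```

The inner expectation (utility given a risk) and the outer expectation (over this posterior) are then both needed to rank decisions honestly.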
*For many medical problems, one is done when one makes the diagnosis, but this may be less about reality and more about the medical community’s historical focus on diagnosis (which may be related to antibiotics).