Presentation of the results of a prediction model

I have a prediction model that outputs a probability that the result of an invasive test alters management, using predictors that are easily obtainable before a decision to perform the invasive test is made. Most of the predictor variables are continuous and I have included them as four knot restricted cubic splines. The model calibrates extremely well on bootstrapped internal validation. After external validation, I intend to provide the model as an online calculator.

I want to be able to warn users when the inputted data are at the extremes of the parameter space, i.e. when the particular combination of inputted parameters was extremely rare or never seen in the derivation set.

The easiest solution is to warn the user when individual predictors are at the extremes of their ranges in the derivation and validation datasets. This is in fact already implemented. But it does nothing for the problem of unusual combinations. One option is to provide the confidence interval of the predicted probability for the combination, which tends to be wide for unusual combinations, but I fear that clinicians will interpret the confidence intervals incorrectly and that this confusion will, on average, lead to wrong decisions given the result. Next I thought I could write a script that checks how many individuals in the derivation set had predictor values within a certain range of the inputted values, and the calculator could provide a statement such as: “Warning: ≤1% of individuals in the derivation set had a combination of values similar to those that were inputted”. However, I don’t like this approach, both because the chosen range would always be arbitrary and because there is really nothing to say that a rare combination would result in invalid inference.
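For concreteness, here is a rough sketch of such a script (Python; the 0.5-SD tolerance, the rectangular neighbourhood, and the function name are all arbitrary choices of mine, which is precisely my objection):

```python
import numpy as np

def fraction_similar(x_new, X_dev, tol=0.5):
    """Fraction of derivation-set subjects whose standardized predictor
    values all lie within `tol` SDs of the inputted values.

    X_dev is the (n subjects x p predictors) derivation matrix;
    x_new is the vector of inputted values. The tolerance is
    unavoidably arbitrary."""
    mu = X_dev.mean(axis=0)
    sd = X_dev.std(axis=0)
    z_dev = (X_dev - mu) / sd          # standardize the derivation set
    z_new = (np.asarray(x_new, dtype=float) - mu) / sd
    # a subject is "similar" only if EVERY predictor is within tol SDs
    close = np.all(np.abs(z_dev - z_new) <= tol, axis=1)
    return close.mean()
```

The calculator would then warn when, say, `fraction_similar(...) <= 0.01`.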

Is this a problem other groups have tackled and if so, what solutions were found?


I think the confidence intervals should be presented, but carefully, so as not to hurt interpretation. As for estimating how well a given subject is represented in the model development sample, the Hmisc dataRep function can help.


In our biomarker-based scores for patients with atrial fibrillation (the ABC-AF stroke and ABC-AF bleeding scores), we have limited the input based on the distribution in both the development and the validation data. Limits were not formally defined from percentiles of the distribution but rather as a compromise between looking at extreme values and clinicians’ judgement. In addition, for the overall predictions we looked at the calibration curves (both internal and external) and only allowed predictions within the range where the models were reasonably well calibrated. Results outside that range are reported as, e.g., “ABC-bleeding risk >15%” or “ABC-stroke risk <0.2%”. We have a preliminary version of the risk calculators on our homepage abc-risk-calculators, where you can try different extreme combinations to see the reporting.


I’m not sure the patient is best served by suppressing predictions when X values are out of range.

When placing limits, they should be based not on quantiles but on absolute sample size.

Maybe I wasn’t clear enough, but predictions are not suppressed for extreme X values. Rather, extreme X values were deemed “implausible” (especially for some biomarkers) and truncated to the highest (or lowest) allowed value, and predictions were carried out from that. So, e.g., for age >95 years the age was set to 95 years before making the prediction, avoiding extrapolation outside the support for ages above 95 years in the development (and validation) data.
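In code, the curtailment step is nothing more than clipping each input to its allowed range before it reaches the model. A minimal sketch (Python; the limits shown are illustrative placeholders, not the actual ABC limits):

```python
# Illustrative limits only -- in the ABC scores they were chosen as a
# compromise between the observed distributions and clinical judgement.
LIMITS = {"age": (40, 95), "ntprobnp": (50, 10000)}

def curtail(inputs, limits=LIMITS):
    """Clip each covariate to its allowed range before prediction,
    so the model is never asked to extrapolate outside its support."""
    out = {}
    for name, value in inputs.items():
        lo, hi = limits[name]
        out[name] = float(min(max(value, lo), hi))
    return out
```

The (possibly clipped) values in `out` are then passed to the fitted model, ideally together with a warning that truncation occurred.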

Regarding truncation of the (reporting of the) final predictions, the decision was based on where we had support for reasonable calibration. Also, the truncation point for the predictions was far higher than any reasonable clinical decision point. E.g., the truncation for stroke was at >10% one-year risk (in OAC-treated patients), where anything above 3% is usually considered rather high.


Ah I see. That’s quite reasonable. I often recommend curtailing of covariate values.


Thank you for this response and for the reply to Dr. Harrell’s comment. Could you elaborate a bit on how you did this? I read the publications linked in the calculator but couldn’t find a description.

If I understand correctly, you allowed users to input biologically plausible values into the calculator but truncated them to an extreme percentile of that variable’s distribution in the derivation and validation cohorts? I.e., if age 99 was entered, 95 years was passed to the model and a warning was issued to the user. Did you extend this concept to unusual combinations of variables? As a clinician I would find it extremely unlikely that someone with a high hs-troponin would have a low NT-proBNP; did you somehow modify the entered values based on the percentiles of the joint distribution of the inputted variables?

Finally, I am unsure how you would allow only predictions within a range where the model was well calibrated, when the prediction is the output of the product you are providing. Did you simply not provide the answer if it was outside the well-calibrated range? I can understand modifying the entered values based on the extremes of the variables’ distributions, but surely you could never modify them based on the output?


That’s correct.

No, we didn’t consider joint distributions but that’s a very interesting idea. The challenge would of course be to decide on the bivariate limits. One possibility could be to allow any combination of values but set up some soft limits issuing a warning such as “are you really sure about these values?”.
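For illustration, such a soft limit could flag combinations that lie far from the bulk of the derivation data, e.g. by Mahalanobis distance. A sketch (Python; the 99th-percentile cutoff and the function name are my assumptions, not something we have implemented):

```python
import numpy as np

def joint_outlier_warning(x_new, X_dev, quantile=0.99):
    """Warn when the inputted combination is unusually far, in Mahalanobis
    distance, from the centre of the derivation data -- e.g. high hs-troponin
    together with low NT-proBNP when the two are strongly correlated.

    The cutoff (here the 99th percentile of distances observed in the
    derivation set itself) is an arbitrary soft limit."""
    mu = X_dev.mean(axis=0)
    prec = np.linalg.inv(np.cov(X_dev, rowvar=False))  # precision matrix

    def d2(x):
        z = x - mu
        return float(z @ prec @ z)  # squared Mahalanobis distance

    cutoff = np.quantile([d2(row) for row in X_dev], quantile)
    return d2(np.asarray(x_new, dtype=float)) > cutoff
```

When this returns True, the calculator could still predict but show something like “are you really sure about these values?”.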

That is correct: we calculate the predictions from the (possibly truncated) input values but refrain from reporting values outside the well-calibrated range. That is, if the prediction is a 12% stroke risk we report it as “>10%”. So no entered values are modified retrospectively, if that is what you meant. Again, in our application these limits are far from any relevant clinical decision point, so it feels safe to report extreme predictions simply as above (or below) the limit.
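Concretely, the reporting rule amounts to something like the following sketch (Python; the >10% and <0.2% bounds are the stroke-score examples mentioned above, and the function name is my own):

```python
def report_risk(p, lo=0.002, hi=0.10):
    """Report a predicted probability only within the range where
    calibration was judged adequate; outside it, state the bound
    rather than the point estimate."""
    if p > hi:
        return f">{hi:.0%}"   # e.g. ">10%"
    if p < lo:
        return f"<{lo:.1%}"   # e.g. "<0.2%"
    return f"{p:.1%}"
```

The underlying prediction is still computed; only its presentation is censored at the limits.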


Well, I hate to upset the apple cart, but here’s an argument against confidence intervals for predicted probabilities:


Mike, this is behind a high-profit-margin company’s paywall. Just making trouble :slight_smile:

Ah, no wonder it is never cited LOL! Well, it is a hypothetical conversation between a doctor and a patient. The gist of it is something like this:

Patient: Doc, what are my chances?
Doctor: If I had 100 men identical to you, I would expect about 70 to be alive 5 years later. I have no idea whether you will be one of the 70.
Patient: Thanks, that is pretty clear.
Doctor: By the way, the 95% confidence interval for this prediction is 62% to 78%.
Patient: What does that tell me?
Doctor: It means that I don’t know exactly how many of the 100 men will be alive 5 years later, but I expect between 62 and 78 will. My best guess is still 70, but this is a range for my guess.
Patient: Does that mean at least 62 but no more than 78 will be alive?
Doctor: No, but fewer than 62 or more than 78 is quite unlikely.
Patient: Where does the 95% come in?
Doctor: The casual interpretation is that there is only a 5% chance that fewer than 62 or more than 78 men will be alive in 5 years. Technically, that’s not correct. Instead, all I can say is that if I repeated my work of estimating that interval an infinite number of times, 95% of my infinite attempts would successfully include the true risk that you face. Of course, I can’t make a definitive statement about the particular interval I just quoted for you.
Patient: This is not very helpful to me as an individual. Moreover, I think my true risk is 0 or 1. You just can’t predict it that well.
Doctor: Your point is well taken. Although the casual interpretation is not correct, let’s just stick with it anyway.
Patient: Why did you pick the interval outside of which my 100 identical patients are less than 5% likely to lie? Couldn’t you have picked a different one?
Doctor: I chose 95% because the scientific community generally does. My guess is that you don’t care about that, so we can pick any interval you want. How about 90%?
Patient: Seems pointless to me, because only the casual yet incorrect interpretation begins to make sense to me. I want to know if I will be alive. You think, out of 100 men identical to me, about 70 will be alive. You acknowledge more or fewer than 70 may be alive, which was obvious without your confidence interval. And 70 remains your best guess, regardless of the interval you provide. So I feel the 70-out-of-100 estimate sufficiently conveys the extent of uncertainty you have regarding my prognosis in this single-event context.
Doctor: I agree. Sorry for wasting your time with the confidence interval.


Perfect! Smart patient.

Funny enough, that patient told me that everything he even pretends to know, he learned from you, the GOAT!
