I have a prediction model that outputs the probability that the result of an invasive test alters management, using predictors that are easily obtainable before the decision to perform the invasive test is made. Most of the predictors are continuous and are modelled as four-knot restricted cubic splines. The model calibrates extremely well in bootstrap internal validation. After external validation, I intend to provide the model as an online calculator.
I want to be able to warn users when the inputted data are at the extremes of the parameter space, i.e. when the particular combination of inputted parameters was extremely rare or never seen in the derivation set.
The easiest solution is to warn the user when individual predictors are at the extremes of their ranges in the derivation and validation datasets. This is in fact already implemented, but it does nothing about unusual combinations. One option is to display the confidence interval of the predicted probability, which tends to be wide for unusual combinations, but I fear that clinicians will misinterpret the interval and that this confusion will, on average, lead to worse decisions. Next, I considered writing a script that counts how many individuals in the derivation set had predictor values within a certain range of the inputted values, so that the calculator could display a statement such as: “Warning: ≤1% of individuals in the derivation set had a combination of values similar to those inputted.” However, I don’t like this approach for two reasons: the chosen range would always be arbitrary, and a rare combination does not by itself imply that the inference is invalid.
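For concreteness, the similarity check I have in mind could be sketched roughly as follows (a minimal pure-Python sketch; the data layout, function names, and per-predictor tolerances are all hypothetical, and the tolerance choice is exactly the arbitrary part I dislike):

```python
def fraction_similar(derivation_rows, new_values, tolerances):
    """Fraction of derivation-set rows whose every predictor lies within
    the chosen tolerance of the corresponding inputted value.

    derivation_rows: list of dicts, one per individual (hypothetical layout)
    new_values:      dict of inputted predictor values
    tolerances:      dict of per-predictor half-widths (the arbitrary choice)
    """
    def is_similar(row):
        return all(abs(row[k] - new_values[k]) <= tolerances[k]
                   for k in new_values)
    n_similar = sum(is_similar(row) for row in derivation_rows)
    return n_similar / len(derivation_rows)


def warning_message(frac, threshold=0.01):
    """Return a warning string when the similar fraction is at or below
    the (again arbitrary) threshold, else None."""
    if frac <= threshold:
        return ("Warning: <=1% of individuals in the derivation set had a "
                "combination of values similar to those inputted.")
    return None
```

The calculator would call `fraction_similar` with the stored derivation data and the user's inputs, then show `warning_message` when it is not `None`. Writing it out like this makes the two weaknesses obvious: the result is driven entirely by `tolerances`, and a low fraction says nothing about whether the spline-based prediction is actually invalid there.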
Have other groups tackled this problem, and if so, what solutions did they find?