In short: a Bayesian binary predictor can provide a posterior on the predicted probability, showing an interval (e.g. this patient has 3-22% risk) instead of just a point prediction (this patient has 4% risk). Can such an interval be grounded in and verified by actual data?
Background
Consider binary prediction with x\in \mathbb{R}, y\in \{0,1\}. We would like to predict y from observed x, and have a sense of uncertainty. Plenty has been written on the subject; see e.g. "A survey of uncertainty in deep neural networks" (Artificial Intelligence Review) for a somewhat recent review.
For many reasons, we want an (approximate) Bayesian solution, so that we can use informative priors and integrate with other models in a Bayesian way. To this end, we have a deep learning model produce two outputs, a(x) and b(x). This provides the approximate posterior on the probability as \theta(x) \sim Beta(a(x),b(x)), and finally the predictor is \hat{y} \sim Bernoulli(\theta(x)). We can train the model using the ELBO as in standard variational inference, and we can provide different priors a_0(x), a_1(x), giving us some regularisation and interpretation.
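To make the training objective concrete, here is a minimal sketch of the per-example ELBO for the Beta/Bernoulli setup, under stated assumptions: the variational posterior is Beta(a, b), the prior is also a Beta whose two parameters are called `a0` and `b0` here (my naming, not the notation above), and the function names are hypothetical. It uses the closed forms E[log \theta] = \psi(a) - \psi(a+b) and the standard KL divergence between two Beta distributions.

```python
from scipy.special import betaln, digamma


def beta_kl(a, b, a0, b0):
    """Closed-form KL( Beta(a, b) || Beta(a0, b0) )."""
    return (
        betaln(a0, b0) - betaln(a, b)
        + (a - a0) * digamma(a)
        + (b - b0) * digamma(b)
        + (a0 - a + b0 - b) * digamma(a + b)
    )


def expected_log_lik(y, a, b):
    """E_q[ log Bernoulli(y | theta) ] for theta ~ Beta(a, b)."""
    e_log_theta = digamma(a) - digamma(a + b)        # E[log theta]
    e_log_one_minus = digamma(b) - digamma(a + b)    # E[log(1 - theta)]
    return y * e_log_theta + (1 - y) * e_log_one_minus


def elbo(y, a, b, a0=1.0, b0=1.0):
    """Per-example ELBO; a0, b0 are assumed prior parameters (default: uniform)."""
    return expected_log_lik(y, a, b) - beta_kl(a, b, a0, b0)


# Hypothetical network outputs a(x), b(x) for one example with label y = 1.
print(elbo(y=1, a=2.0, b=18.0))
```

In a real model a and b would be the network outputs for each x, and the negative of this quantity, averaged over a batch, would serve as the loss.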
This approach is presented in the review above as a "single deterministic method" and has been presented in the literature, generalized to categorical labels with Dirichlet/Categorical in place of Beta/Bernoulli. Previous work suggests that the distribution of \theta(x) can be used to assess model certainty. If the total concentration a(x)+b(x) is small, there is supposedly more model uncertainty. We can present a 95% credible interval to illustrate this for a model user.
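As an illustration of how the interval follows from the two outputs, here is a small sketch that computes an equal-tailed credible interval from Beta(a, b) using SciPy's quantile function. The values of a and b are made up for the example, and `credible_interval` is a hypothetical helper, not something from the review.

```python
from scipy.stats import beta


def credible_interval(a, b, level=0.95):
    """Equal-tailed credible interval for theta ~ Beta(a, b)."""
    tail = (1.0 - level) / 2.0
    lower = beta.ppf(tail, a, b)
    upper = beta.ppf(1.0 - tail, a, b)
    return lower, upper


# Two made-up outputs with the same mean a / (a + b) = 0.1
# but different total concentration a + b.
for a, b in [(2.0, 18.0), (20.0, 180.0)]:
    lower, upper = credible_interval(a, b)
    print(f"a={a}, b={b}, mean={a / (a + b):.3f}, "
          f"95% interval=({lower:.3f}, {upper:.3f})")
```

Both cases give the same point prediction, but the one with the smaller total concentration produces the wider interval, which is the behaviour the model-uncertainty interpretation relies on.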
The question
If I say to a patient "This patient has a risk in the credible interval 3-22%", how can I validate that this interval has a grounding in actual model performance? Would the patient-specific risk really fall in this interval at some rate? Can I monitor my model for underconfidence/overconfidence? Does it correspond to anything that can be empirically tested?
It seems to me there is no way to do this. Is there some literature or other point I have missed?