Calibration of predicted full distribution percentiles

Suppose you predict the full conditional distribution of a non-zero continuous outcome as a complex non-linear function of predictors, and the predicted distribution has no tractable functional form. Suppose further that you want a human-interpretable evaluation of the final model's performance in predicting full distributions across a large number of predictor combinations. To measure moderate calibration, I propose plotting the actual distribution percentiles against the predicted percentiles and summarizing that relationship with either a natural cubic spline or a simple linear regression. Alternatively, you could compute a high-resolution (say, percentile-grain) histogram difference between the actual and predicted distributions. Are these valid calibration methods? Are there other human-interpretable methods you might suggest for evaluating the moderate calibration of the final model? Thank you.

P.S. I should mention that the predicted distributions are estimated from a selected sample of data, and the distribution predictions might be adjusted through post-stratification. The actual distributions would come from a population-level survey.
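To make the proposed percentile-vs-percentile check concrete, here is a minimal sketch in Python. It assumes the predicted distributions are available as Monte Carlo draws per case (the names `pred_draws` and `y` are hypothetical placeholders, and the simulated data merely stand in for real predictions and survey outcomes); the summary is the simple-linear-regression variant, where perfect calibration corresponds to intercept 0 and slope 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_draws = 500, 1000

# Simulated stand-in data: predictions deliberately a bit overdispersed,
# so the calibration slope should come out below 1.
y = rng.gamma(shape=2.0, scale=1.0, size=n_cases)            # "actual" outcomes
pred_draws = rng.gamma(shape=2.0, scale=1.2, size=(n_cases, n_draws))

probs = np.arange(1, 100) / 100            # percentile grid, 1%..99%
actual_q = np.quantile(y, probs)           # empirical percentiles of outcomes
pred_q = np.quantile(pred_draws, probs)    # percentiles of pooled predictive draws

# Summarize agreement with a simple linear regression of actual on
# predicted percentiles (a natural cubic spline could replace this).
slope, intercept = np.polyfit(pred_q, actual_q, deg=1)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

A slope near 1 with intercept near 0 indicates good agreement between the two percentile sets; curvature missed by the linear fit is exactly what the spline version would pick up.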


This is a bit like an example I showed in the chapter on ordinal regression for continuous Y in RMS. There I showed entire predicted vs. observed distributions, but only for 6-tiles of predicted values. You might summarize the agreement using Kolmogorov-Smirnov-type absolute differences between the smooth and empirical cumulative distribution functions.