When drawing the calibration curve of a probability prediction model, I often get wildly different curves depending on whether I use LOWESS, a Gaussian additive model with smoothing splines, or a binomial generalized additive model (GAM) with smoothing splines. Of these, the binomial GAM seems the most intuitive to me, because what I want is to flexibly predict a binary outcome (the actuals) as a function of the predictor. What principles should we use when deciding how to draw a smooth calibration curve?
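For concreteness, the binomial-smoother idea can be sketched in plain NumPy as a penalized logistic regression of the binary outcome on a cubic spline of the predicted probability. This is a minimal sketch under stated assumptions, not a GAM library: the smoothing is done on the logit of the prediction, the basis is a truncated-power cubic spline with knots at quantiles, and a fixed ridge penalty `lam` on the knot terms stands in for the automatically tuned smoothing penalty a real GAM would use. The names `fit_binomial_calibration`, `n_knots` and `lam` are hypothetical.

```python
import numpy as np


def _logit(p, eps=1e-6):
    # Clip so predictions of exactly 0 or 1 do not give infinite logits.
    p = np.clip(np.atleast_1d(np.asarray(p, dtype=float)), eps, 1 - eps)
    return np.log(p / (1 - p))


def _basis(x, knots):
    # Truncated-power cubic spline basis: 1, x, x^2, x^3, then (x - k)_+^3 per knot.
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_binomial_calibration(p_pred, y, n_knots=5, lam=1.0, n_iter=100, tol=1e-10):
    """Hypothetical helper: penalized logistic regression of y on a spline of logit(p_pred).

    Returns a function mapping predicted probabilities to smoothed observed
    event rates. Only the knot terms are penalized, so the intercept is free.
    """
    x = _logit(p_pred)
    y = np.asarray(y, dtype=float)
    # Interior knots at evenly spaced quantiles of the logit predictions.
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    X = _basis(x, knots)
    penalty = np.diag([0.0] * 4 + [lam] * n_knots)
    beta = np.zeros(X.shape[1])
    # Newton-Raphson (IRLS) on the penalized binomial log-likelihood.
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-(X @ beta)))
        w = mu * (1 - mu)
        grad = X.T @ (y - mu) - penalty @ beta
        hess = X.T @ (X * w[:, None]) + penalty
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break

    def curve(p_new):
        eta = _basis(_logit(p_new), knots) @ beta
        return 1 / (1 + np.exp(-eta))

    return curve
```

Because the binomial likelihood uses a logit link, the fitted curve always stays inside (0, 1), which a Gaussian smoother of the 0/1 outcomes does not guarantee. For well-calibrated simulated predictions, `curve(np.linspace(0.1, 0.9, 9))` should lie close to the diagonal.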

This is cross-posted to Cross Validated: https://stats.stackexchange.com/questions/424203/best-model-for-probability-prediction-calibration-curves