Best smoothing methods for probability prediction calibration curves

When drawing the calibration curve of a probability prediction model, I often observe wildly different results when I use LOWESS vs. a Gaussian additive model with smoothing splines vs. a binomial generalized additive model (GAM) with smoothing splines. Among these, the binomial GAM seems the most intuitive to me because it flexibly predicts a binary outcome (the actuals) as a function of the predicted probability. What principles should we use when deciding how to draw a smooth calibration curve?
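The divergence between the Gaussian and binomial smoothers is easy to reproduce. A Gaussian fit treats the 0/1 actuals as continuous, so its fitted values are not constrained to [0, 1]; a binomial fit models the probability on the logit scale, so they are. A minimal numpy-only sketch, with plain cubic fits standing in for the spline smoothers and simulated (perfectly calibrated) data of my own, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)
pred = np.sort(rng.uniform(0.01, 0.99, 1000))  # model's predicted risks
y = rng.binomial(1, pred)                      # binary actuals

# Cubic basis in predicted risk (a crude stand-in for a spline basis).
X = np.column_stack([np.ones_like(pred), pred, pred**2, pred**3])

# "Gaussian" fit: ordinary least squares of the 0/1 outcomes on the basis.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
gauss_fit = X @ beta_ls  # fitted values are NOT constrained to [0, 1]

# "Binomial" fit: logistic regression on the same basis, via Newton-Raphson.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    W = p * (1 - p)
    H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])  # ridge for stability
    beta += np.linalg.solve(H, X.T @ (y - p))
binom_fit = 1 / (1 + np.exp(-(X @ beta)))  # always strictly inside (0, 1)
```

With well-calibrated data and enough observations both curves track the diagonal, but only the binomial fit is guaranteed to return valid probabilities, which is one reason the binomial GAM feels like the natural choice.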

This is cross-posted to Cross-Validated:


I have found the best performance using a flexible parametric calibration curve, i.e., fitting a restricted cubic spline function with 5 knots in the logit of the predicted risk.
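A numpy-only sketch of that approach: build a restricted cubic spline basis (Harrell's truncated-power form) on logit(predicted risk), fit the binary actuals by logistic regression, and evaluate the fitted curve on a grid. The helper names, knot quantiles (standard defaults for 5 knots), and simulated miscalibrated data are my own illustration, not from the post:

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (linear tails); returns len(knots)-1 columns."""
    t = np.asarray(knots, float)
    norm = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(len(t) - 2):
        term = (np.maximum(x - t[j], 0) ** 3
                - np.maximum(x - t[-2], 0) ** 3 * (t[-1] - t[j]) / (t[-1] - t[-2])
                + np.maximum(x - t[-1], 0) ** 3 * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(term / norm)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(X1 @ beta)))
        H = X1.T @ (X1 * (p * (1 - p))[:, None]) + 1e-8 * np.eye(X1.shape[1])
        beta += np.linalg.solve(H, X1.T @ (y - p))
    return beta

def calibration_curve_rcs(pred, y, eps=1e-6):
    """Smooth calibration curve: logistic fit of y on a 5-knot RCS of logit(pred)."""
    p = np.clip(pred, eps, 1 - eps)
    z = np.log(p / (1 - p))
    knots = np.quantile(z, [0.05, 0.275, 0.5, 0.725, 0.95])  # default 5-knot placement
    beta = fit_logistic(rcs_basis(z, knots), y)
    grid = np.linspace(z.min(), z.max(), 200)
    X1 = np.column_stack([np.ones(len(grid)), rcs_basis(grid, knots)])
    # Return (predicted risk, estimated observed risk) along the grid.
    return 1 / (1 + np.exp(-grid)), 1 / (1 + np.exp(-(X1 @ beta)))

# Example: predictions that systematically understate the true risk.
rng = np.random.default_rng(0)
true_p = rng.beta(2, 2, 5000)
y = rng.binomial(1, true_p)
pred = np.clip(true_p ** 1.3, 1e-4, 1 - 1e-4)  # distorted predictions
px, py = calibration_curve_rcs(pred, y)        # curve sits above the diagonal
```

Plotting `py` against `px`, together with the identity line, gives the smooth calibration curve; deviations from the diagonal show where the model over- or under-predicts risk. In practice one would use an established implementation (e.g. the spline and validation tools in the `rms` R package) rather than hand-rolled Newton iterations.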

The best paper that studies nonparametric calibration curve estimation is this.