I’m a clinician, researcher, and statistics enthusiast. I try to be principled in my analyses, but since I don’t have a deep mathematical background, workflows help me to maintain good practice throughout a project.
I find @f2harrell’s body of work extremely helpful because of how principled it is. Frank recently published a Biostatistical Modeling Plan for frequentist prediction models on his blog, which is a nice distillation of his advice in his RMS course.
I’ve been looking for something similar for Bayesian prediction models. Frank made the marginal statement in his post “A different template would be needed for (the preferred) Bayesian approach.” I thought I would try my hand at modifying his plan to include the uniquely Bayesian aspects of modeling.
Gelman and others have published a few Bayesian modeling workflows recently. They are nice, but they are not specific for clinical prediction modeling, so I fear important things are missing or dangerous things are included. I tried to incorporate relevant parts from those workflows into Frank’s bulleted list. I also excluded things that seem less relevant to the Bayesian paradigm.
Below are some proposed modifications with references. The plain text on the numbered lines are from the original post. At the end, I list some of the prominent questions for me.


multiply imputingAccount for missing predictor values using posterior stacking to make good use of partial information on a subject
rmsb
package notes RMS course notes
 Both
rmsb
andbrms
R packages support MICE  Handle Missing Values with brms • brms  Try to use full Bayesian models so as to not require imputation


 Choosing an appropriate statistical model based on the nature of the response variable
 Choosing an appropriate statistical model based on the nature of the response variable

 Specify prior distributions for parameters using scientific knowledge and conduct prior predictive simulations.
 Priors based on scientific knowledge:
 Statistical Rethinking Chapter 4
 Statistical Rethinking 2023  Lecture 03  YouTube
 BBR notes Section 6.10.3  Biostatistics for Biomedical Research  6 Comparing Two Proportions
 “How to” for rmsb  bbr rmsb models
 Prior predictive simulation:
 Statistical Rethinking Chapter 3 and Lecture 4 as above
 Visualization in Bayesian workflow section 3: https://arxiv.org/pdf/1709.01449.pdf
 Bayesian Workflow Section 2.4  https://arxiv.org/pdf/2011.01808.pdf

rmsb
doesn’t seem to have an easy way to do this right now

 Assess model performance by testing it on simulated data
 Bayesian Workflow Section 4.1  https://arxiv.org/pdf/2011.01808.pdf
 Statistical Rethinking’s “Owldrawing workflow” steps 14  Statistical Rethinking 2023  Lecture 03  YouTube
 Fancier simulationbased calibration (? not as familiar and unclear how useful for prediction)  Bayesian Workflow Section 4.2  https://arxiv.org/pdf/2011.01808.pdf

 Deciding on the allowable complexity of the model based on the effective sample size available
 Deciding on the allowable complexity of the model based on the effective sample size available

 Allowing for nonlinear predictor effects using regression splines
 Allowing for nonlinear predictor effects using regression splines

 Incorporating prespecified interactions
 Note that it is better not to think of interactions as in or out of the model but rather to put priors on the interaction effects and have them in the model.

 Evaluate model diagnostics and address computational problems
 Visualization in Bayesian Workflow Section 4  https://arxiv.org/pdf/1709.01449.pdf
 Bayesian Workflow Section 5  https://arxiv.org/pdf/2011.01808.pdf

MZ: For this point, I was thinking Rhats, ESS, trace plots, divergent transition issues, etc, to make sure that sampling went according to plan and the estimates are reliable. I moved the comments about decision curve analysis to point 12 because that addresses performance assessment and using the model to make decisions.

 Checking distributional assumptions (Bayesian additions)
 In addition to residual analysis, etc:
 Posterior predictive checking of distributional parameters orthogonal to the estimates (e.g. skewness when estimating mean in a Gaussian model)  Visualization in Bayesian Workflow Section 5  https://arxiv.org/pdf/1709.01449.pdf
 Posterior predictive checking of observed data  Bayesian Workflow Section 6.1  https://arxiv.org/pdf/2011.01808.pdf
 Check khats using PSISLOO  Visualization in Bayesian Workflow Section 6  https://arxiv.org/pdf/1709.01449.pdf, Bayesian Workflow section 6.2  https://arxiv.org/pdf/2011.01808.pdf
 Example in
rmsb
package: bbr rmsb models  Instead of checking distributional assumptions and possibly getting posterior distributions that are too narrow from false confidence in the “chosen” model, allow parameters that generalize distribution assumptions, with suitable priors that make more assumptions for small N. For example, residuals can be modeled with a t distribution with a prior on the degrees of freedom that favors normality but allows for arbitrarily heavy tails as N \uparrow.

 Adjusting the posterior distribution for imputation
 If multiple imputation was done (as opposed to full Bayesian modeling), posterior stacking makes the posterior distributions appropriately wider to account for uncertainties surrounding imputation.
 Stef van Buuren (2018) Flexible Imputation of Missing Data. Chapman Hall/CRC Press https://stefvanbuuren.name/fimd/stefvanbuuren.name/fimd/
 A Bayesian Perspective on Missing Data Imputation  Yi's Knowledge Base
 Missing Data, Data Imputation  missingdata

Donald B. Rubin (1996) Multiple Imputation after 18+ Years, Journal of the American Statistical Association, 91:434, 473489, DOI: 10.1080/01621459.1996.10476908
https://www.tandfonline.com/doi/abs/10.1080/01621459.1996.10476908  Question for @f2harrell: where would imputation best fit in a Bayesian workflow? Would it be fair to call the method described in Yi Zhou’s post a form of model averaging?

 Graphically interpreting the model using partial effect plots and nomograms
 These displays were designed for using point estimates for predictions, and new ideas are needed for how to think of these instead in terms of posterior distributions of predictions.
*

 Quantifying the clinical utility (discrimination ability) of the model
 Andrew Vicker’s papers on Decision Curve Analysis:
 Decision curve analysis: a novel method for evaluating prediction models  PMC

Decision Curve Analysis

 Internally validating the model using PSISLOO (???
assess calibration and discrimination of the model using the bootstrap to estimate the model’s likely performance on a new sample of patients from the same patient stream???)
 PSISLOO seems to be the preferred crossvalidation method but I see it primarily used for model comparisons and to assist with checking distributional assumptions as above.
 see Aki Vehtari’s case studies and FAQ about Cross validation  Crossvalidation FAQ
 Bayesian Workflow Section 5  https://arxiv.org/pdf/1507.04544.pdf
 Such uses of LOO do not clearly yield discrimination and calibration metrics nor does it clearly assess overoptimism as in the bootstrap. Is there a role for bootstrap? Other sorts of metrics for overoptimism or performance?
 Internally validating the model using PSISLOO (???

 Possibly do external validation (?)
 Another area of ignorance. Are there uniquely Bayesian concerns here?

 Prospective prediction
 Taking discrete event risk prediction as an example, try to avoid using point estimates in making predictions, e.g., using posterior mean/median/mode regression coefficients to get point estimates of risk
 Instead, save the posterior parameter draws and make a prediction from each draw, show the posterior distribution of risk, and possibly summarize it with a posterior mean
My open questions:

What’s missing and what needs to be taken away or modified?

Is there a good way to do prior predictive checking with
rmsb
? Seems like you can’t sample from the prior distribution in the model like you can withbrms
. 
How important is simulation and/or simulationbased calibration in point 4?

What is the best way to justify the sample size for Bayesian models in point 5? Does the rule of thumb p = m/15 still apply?

For point 9, how does one do posterior predictive checking safely? That seems like a risk for researcherinduced overfitting.
 Suggested approach in Point 9  Instead of checking distributional assumptions and possibly getting posterior distributions that are too narrow from false confidence in the “chosen” model, allow parameters that generalize distribution assumptions, with suitable priors that make more assumptions for small N. For example, residuals can be modeled with a t distribution with a prior on the degrees of freedom that favors normality but allows for arbitrarily heavy tails as N \uparrow.

Questions in points 13, and 14 above.