Bayesian Biostatistical Modeling Plan

I’m a clinician, researcher, and statistics enthusiast. I try to be principled in my analyses, but since I don’t have a deep mathematical background, workflows help me to maintain good practice throughout a project.

I find @f2harrell’s body of work extremely helpful because of how principled it is. Frank recently published a Biostatistical Modeling Plan for frequentist prediction models on his blog, which is a nice distillation of his advice in his RMS course.

I’ve been looking for something similar for Bayesian prediction models. Frank made the marginal statement in his post “A different template would be needed for (the preferred) Bayesian approach.” I thought I would try my hand at modifying his plan to include the uniquely Bayesian aspects of modeling.

Gelman and others have published a few Bayesian modeling workflows recently. They are nice, but they are not specific to clinical prediction modeling, so I fear that important things are missing or dangerous things are included. I tried to incorporate relevant parts from those workflows into Frank’s bulleted list. I also excluded things that seem less relevant to the Bayesian paradigm.

Below are some proposed modifications with references. The plain text on the numbered lines is from the original post. At the end, I list some of the prominent questions for me.

    1. Account for missing predictor values using multiple imputation with posterior stacking to make good use of partial information on a subject
    2. Choosing an appropriate statistical model based on the nature of the response variable

    3. Specify prior distributions for parameters using scientific knowledge and conduct prior predictive simulations.
    4. Assess model performance by testing it on simulated data
    5. Deciding on the allowable complexity of the model based on the effective sample size available

    6. Allowing for nonlinear predictor effects using regression splines

    7. Incorporating pre-specified interactions
    • Note that it is better not to think of interactions as in or out of the model but rather to put priors on the interaction effects and have them in the model.

    8. Evaluate model diagnostics and address computational problems
    • Visualization in Bayesian Workflow, Section 4
    • Bayesian Workflow, Section 5
    • MZ: For this point, I was thinking of R-hat values, ESS, trace plots, divergent transitions, etc., to make sure that sampling went according to plan and the estimates are reliable. I moved the comments about decision curve analysis to point 12 because that addresses performance assessment and using the model to make decisions.

    9. Checking distributional assumptions (Bayesian additions)
    • In addition to residual analysis, etc:
      • Posterior predictive checking of distributional parameters orthogonal to the estimates (e.g., skewness when estimating the mean in a Gaussian model): Visualization in Bayesian Workflow, Section 5
      • Posterior predictive checking of observed data: Bayesian Workflow, Section 6.1
      • Check Pareto k-hat values using PSIS-LOO: Visualization in Bayesian Workflow, Section 6; Bayesian Workflow, Section 6.2
      • Example in rmsb package: bbr rmsb models
      • Instead of checking distributional assumptions and possibly getting posterior distributions that are too narrow from false confidence in the “chosen” model, allow parameters that generalize distribution assumptions, with suitable priors that make more assumptions for small N. For example, residuals can be modeled with a t distribution with a prior on the degrees of freedom that favors normality but allows for arbitrarily heavy tails as N increases.

    10. Adjusting the posterior distribution for imputation
    11. Graphically interpreting the model using partial effect plots and nomograms
    • These displays were designed for using point estimates for predictions, and new ideas are needed for how to think of these instead in terms of posterior distributions of predictions.

    12. Quantifying the clinical utility (discrimination ability) of the model
    13. Internally validating the model using PSIS-LOO (??? assess calibration and discrimination of the model using the bootstrap to estimate the model’s likely performance on a new sample of patients from the same patient stream ???)
    • PSIS-LOO seems to be the preferred cross-validation method, but I see it primarily used for model comparisons and to assist with checking distributional assumptions as above.
    • Such uses of LOO do not clearly yield discrimination and calibration metrics, nor do they clearly assess over-optimism as the bootstrap does. Is there a role for the bootstrap? Are there other metrics for over-optimism or performance?

    14. Possibly do external validation (?)
    • Another area of ignorance. Are there uniquely Bayesian concerns here?

    15. Prospective prediction
    • Taking discrete event risk prediction as an example, try to avoid using point estimates in making predictions, e.g., using posterior mean/median/mode regression coefficients to get point estimates of risk.
    • Instead, save the posterior parameter draws and make a prediction from each draw, show the posterior distribution of risk, and possibly summarize it with a posterior mean.
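
The contrast in the prospective prediction point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "posterior draws" are simulated from made-up normal distributions (in practice they would be the saved draws from the fitted model), the model is a one-predictor logistic regression, and the names `expit`, `draws`, and `x_new` are hypothetical.

```python
import math
import random
import statistics


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


random.seed(1)

# ASSUMPTION: stand-in posterior draws of (intercept, slope).
# In a real analysis these come from the fitted model, not from random.gauss.
draws = [(random.gauss(-1.5, 0.4), random.gauss(0.8, 0.3)) for _ in range(4000)]

x_new = 2.0  # hypothetical predictor value for a new patient

# One risk prediction per posterior draw: the posterior distribution of risk
risk_draws = sorted(expit(a + b * x_new) for a, b in draws)

post_mean_risk = statistics.fmean(risk_draws)
lower = risk_draws[int(0.025 * len(risk_draws))]
upper = risk_draws[int(0.975 * len(risk_draws)) - 1]

# The approach to avoid: plug posterior-mean coefficients into the model
mean_a = statistics.fmean([a for a, _ in draws])
mean_b = statistics.fmean([b for _, b in draws])
plug_in_risk = expit(mean_a + mean_b * x_new)

print(f"posterior mean risk: {post_mean_risk:.3f}")
print(f"approx. 95% interval: ({lower:.3f}, {upper:.3f})")
print(f"plug-in risk at mean coefficients: {plug_in_risk:.3f}")
```

Because the inverse logit is nonlinear, the plug-in risk generally differs from the posterior mean risk, and it carries no interval at all, which is the main reason to keep the draws.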

My open questions:

  1. What’s missing and what needs to be taken away or modified?

  2. Is there a good way to do prior predictive checking with rmsb? Seems like you can’t sample from the prior distribution in the model like you can with brms.

  3. How important is simulation and/or simulation-based calibration in point 4?

  4. What is the best way to justify the sample size for Bayesian models in point 5? Does the rule of thumb p = m/15 still apply?

  5. For point 9, how does one do posterior predictive checking safely? That seems like a risk for researcher-induced overfitting.

    • Suggested approach in point 9: Instead of checking distributional assumptions and possibly getting posterior distributions that are too narrow from false confidence in the “chosen” model, allow parameters that generalize distribution assumptions, with suitable priors that make more assumptions for small N. For example, residuals can be modeled with a t distribution with a prior on the degrees of freedom that favors normality but allows for arbitrarily heavy tails as N increases.
  6. Questions in points 13 and 14 above.
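
On question 2 (and point 3), prior predictive simulation does not strictly need the model-fitting software: one can draw parameters from the priors and push them through the model by hand. Below is a rough Python sketch for a one-predictor logistic model. It assumes normal priors on the intercept and the log odds ratio and a standardized predictor; the function names and prior scales are hypothetical and are not rmsb or brms functionality.

```python
import math
import random


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prior_predictive_risks(prior_sd, n_draws=5000, seed=2):
    """Simulate risks implied by Normal(0, prior_sd) priors on the logit scale."""
    rng = random.Random(seed)
    risks = []
    for _ in range(n_draws):
        intercept = rng.gauss(0.0, prior_sd)  # ASSUMPTION: prior on the intercept
        log_or = rng.gauss(0.0, prior_sd)     # ASSUMPTION: prior on the log odds ratio
        x = rng.gauss(0.0, 1.0)               # standardized predictor, simulated patient
        risks.append(expit(intercept + log_or * x))
    return risks


def extreme_fraction(risks, eps=0.01):
    """Share of simulated risks below eps or above 1 - eps."""
    return sum(r < eps or r > 1.0 - eps for r in risks) / len(risks)


vague = extreme_fraction(prior_predictive_risks(prior_sd=10.0))
weak = extreme_fraction(prior_predictive_risks(prior_sd=1.0))

print(f"share of risks outside (0.01, 0.99), prior SD 10: {vague:.2f}")
print(f"share of risks outside (0.01, 0.99), prior SD 1:  {weak:.2f}")
```

In this toy setup the "vague" SD 10 priors put most of the prior predictive mass on risks near 0 or 1, which is rarely plausible clinically, while the SD 1 priors keep nearly all simulated risks away from those extremes.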


I think a willingness to grapple with mathematical tools will pay large dividends as it will help in reading the foundational papers in Bayesian Decision Theory as well as information theory.

Quoting from Bayesian Theory (Bernardo and Smith, 2004, p.67):

“We have shown that the simple decision problem structure introduced … suffice for the analysis of more complex, sequential problems which appear, at first sight, to go beyond that simple structure. In particular, we have seen the important problem of experimental design can be analysed in the sequential decision problem framework. [my emphasis] We shall now use this framework to analyse the very special problem of statistical inference, [italics in original] thus establishing the fundamental relevance of these foundational arguments to statistical theory and practice.”

I don’t think frequentist methods can truly be understood without understanding Bayesian Decision Theory. I deeply appreciate this perspective from Herman Chernoff in his comment on Bradley Efron’s 1986 paper “Why Isn’t Everyone a Bayesian?”:

“With the help of theory, I have developed insights and intuitions that prevent me from giving weight to data dredging and other forms of statistical heresy. This feeling of freedom and ease does not exist until I have a decision theoretic, Bayesian view of the problem … I am a Bayesian decision theorist in spite of my use of Fisherian tools.”

When you conceptualize your experiment or study with the goal of maximizing information (or designing a communication channel with the highest signal/noise ratio), things become clearer. Much of the advice in RMS can be understood from this point of view.

RE: Clinical utility of prediction models: search datamethods for “decision curves” to find the most rigorous approach to evaluating predictive models.

RE: Missing Data and Imputation. Stef van Buuren has an online text on this topic. I think I found it in Frank’s notes or bibliographies somewhere.

van Buuren, S. (2018). Flexible Imputation of Missing Data. Chapman & Hall/CRC Press.

When thinking about workflows, Jeroen Janssens has published a freely available text, Data Science at the Command Line, on how to use traditional Unix/Linux command line tools as well as R. It also discusses CLI machine learning tools.

The framework discussed has the acronym OSEMN (pronounced “awesome”):

  1. Obtain: (Study design and prospective data collection go here.)
  2. Scrub: (Much of the work of “data wrangling”, i.e., getting various data sources into a usable format.)
  3. Explore: (Looking at distributions, missing data, etc. Imputation could be done here.)
  4. Model: (Computing likelihoods, posteriors, robustness checks, decision curves, etc.)
  5. iNterpret: (Draw conclusions and recommendations for practice and future research.)

Building upon Shannon/Weaver theory of communication, I’d place any SAP (statistical analysis plan) on the encoding and decoding ends of the channel. At that point, we cannot increase any received information, but it is easy to lose it.


@Mzobeck This is an absolutely wonderful start to a workflow/template for Bayesian clinical prediction model development and validation. I can’t thank you enough for making such a strong start to this process. I have turned your post into a wiki so that others can edit it directly and we can all work together to make it more complete.

@R_cubed your comments are excellent. For the concrete pieces that are especially appropriate for the workflow I would appreciate it if you and @Mzobeck could edit them into the workflow in the first post.

I just added a small section 12 on getting predictions, which points out that point estimates are not necessarily the way to go. I also slightly edited the sections on imputation, interactions, adjusting the covariance matrix for imputation, and partial effect plots/nomograms. I added a piece about adding parameters instead of assessing goodness of fit.

This is so good to see!


Great list.

I think I’ve never seen a Bayesian medical article discussing prior predictive checks…

Great list!

I have two potential points to add to the discussion:

1) Bayesian model recalibration (model updating): in risk prediction, a middle ground between simple recalibration and complete model revision would be to use two-component mixture priors for model updating, much like what has been suggested for modeling pediatric trials. One component of the mixture prior is the posterior of the developed model itself and the other is a potentially vague or skeptical prior. The mixture proportion \rho defines how much “forgetting” you will allow. If \rho = 1, the current posterior is completely ignored and you have complete model revision. If \rho = 0, the current posterior is used in full as the prior, so the model changes only in case of strong disagreement between your current posterior and the validation data. In the pediatric trial analogy, the mixture proportion represents how much dependence you allow on adult data, as in the slide from this case study.
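
To make the role of the mixture proportion concrete, here is a toy sketch in Python. It is a deliberate simplification and not the method from the pediatric trial work: instead of regression coefficients it updates a single event probability, with a Beta distribution standing in for the development posterior and a uniform Beta(1, 1) as the vague component. All numbers and function names are hypothetical. With conjugate components, the posterior is again a mixture, and each component's weight is rescaled by its marginal likelihood for the validation data.

```python
import math


def log_beta(a, b):
    """Log of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def update_mixture(rho, informative, vague, events, n):
    """Update a two-component Beta mixture prior with binomial data.

    rho is the prior weight on the vague component, as in the post.
    Returns (posterior weight on the vague component, posterior mean).
    """
    components = [(1.0 - rho, informative), (rho, vague)]
    log_weights, post_means = [], []
    for weight, (a, b) in components:
        a_post, b_post = a + events, b + n - events
        # Log marginal likelihood, dropping the binomial coefficient
        # because it is shared by both components and cancels.
        log_ml = log_beta(a_post, b_post) - log_beta(a, b)
        log_prior = math.log(weight) if weight > 0 else -math.inf
        log_weights.append(log_prior + log_ml)
        post_means.append(a_post / (a_post + b_post))
    top = max(log_weights)
    unnorm = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]
    mean = sum(w * m for w, m in zip(weights, post_means))
    return weights[1], mean


# ASSUMPTION: development posterior approximated by Beta(20, 80), mean 0.20
informative, vague, rho = (20.0, 80.0), (1.0, 1.0), 0.2

w_agree, mean_agree = update_mixture(rho, informative, vague, events=20, n=100)
w_conflict, mean_conflict = update_mixture(rho, informative, vague, events=60, n=100)

print(f"validation agrees:    vague weight {w_agree:.3f}, mean {mean_agree:.3f}")
print(f"validation conflicts: vague weight {w_conflict:.3f}, mean {mean_conflict:.3f}")
```

When the validation data agree with the development posterior, the weight on the vague component drops below \rho and the old posterior is mostly retained. When they conflict, almost all of the weight moves to the vague component, which is the "forgetting" behavior described above.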

2) Bayesian Decision Curve Analysis: I am currently working on bayesDCA, an R package to do Bayesian DCA. It simplifies the model from Wynants et al. (2018) for the single-setting case (i.e., not meta-analysis), uses conjugate priors for speed, and hopefully provides an easy-to-use interface. It allows you to calculate things like (i) arbitrary functions of net benefit from multiple decision strategies; (ii) the probability that a model is useful or the best; (iii) the expected value of perfect information (i.e., the consequence of uncertainty in net benefit units). It is a work in progress, so I’d love to hear your thoughts about the general idea (manuscript in preparation). I am aware of the friction between uncertainty quantification and expected utility maximization and am operating under the assumption that error bars don’t hurt but point estimates might (at least as long as NHST is not involved).
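
For readers new to decision curve analysis, the quantity in question is net benefit at a risk threshold t, which can be written as sensitivity × prevalence − (1 − specificity) × (1 − prevalence) × t / (1 − t). The sketch below is not bayesDCA's code or interface. It is a minimal Python illustration of the general idea, assuming independent Beta(1, 1) priors on prevalence, sensitivity, and specificity and made-up validation counts; every name in it is hypothetical.

```python
import random
import statistics

random.seed(3)

# ASSUMPTION: made-up validation counts when classifying at threshold t = 0.2
n_total, n_disease = 500, 100
true_pos, true_neg = 80, 300
threshold = 0.2
odds = threshold / (1.0 - threshold)


def beta_posterior_draw(successes, failures):
    """One draw from the Beta posterior under a Beta(1, 1) prior."""
    return random.betavariate(1 + successes, 1 + failures)


nb_model, nb_treat_all = [], []
for _ in range(4000):
    prev = beta_posterior_draw(n_disease, n_total - n_disease)
    sens = beta_posterior_draw(true_pos, n_disease - true_pos)
    spec = beta_posterior_draw(true_neg, (n_total - n_disease) - true_neg)
    # Net benefit of the model at this threshold
    nb_model.append(sens * prev - (1.0 - spec) * (1.0 - prev) * odds)
    # Treat-all strategy: sensitivity 1, specificity 0
    nb_treat_all.append(prev - (1.0 - prev) * odds)

# Treat-none has net benefit 0, so "useful" means beating both defaults
useful = [m > max(a, 0.0) for m, a in zip(nb_model, nb_treat_all)]
prob_useful = sum(useful) / len(useful)
mean_nb = statistics.fmean(nb_model)

print(f"posterior mean net benefit of the model: {mean_nb:.3f}")
print(f"P(model beats treat-all and treat-none): {prob_useful:.3f}")
```

Repeating this over a grid of thresholds would give a decision curve with uncertainty bands, and the same draws can be used to compare several decision strategies.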



Both of these points are very interesting, @giuliano-cruz. I’ve been trying to integrate DCA into my own work. I look forward to seeing the package develop!


Thank you @Mzobeck! I will post on Twitter once the bayesDCA preprint/paper is published. To be fair, the model updating approach is similar to dynamic Bayesian updating (e.g., here), but the pediatric trial approach with mixture priors seems far more intuitive to me.
