Bayesian Biostatistical Modeling Plan

Mzobeck · January 27, 2023, 6:06pm

I’m a clinician, researcher, and statistics enthusiast. I try to be principled in my analyses, but since I don’t have a deep mathematical background, workflows help me to maintain good practice throughout a project.

I find @f2harrell’s body of work extremely helpful because of how principled it is. Frank recently published a Biostatistical Modeling Plan for frequentist prediction models on his blog, which is a nice distillation of his advice in his RMS course.

I’ve been looking for something similar for Bayesian prediction models. Frank made this marginal statement in his post: “A different template would be needed for (the preferred) Bayesian approach.” I thought I would try my hand at modifying his plan to include the uniquely Bayesian aspects of modeling.

Gelman and others have published a few Bayesian modeling workflows recently. They are nice, but they are not specific to clinical prediction modeling, so I fear important things are missing or dangerous things are included. I tried to incorporate relevant parts from those workflows into Frank’s bulleted list. I also excluded things that seem less relevant to the Bayesian paradigm.

Below are some proposed modifications with references. The plain text on the numbered lines is from the original post. At the end, I list my most prominent open questions.

1. ~~multiply imputing~~ Account for missing predictor values using posterior stacking to make good use of partial information on a subject
- rmsb package notes
- RMS course notes
- Both rmsb and brms R packages support MICE - Handle Missing Values with brms • brms
- Try to use full Bayesian models so as to not require imputation
1. Choosing an appropriate statistical model based on the nature of the response variable
1. Specify prior distributions for parameters using scientific knowledge and conduct prior predictive simulations.
- Priors based on scientific knowledge:
  - Statistical Rethinking Chapter 4
  - Statistical Rethinking 2023 - Lecture 03 - YouTube
  - BBR notes Section 6.10.3 - Biostatistics for Biomedical Research - 6 Comparing Two Proportions
  - “How to” for rmsb - bbr rmsb models
- Prior predictive simulation:
  - Statistical Rethinking Chapter 3 and Lecture 4 as above
  - Visualization in Bayesian workflow section 3: https://arxiv.org/pdf/1709.01449.pdf
  - Bayesian Workflow Section 2.4 - https://arxiv.org/pdf/2011.01808.pdf
  - rmsb doesn’t seem to have an easy way to do this right now
1. Assess model performance by testing it on simulated data
- Bayesian Workflow Section 4.1 - https://arxiv.org/pdf/2011.01808.pdf
- Statistical Rethinking’s “Owl-drawing workflow” steps 1-4 - Statistical Rethinking 2023 - Lecture 03 - YouTube
- Fancier simulation-based calibration (? not as familiar and unclear how useful for prediction) - Bayesian Workflow Section 4.2 - https://arxiv.org/pdf/2011.01808.pdf
1. Deciding on the allowable complexity of the model based on the effective sample size available
1. Allowing for nonlinear predictor effects using regression splines
1. Incorporating pre-specified interactions
- Note that it is better not to think of interactions as in or out of the model but rather to put priors on the interaction effects and have them in the model.
1. Evaluate model diagnostics and address computational problems
- Visualization in Bayesian Workflow Section 4 - https://arxiv.org/pdf/1709.01449.pdf
- Bayesian Workflow Section 5 - https://arxiv.org/pdf/2011.01808.pdf
- MZ: For this point, I was thinking Rhats, ESS, trace plots, divergent transition issues, etc, to make sure that sampling went according to plan and the estimates are reliable. I moved the comments about decision curve analysis to point 12 because that addresses performance assessment and using the model to make decisions.
1. Checking distributional assumptions (Bayesian additions)
- In addition to residual analysis, etc:
  - Posterior predictive checking of distributional parameters orthogonal to the estimates (e.g. skewness when estimating mean in a Gaussian model) - Visualization in Bayesian Workflow Section 5 - https://arxiv.org/pdf/1709.01449.pdf
  - Posterior predictive checking of observed data - Bayesian Workflow Section 6.1 - https://arxiv.org/pdf/2011.01808.pdf
  - Check k-hats using PSIS-LOO - Visualization in Bayesian Workflow Section 6 - https://arxiv.org/pdf/1709.01449.pdf, Bayesian Workflow section 6.2 - https://arxiv.org/pdf/2011.01808.pdf
  - Example in rmsb package: bbr rmsb models
  - Instead of checking distributional assumptions and possibly getting posterior distributions that are too narrow from false confidence in the “chosen” model, allow parameters that generalize distribution assumptions, with suitable priors that make more assumptions for small $N$. For example, residuals can be modeled with a t distribution with a prior on the degrees of freedom that favors normality but allows for arbitrarily heavy tails as $N \uparrow$.
1. Adjusting the posterior distribution for imputation
- If multiple imputation was done (as opposed to full Bayesian modeling), posterior stacking makes the posterior distributions appropriately wider to account for uncertainties surrounding imputation.
- Stef van Buuren (2018) Flexible Imputation of Missing Data. Chapman Hall/CRC Press https://stefvanbuuren.name/fimd/
- A Bayesian Perspective on Missing Data Imputation - Yi's Knowledge Base
- Missing Data, Data Imputation | missing-data
- Donald B. Rubin (1996) Multiple Imputation after 18+ Years, Journal of the American Statistical Association, 91:434, 473-489, DOI: 10.1080/01621459.1996.10476908
  https://www.tandfonline.com/doi/abs/10.1080/01621459.1996.10476908
- Question for @f2harrell: where would imputation best fit in a Bayesian workflow? Would it be fair to call the method described in Yi Zhou’s post a form of model averaging?
1. Graphically interpreting the model using partial effect plots and nomograms
- These displays were designed for using point estimates for predictions, and new ideas are needed for how to think of these instead in terms of posterior distributions of predictions.
1. Quantifying the clinical utility (discrimination ability) of the model
- Andrew Vickers’s papers on Decision Curve Analysis:
- Decision curve analysis: a novel method for evaluating prediction models - PMC
- Decision Curve Analysis
1. Internally validating the model using PSIS-LOO (??? ~~assess calibration and discrimination of the model using the bootstrap to estimate the model’s likely performance on a new sample of patients from the same patient stream~~ ???)
- PSIS-LOO seems to be the preferred cross-validation method but I see it primarily used for model comparisons and to assist with checking distributional assumptions as above.
  - see Aki Vehtari’s case studies and FAQ about Cross validation - Cross-validation FAQ
  - Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC (Vehtari, Gelman, and Gabry) - https://arxiv.org/pdf/1507.04544.pdf
- Such uses of LOO do not clearly yield discrimination and calibration metrics, nor do they clearly assess over-optimism as the bootstrap does. Is there a role for the bootstrap? Are there other metrics for over-optimism or performance?
1. Possibly do external validation (?)
- Another area of ignorance. Are there uniquely Bayesian concerns here?
1. Prospective prediction
- Taking discrete event risk prediction as an example, try to avoid using point estimates in making predictions, e.g., using posterior mean/median/mode regression coefficients to get point estimates of risk
- Instead, save the posterior parameter draws and make a prediction from each draw, show the posterior distribution of risk, and possibly summarize it with a posterior mean
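
A minimal sketch of the last point, written in Python only so that it is self-contained. The posterior draws below are simulated stand-ins (in practice they would be the saved draws from the fitted model), and the two-coefficient logistic model, the coefficient values, and the names `risk_draws` and `summarize` are assumptions made for illustration.

```python
import math
import random
import statistics


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def risk_draws(coef_draws, x):
    """One predicted risk per posterior draw of (intercept, slope)."""
    return [expit(b0 + b1 * x) for b0, b1 in coef_draws]


def summarize(risks, level=0.95):
    """Posterior mean and an equal-tailed interval for the predicted risk."""
    ordered = sorted(risks)
    last = len(ordered) - 1
    lower = ordered[int((1.0 - level) / 2.0 * last)]
    upper = ordered[int((1.0 + level) / 2.0 * last)]
    return {"mean": statistics.fmean(ordered), "lower": lower, "upper": upper}


# Hypothetical stand-in for saved MCMC draws of (intercept, slope).
rng = random.Random(2023)
coef_draws = [(rng.gauss(-2.0, 0.3), rng.gauss(0.8, 0.2)) for _ in range(4000)]

# Predict for a new patient with (hypothetical) predictor value 1.5.
x_new = 1.5
risks = risk_draws(coef_draws, x_new)
summary = summarize(risks)

# For contrast: the plug-in risk from posterior mean coefficients,
# which discards the uncertainty carried by the draws.
mean_b0 = statistics.fmean([b0 for b0, _ in coef_draws])
mean_b1 = statistics.fmean([b1 for _, b1 in coef_draws])
plug_in_risk = expit(mean_b0 + mean_b1 * x_new)

print(summary)
print(plug_in_risk)
```

The full list `risks` is the posterior distribution of risk for this patient; the mean and interval are only summaries of it.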

My open questions:

What’s missing and what needs to be taken away or modified?
Is there a good way to do prior predictive checking with rmsb? Seems like you can’t sample from the prior distribution in the model like you can with brms.
How important is simulation and/or simulation-based calibration in point 4?
What is the best way to justify the sample size for Bayesian models in point 5? Does the rule of thumb p = m/15 still apply?
For point 9, how does one do posterior predictive checking safely? That seems like a risk for researcher-induced overfitting.
- Suggested approach in Point 9 - Instead of checking distributional assumptions and possibly getting posterior distributions that are too narrow from false confidence in the “chosen” model, allow parameters that generalize distribution assumptions, with suitable priors that make more assumptions for small $N$. For example, residuals can be modeled with a t distribution with a prior on the degrees of freedom that favors normality but allows for arbitrarily heavy tails as $N \uparrow$.
Questions in points 13 and 14 above.
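
On the prior predictive question: independent of rmsb or brms, the idea can be sketched by hand. The example below is an assumption-laden illustration, not a recipe for either package: it uses an intercept-only logistic model, two hypothetical Normal priors on the intercept (SD 10 and SD 1.5), and hypothetical helper names. It draws from each prior, converts to the implied event risk, and simulates event counts for a hypothetical cohort.

```python
import math
import random


def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def prior_predictive_risks(prior_sd, n_draws=2000, seed=1):
    """Draw intercepts from Normal(0, prior_sd); return the implied risks."""
    rng = random.Random(seed)
    return [expit(rng.gauss(0.0, prior_sd)) for _ in range(n_draws)]


def fraction_extreme(risks, eps=0.01):
    """Share of prior draws implying a risk below eps or above 1 - eps."""
    return sum(r < eps or r > 1.0 - eps for r in risks) / len(risks)


def simulate_event_counts(risks, n_patients=50, seed=2):
    """Prior predictive event counts: one simulated cohort per prior draw."""
    rng = random.Random(seed)
    return [sum(rng.random() < r for _ in range(n_patients)) for r in risks]


# Hypothetical priors on the log-odds scale: "vague" versus weakly informative.
wide_risks = prior_predictive_risks(prior_sd=10.0)
narrow_risks = prior_predictive_risks(prior_sd=1.5)

wide_extreme = fraction_extreme(wide_risks)
narrow_extreme = fraction_extreme(narrow_risks)
narrow_counts = simulate_event_counts(narrow_risks)

print(wide_extreme, narrow_extreme)
```

Comparing `wide_extreme` with `narrow_extreme` shows how a seemingly vague prior on the log-odds scale can put most of its mass on risks near 0 or 1, which is the kind of thing a prior predictive check is meant to reveal before any data are used.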

R_cubed · January 28, 2023, 3:03pm

I think a willingness to grapple with mathematical tools will pay large dividends as it will help in reading the foundational papers in Bayesian Decision Theory as well as information theory.

Quoting from Bayesian Theory (Bernardo and Smith, 2004, p.67)

> We have shown that the simple decision problem structure introduced … suffice for the analysis of more complex, sequential problems which appear, at first sight, to go beyond that simple structure. In particular, we have seen the important problem of experimental design can be analysed in the sequential decision problem framework. [my emphasis] We shall now use this framework to analyse the very special problem of statistical inference, [italics in original] thus establishing the fundamental relevance of these foundational arguments to statistical theory and practice.

I don’t think frequentist methods can truly be understood without understanding Bayesian Decision Theory. I deeply appreciate this perspective by Herman Chernoff in his comment on Bradley Efron’s 1986 paper “Why isn’t everyone a Bayesian?”

> With the help of theory, I have developed insights and intuitions that prevent me from giving weight to data dredging and other forms of statistical heresy. This feeling of freedom and ease does not exist until I have a decision theoretic, Bayesian view of the problem … I am a Bayesian decision theorist in spite of my use of Fisherian tools.

When you conceptualize your experiment or study with the goal of maximizing information (or designing a communication channel with the highest signal/noise ratio), things become clearer. Much of the advice in RMS can be understood from this point of view.

RE: Clinical utility of prediction models: search data methods for “decision curves” for the most rigorous evaluation of predictive models.

RE: Missing Data and Imputation. Stef van Buuren has an online text on this topic. I think I found it in Frank’s notes or bibliographies somewhere.

Flexible Imputation of Missing Data (2018). Chapman Hall/CRC Press https://stefvanbuuren.name/fimd/

When thinking about workflows, Jeroen Janssens has published a freely available text on how to use traditional Unix/Linux command line tools as well as R. There is also a discussion of CLI machine learning tools.

The framework discussed has the acronym OSEMN (pronounced “awesome”):

Obtain: (Study design and prospective data collection goes here).
Scrub: (much of the work of “data wrangling”, i.e., getting various data sources into a usable format)
Explore: (looking at distributions, missing data, etc. Imputation could be done here).
Model: (Computing likelihoods, posteriors, robustness checks, decision curves etc.)
iNterpret: Draw conclusions and recommendations for practice and future research.

Building upon Shannon/Weaver theory of communication, I’d place any SAP (statistical analysis plan) on the encoding and decoding ends of the channel. At that point, we cannot increase any received information, but it is easy to lose it.

f2harrell · January 29, 2023, 2:49pm

@Mzobeck This is an absolutely wonderful start to a workflow/template for Bayesian clinical prediction model development and validation. I can’t thank you enough for making such a strong start to this process. I have turned your post into a wiki so that others can edit it directly and we can all work together to make it more complete.

@R_cubed your comments are excellent. For the concrete pieces that are especially appropriate for the workflow I would appreciate it if you and @Mzobeck could edit them into the workflow in the first post.

I just added a small section 12. on getting predictions that just points out that point estimates are not necessarily the way to go. I also slightly edited sections on imputation, interactions, adjusting covariance matrix for imputation, and partial effect plots/nomogram. I added a piece about adding parameters instead of assessing goodness of fit.

This is so good to see!

arthur_albuquerque · February 2, 2023, 1:55pm

Great list.

I think I’ve never seen a Bayesian medical article discussing prior predictive checks…

giuliano-cruz · February 16, 2023, 7:34am

Great list!

I have two potential points to add to the discussion:

1) Bayesian model recalibration (model updating): in risk prediction, a middle ground between simple recalibration and complete model revision would be to use two-component mixture priors for model updating, much like what’s been suggested for modeling pediatric trials. One component of the mixture prior is the posterior of the developed model itself, and the other is a potentially vague or skeptical prior. The mixture proportion $\rho$ defines how much “forgetting” you will allow. If $\rho = 1$, the current posterior is completely ignored and you have complete model revision. If $\rho = 0$, model updating only happens in case of strong disagreement between your current posterior and the validation data. In the pediatric trial analogy, the mixture proportion represents how much dependence you allow on adult data, as illustrated in a slide from this case study.
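
To make the mixture-prior idea concrete, here is a rough sketch for the simplest possible case: a single event probability with Beta components and binomial validation data, where everything is conjugate. The Beta(20, 80) "old posterior", the Beta(1, 1) vague component, the data, and the function names are all hypothetical, and a real prediction model would need the analogous update for its regression coefficients. The weight `rho` sits on the vague component, following the convention above.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def update_mixture_prior(y, n, old=(20.0, 80.0), vague=(1.0, 1.0), rho=0.2):
    """Update a two-component Beta mixture prior with y events in n patients.

    Prior: (1 - rho) * Beta(old) + rho * Beta(vague).
    Returns the posterior weights and Beta parameters, in the order (old, vague).
    """
    components = [(1.0 - rho, old), (rho, vague)]
    log_terms = []
    for weight, (a, b) in components:
        if weight <= 0.0:
            log_terms.append(-math.inf)
        else:
            # Prior weight times the beta-binomial marginal likelihood
            # (the binomial coefficient is common to both and cancels).
            log_terms.append(
                math.log(weight) + log_beta(a + y, b + n - y) - log_beta(a, b)
            )
    top = max(log_terms)
    unnormalized = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    posteriors = [(a + y, b + n - y) for _, (a, b) in components]
    return weights, posteriors


# Validation data that agree with the old posterior (about 20% events) ...
agree_weights, _ = update_mixture_prior(y=20, n=100)
# ... and validation data that strongly disagree with it (60% events).
conflict_weights, conflict_posteriors = update_mixture_prior(y=60, n=100)

print(agree_weights, conflict_weights)
```

In this toy setting the posterior weight on the vague component shrinks when the validation data agree with the old posterior and grows when they conflict, which is how the mixture "forgets" the old model only when the data call for it.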

2) Bayesian Decision Curve Analysis: I am currently working on bayesDCA, an R package to do Bayesian DCA. It simplifies the model from Wynants et al. (2018) for the single-setting case (i.e., not meta-analysis), uses conjugate priors for speed, and hopefully provides an easy-to-use interface. It allows you to calculate things like (i) arbitrary functions of net benefit from multiple decision strategies; (ii) the probability that a model is useful or the best; (iii) the expected value of perfect information (i.e., the consequence of uncertainty in net benefit units). It is a work in progress so I’d love to hear your thoughts about the general idea (manuscript in preparation). I am aware of the friction between uncertainty quantification and expected utility maximization and am operating under the assumption that error bars don’t hurt but point estimates might (at least as long as NHST is not involved).
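
As a rough illustration of the general idea (this is not bayesDCA’s interface, and the package’s actual model may differ), the sketch below assumes independent Beta(1, 1) priors on prevalence, sensitivity, and specificity at a single decision threshold $t$, and uses the standard net benefit expression $NB = Se \cdot p - (1 - Sp)(1 - p)\,\frac{t}{1 - t}$. The counts and function names are hypothetical.

```python
import random


def net_benefit_draws(n, cases, true_pos, true_neg, threshold,
                      n_draws=4000, seed=7):
    """Posterior draws of net benefit for the model and for treat-all."""
    rng = random.Random(seed)
    odds = threshold / (1.0 - threshold)
    draws = []
    for _ in range(n_draws):
        # Conjugate update: Beta(1, 1) priors plus the observed counts.
        prev = rng.betavariate(1 + cases, 1 + n - cases)
        sens = rng.betavariate(1 + true_pos, 1 + cases - true_pos)
        spec = rng.betavariate(1 + true_neg, 1 + (n - cases) - true_neg)
        nb_model = sens * prev - (1.0 - spec) * (1.0 - prev) * odds
        nb_all = prev - (1.0 - prev) * odds
        draws.append((nb_model, nb_all))
    return draws


def prob_useful(draws):
    """Posterior probability that the model beats treat-all and treat-none."""
    wins = sum(nb_model > max(nb_all, 0.0) for nb_model, nb_all in draws)
    return wins / len(draws)


# Hypothetical validation data at a 20% risk threshold:
# 500 patients, 100 events, 85 true positives, 320 true negatives.
draws = net_benefit_draws(n=500, cases=100, true_pos=85, true_neg=320,
                          threshold=0.2)
p_useful = prob_useful(draws)
print(p_useful)
```

Treat-none has net benefit zero by definition, so "useful" here means beating both default strategies in a given posterior draw.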

Thanks!

Mzobeck · February 28, 2023, 5:22pm

Both of these points are very interesting, @giuliano-cruz. I’ve been trying to integrate DCA into my own work. I look forward to seeing the package develop!

giuliano-cruz · February 28, 2023, 8:54pm

Thank you @Mzobeck! I will post on Twitter once the bayesDCA preprint/paper is published. To be fair, the model updating approach is similar to dynamic Bayesian updating (e.g., here), but the pediatric trial approach with mixture priors seems way more intuitive to me.

giuliano-cruz · December 2, 2024, 6:55pm

Update: BayesDCA is now published in Stats in Medicine: https://onlinelibrary.wiley.com/doi/10.1002/sim.10277

Allows binary and time-to-event outcomes.

Lots of discussion around the role of uncertainty quantification in DCA, with some decision theory in the supplement.

R package: https://giulianonetto.github.io/bayesdca/