Bayesian Biostatistical Modeling Plan

I think a willingness to grapple with the mathematical tools pays large dividends, because it makes it possible to read the foundational papers in Bayesian decision theory as well as information theory.

Quoting from Bayesian Theory (Bernardo and Smith, 2004, p. 67):

> We have shown that the simple decision problem structure introduced … suffice for the analysis of more complex, sequential problems which appear, at first sight, to go beyond that simple structure. In particular, we have seen the important problem of experimental design can be analysed in the sequential decision problem framework. [my emphasis] We shall now use this framework to analyse the very special problem of statistical inference, [italics in original] thus establishing the fundamental relevance of these foundational arguments to statistical theory and practice.

I don’t think frequentist methods can truly be understood without understanding Bayesian decision theory. I deeply appreciate this perspective from Herman Chernoff, in his comment on Bradley Efron’s 1986 paper “Why Isn’t Everyone a Bayesian?”:

> With the help of theory, I have developed insights and intuitions that prevent me from giving weight to data dredging and other forms of statistical heresy. This feeling of freedom and ease does not exist until I have a decision theoretic, Bayesian view of the problem … I am a Bayesian decision theorist in spite of my use of Fisherian tools.

When you conceptualize your experiment or study with the goal of maximizing information (or, equivalently, of designing a communication channel with the highest signal-to-noise ratio), things become clearer. Much of the advice in Regression Modeling Strategies (RMS) can be understood from this point of view.

RE: Clinical utility of prediction models: search Datamethods for “decision curves”, which offer the most rigorous evaluation of predictive models.
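For readers who have not met decision curves before, here is a minimal sketch of the net benefit calculation that underlies them. It uses the usual definition, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold at which one would act. The function names and the toy data are my own assumptions for illustration, not taken from any particular package.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of acting on everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, risk) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, risk) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: act on everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and predicted risks, for illustration only.
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
p = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.05, 0.5, 0.35]

for t in (0.1, 0.2, 0.3, 0.5):
    print(t, round(net_benefit(y, p, t), 3), round(net_benefit_treat_all(y, t), 3))
```

A decision curve is just this quantity plotted over a range of thresholds, compared against "treat all" and "treat none" (whose net benefit is zero by definition).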

RE: Missing Data and Imputation. Stef van Buuren has an online text on this topic. I think I found it somewhere in Frank’s notes or bibliographies.

Flexible Imputation of Missing Data (2018). Chapman & Hall/CRC Press. https://stefvanbuuren.name/fimd/

When thinking about workflows, Jeroen Janssens has published a freely available text, Data Science at the Command Line, on how to use traditional Unix/Linux command-line tools as well as R. It also discusses command-line machine learning tools.

The framework discussed has the acronym OSEMN (pronounced “awesome”):

  1. Obtain: (Study design and prospective data collection go here.)
  2. Scrub: (Much of the work of “data wrangling”, i.e., getting various data sources into a usable format.)
  3. Explore: (Looking at distributions, missing data, etc. Imputation could be done here.)
  4. Model: (Computing likelihoods, posteriors, robustness checks, decision curves, etc.)
  5. iNterpret: (Drawing conclusions and recommendations for practice and future research.)
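To make the five steps concrete, here is a toy sketch of the steps above as a chain of small functions. Everything in it is an assumption for illustration: the records are made up, the function names are mine, and the Model step uses a conjugate Beta-Binomial posterior on complete cases only (a real analysis would deal with the missing value properly, for example by imputation, rather than dropping it).

```python
# Obtain: hypothetical raw records; "event" arrives as text, blank = missing.
raw = [
    {"id": 1, "event": "1"},
    {"id": 2, "event": "0"},
    {"id": 3, "event": ""},
    {"id": 4, "event": "1"},
    {"id": 5, "event": "0"},
    {"id": 6, "event": "0"},
]


def scrub(records):
    """Scrub: convert text to integers and mark blanks as None."""
    return [
        {"id": r["id"], "event": int(r["event"]) if r["event"] != "" else None}
        for r in records
    ]


def explore(records):
    """Explore: count observed outcomes, events, and missing values."""
    observed = [r["event"] for r in records if r["event"] is not None]
    return {
        "n_observed": len(observed),
        "n_missing": len(records) - len(observed),
        "events": sum(observed),
    }


def model(summary, a=1.0, b=1.0):
    """Model: Beta(a, b) prior with a binomial likelihood gives a Beta posterior."""
    failures = summary["n_observed"] - summary["events"]
    return a + summary["events"], b + failures


def interpret(posterior):
    """iNterpret: report the posterior mean event probability."""
    a, b = posterior
    return a / (a + b)


summary = explore(scrub(raw))
posterior = model(summary)
print(summary, posterior, round(interpret(posterior), 3))
```

Keeping each step a separate function means the intermediate results can be inspected or written to disk, which is the same habit the command-line pipeline style encourages.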

Building upon the Shannon/Weaver theory of communication, I’d place any SAP (statistical analysis plan) at the encoding and decoding ends of the channel. At the decoding end, the analysis cannot increase the information received, but it can easily lose it.
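One way to see the "cannot increase, easy to lose" point is through what information theory calls the data processing inequality: post-processing a received signal can never raise its mutual information with the source. The sketch below is my own toy illustration, assuming a noiseless four-level signal that an analyst then dichotomizes; the function name and data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits of the empirical joint distribution of (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical source signal: four equally likely levels.
signal = [0, 1, 2, 3] * 25
# Received without noise, so nothing has been lost yet.
received = list(signal)
# Analysis step that dichotomizes the received value at a cutpoint.
dichotomized = [int(y >= 2) for y in received]

mi_received = mutual_information(list(zip(signal, received)))
mi_dichotomized = mutual_information(list(zip(signal, dichotomized)))
print(mi_received, mi_dichotomized)
```

In this toy case the dichotomized version carries half as many bits about the source as the received signal did, and no later analysis step can get them back.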
