Statistical Analysis Plans

f2harrell · October 31, 2025, 4:05pm

The purpose of this topic is to provide resources for writing prospective statistical analysis plans (SAPs) and to provide a place to collect suggested resources from a wider community.

Primary Resources

A template for the authoring of statistical analysis plans by Gary Stevens, Shawn Dolley, Robin Mogg, Jason Connor, 2023.
Guidelines for the content of statistical analysis plans in clinical trials by Carrol Gamble, Ashma Krishan, Deborah Stocken et al., 2017, with supplemental material
Statistical errors to avoid

Specific Templates and Discussions

Biostatistical modeling plan by Frank Harrell, original version 2010
Bayesian biostatistical modeling plan
Interim analysis plan for observational studies
Guidelines for covariate adjustment in RCTs
Power and sample size calculations in pilot studies

Dealing with Complexities

Discussions here, here and here

General Ideas

SAPs are mandatory in RCTs; science would improve by also thinking of them as mandatory for observational studies
SAPs should contain reasons for possible deviation from the signed and dated SAP, e.g., a variable was impossible to collect reliably
Having an SAP is the best way to help investigators avoid the temptation to change the question after seeing disappointing results; the SAP protects the statistician from being the “bad guy”
A change in the SAP for a reason that was not envisioned in the SAP raises the most red flags

robinblythe · November 2, 2025, 2:30am

My colleague Adrian Barnett has proposed scrambling your data ahead of time to reduce the effect of investigator biases: https://rss.onlinelibrary.wiley.com/doi/full/10.1111/1740-9713.01473

James_Stanley · November 2, 2025, 10:10pm

Thanks for the links to the SAPs, Frank, and for this article, Robin.

I’ve done this allocation scrambling for a few RCTs now, completing the analysis coding before getting the allocation key and using the scrambled data to discuss data-related issues as they arise, though not quite up to the point of presenting initial results to the wider team. I hadn’t seen Adrian’s piece in Significance before; it’s great to see a semi-formal framing of the approach.

I’d agree that in the era of Markdown-based reporting it’s not too complex to deploy this mock-allocation approach. I’ve had some convergence-type issues that appeared or disappeared once the models were run on the true allocations, but nothing major. At other times it’s been handy to learn early that the planned analysis is not feasible with the available data (typically because the model is too complex for the data).
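
For anyone who wants to try it, here is a minimal sketch of the mock-allocation idea in Python, using only the standard library. Everything in it is an assumption made for illustration: the `trial` records, the column name `arm`, the helper names `scramble_allocation` and `mean_difference`, and the fixed seed are all hypothetical, and the difference in means is only a stand-in for whatever analysis the SAP actually specifies. The point is that the same analysis function is developed and debugged on permuted arm labels, then rerun unchanged once the true allocation key is released.

```python
import random
from collections import Counter
from statistics import mean


def scramble_allocation(records, arm_key="arm", seed=2025):
    """Return a copy of the records with arm labels randomly permuted.

    Arm sizes are preserved, but the link between each participant and
    their true arm is broken. The input list is not modified.
    """
    rng = random.Random(seed)
    arms = [row[arm_key] for row in records]
    rng.shuffle(arms)
    return [{**row, arm_key: mock_arm} for row, mock_arm in zip(records, arms)]


def mean_difference(records, arm_key="arm", outcome_key="outcome"):
    """Placeholder for the planned analysis: treatment mean minus control mean."""
    treated = [r[outcome_key] for r in records if r[arm_key] == "treatment"]
    control = [r[outcome_key] for r in records if r[arm_key] == "control"]
    return mean(treated) - mean(control)


# Hypothetical toy data, invented purely to make the sketch runnable.
trial = [
    {"id": 1, "arm": "treatment", "outcome": 4.1},
    {"id": 2, "arm": "control", "outcome": 3.2},
    {"id": 3, "arm": "treatment", "outcome": 5.0},
    {"id": 4, "arm": "control", "outcome": 2.9},
    {"id": 5, "arm": "treatment", "outcome": 4.6},
    {"id": 6, "arm": "control", "outcome": 3.5},
]

# Develop and debug the analysis code on the mock allocation only.
mock = scramble_allocation(trial)
mock_estimate = mean_difference(mock)

# Arm sizes are unchanged, so the code paths exercised match the real run.
assert Counter(r["arm"] for r in mock) == Counter(r["arm"] for r in trial)
```

In practice the true labels would be held by someone outside the analysis team, and only the scrambled file would reach the analyst until the code and report template are signed off. Note that a simple permutation like this ignores stratification or clustering in the randomisation; if the design has those, permuting within strata or at the cluster level would be the closer analogue.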