No, not a neurosurgery journal, but one that is open to new ideas (JAMA was not).
To extend the analogy further: When I started surgical training 40 years ago, a surgeon did all the tasks necessary to provide surgical care. Then came PAs and NPs. Now, the neurosurgeon may do the operation, but almost all the other perioperative care is provided by individuals with substantially less training.
Statistics have also evolved. When I was a research fellow at UCLA, we had unfettered access to very senior statisticians. One simply went to their offices and got help. In those days access to statistical knowledge and computer programs was very limited.
Now, if I want to talk to a statistician, I need to pay a hefty consultation fee. However, very sophisticated statistical software is freely available, and AI can answer almost any question one has about statistical analysis. Given that we do not have infinite resources to pay for statistical consultation, we rely on what we can get done on our own.
An example: I asked Grok to design a methods section and power analysis for an adaptive trial. I am posting it below. It is remarkably similar to a study design I paid a consultant about $1,000 to write a few years ago.
Just as the role of neurosurgeons has changed, so has that of statistical consultants. Authors will continue to do their own analyses, and neither they nor the journals have the resources to pay for statistical consultation. Given this reality, we need to find the optimal balance between having expert input and having a process to minimize errors when that input is not available.
Grok: “Write a sample method section describing the study power calculation for an adaptive trial design Using bayes methodologies”
Methods
Study Design
This phase II clinical trial utilized a Bayesian adaptive design to evaluate the efficacy of a novel therapeutic agent (Intervention A) compared to standard care (Control B) in patients with moderate-to-severe inflammatory disease. The trial was a randomized, double-blind, parallel-group study with a 1:1 allocation ratio. The primary endpoint was the proportion of patients achieving clinical remission at 24 weeks, defined as a binary outcome (remission vs. no remission).
The adaptive design incorporated interim analyses at prespecified enrollment milestones (after 40%, 60%, and 80% of the initial target sample size) to allow for early stopping for efficacy, futility, or sample size re-estimation. Decisions were based on Bayesian posterior probabilities, enabling data-driven modifications while maintaining trial integrity. The initial target sample size was 200 patients (100 per arm), with a maximum allowable increase to 300 patients if interim results were inconclusive.
Bayesian Framework
A Bayesian hierarchical model was employed for inference on the primary endpoint. Let pA and pB denote the true remission rates for Intervention A and Control B, respectively. The number of remissions in each arm followed a binomial distribution: YA ∼ Binomial(nA, pA) and YB ∼ Binomial(nB, pB), where nA and nB are the sample sizes per arm at each analysis stage.
Non-informative prior distributions were assigned to the remission rates: pA ∼ Beta(1, 1) and pB ∼ Beta(1, 1), equivalent to a uniform prior on [0,1]. This choice reflected limited prior knowledge and allowed the data to dominate posterior inference. At each interim analysis, posterior distributions were updated using Markov Chain Monte Carlo (MCMC) methods implemented in R (version 4.2.1) with the rjags package, drawing 10,000 samples after a 5,000-iteration burn-in to ensure convergence (assessed via Gelman-Rubin diagnostic <1.1).
Adaptive rules were as follows:
- Efficacy stopping: Stop early if the posterior probability P(pA > pB + δ | data) > 0.975, where δ = 0.10 (minimum clinically important difference).
- Futility stopping: Stop early if P(pA > pB + δ | data) < 0.05.
- Sample size re-estimation: If 0.05 ≤ P(pA > pB + δ | data) ≤ 0.975, increase enrollment by up to 50 patients per arm, based on predictive simulations of trial success.
Final success at trial completion was declared if P(pA > pB + δ | data) > 0.95.
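The decision rules above are straightforward to compute. The posted section describes an R/rjags MCMC implementation; as an illustrative sketch (in Python, with hypothetical interim counts), note that under Beta(1, 1) priors the posteriors are conjugate Beta distributions, so P(pA > pB + δ | data) can be estimated by direct Monte Carlo sampling:

```python
import numpy as np

rng = np.random.default_rng(42)

def prob_superiority(y_a, n_a, y_b, n_b, delta=0.10, n_draws=100_000):
    """Estimate the posterior P(pA > pB + delta | data).

    Under Beta(1, 1) priors and binomial likelihoods, the posteriors are
    conjugate: pA | data ~ Beta(1 + y_a, 1 + n_a - y_a), and similarly
    for pB, so we can sample them directly (no MCMC required).
    """
    p_a = rng.beta(1 + y_a, 1 + n_a - y_a, n_draws)
    p_b = rng.beta(1 + y_b, 1 + n_b - y_b, n_draws)
    return float(np.mean(p_a > p_b + delta))

# Hypothetical interim look: 40/80 remissions on A, 24/80 on B.
prob = prob_superiority(40, 80, 24, 80)
if prob > 0.975:
    decision = "stop for efficacy"
elif prob < 0.05:
    decision = "stop for futility"
else:
    decision = "continue / re-estimate sample size"
```

The conjugacy shortcut is exact for this beta-binomial model; MCMC becomes necessary only when the hierarchical structure is more elaborate than the model stated here.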
Power Calculation (Simulation-Based Assurance)
Given the adaptive nature and Bayesian decision rules, traditional frequentist power calculations were not directly applicable. Instead, we computed the trial’s operating characteristics, including Bayesian assurance (the predictive probability of trial success averaged over plausible parameter values) and frequentist-like metrics (e.g., empirical power and type I error rate), via Monte Carlo simulations.
Simulations were conducted using R software. We generated 5,000 replicated trials under a range of scenarios for the true remission rates:
- Null scenario: pA = pB = 0.30 (no treatment effect).
- Alternative scenarios: pA = 0.45, pB = 0.30 (moderate effect); pA = 0.50, pB = 0.30 (strong effect); and boundary cases (e.g., pA = 0.35, pB = 0.30).
- Priors were incorporated by first sampling true rates from the prior distributions (for assurance) or fixing them at scenario values (for conditional power).
For each replicated trial:
- Patient data were simulated sequentially according to the true rates and binomial model.
- Interim analyses were performed at the specified milestones, updating posteriors and applying adaptive rules (e.g., stopping or re-estimating sample size).
- If the trial continued to completion, the final posterior was evaluated against the success criterion.
The Bayesian assurance was calculated as the proportion of simulations resulting in a successful outcome (declaring superiority of Intervention A), averaged over the prior distribution of effect sizes. Conditional power was the success proportion given fixed true rates under the alternative. Type I error was the false-positive rate under the null.
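The simulation loop described above can be sketched compactly. This is a simplified illustration, not the full adaptive scheme: a single final look, a fixed 100 patients per arm, and reduced simulation counts, all hypothetical values chosen only to keep the sketch fast:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_trials(p_a, p_b, n_per_arm=100, n_trials=500,
                    delta=0.10, threshold=0.95, n_draws=5_000):
    """Empirical success rate of a simplified, single-look version
    of the trial: no interim analyses or sample size re-estimation."""
    successes = 0
    for _ in range(n_trials):
        # Simulate one trial's remission counts from the true rates.
        y_a = rng.binomial(n_per_arm, p_a)
        y_b = rng.binomial(n_per_arm, p_b)
        # Conjugate Beta posteriors under Beta(1, 1) priors.
        s_a = rng.beta(1 + y_a, 1 + n_per_arm - y_a, n_draws)
        s_b = rng.beta(1 + y_b, 1 + n_per_arm - y_b, n_draws)
        # Declare success if P(pA > pB + delta | data) > threshold.
        if np.mean(s_a > s_b + delta) > threshold:
            successes += 1
    return successes / n_trials

type1 = simulate_trials(0.30, 0.30)  # null: no treatment effect
power = simulate_trials(0.45, 0.30)  # primary alternative scenario
```

Assurance would additionally draw the true rates from a prior before each replicate rather than fixing them at scenario values; the interim looks and re-estimation rules would be layered inside the per-trial loop.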
Results indicated:
- Assurance of approximately 82% for detecting a moderate effect (true rate difference of 0.15), averaged over the prior distribution of effect sizes.
- Conditional power of 85% under the primary alternative scenario (pA = 0.45, pB = 0.30).
- Type I error controlled at <2.5% under the null.
- Average sample size across simulations: 185 patients (range: 120–280), demonstrating efficiency gains from adaptations.