A new paper in SBR, in case you missed it: Bayesian Methods in Regulatory Science
Bayesian early phase designs often use information more efficiently and thus facilitate better decision making during the trial. Their outputs are more intuitive to healthcare providers and patients, and their operating characteristics are often superior to standard designs.
It is important here to note that running most Bayesian trials correctly will require bringing data entry and trial monitoring software to the 21st century. This is a challenge for both academic centers and industry but addressing it will ultimately benefit everyone.
As a doctor, one of the questions that most intrigues me right now is the distinction between frequentist sample-size determination and its Bayesian counterpart. What would that look like? Could the sample size be chosen to achieve a certain precision?
With Bayes you don’t really need a sample size, but budgetary folks will demand one anyway. The precision analog for Bayes is to compute N such that the width of a 0.95 credible interval equals a specified value. If instead you want a power-like Bayesian sample size, it could be N such that the probability is high that the posterior probability of efficacy exceeds a certain large value. You’d need to either pick a single effect value or do the calculation over a distribution of non-null effects (a prior that excludes the null).
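As a minimal sketch of the precision route: for a single normal mean with known σ and a vague prior, the 0.95 posterior credible interval has half-width z·σ/√n, so the N that achieves a target interval width follows directly. All numbers here (σ = 1, target width 0.5) are hypothetical, and the function name is made up for illustration.

```python
import math

def n_for_credible_width(sigma, width, z=1.959964):
    """Smallest n for which a 0.95 credible interval for a normal mean
    (known sigma, vague prior) has total width <= `width`."""
    # interval width = 2 * z * sigma / sqrt(n)  =>  n = (2 z sigma / width)^2
    return math.ceil((2 * z * sigma / width) ** 2)

print(n_for_credible_width(sigma=1.0, width=0.5))  # -> 62
```

The same one-liner inverts cleanly for any target width, which is the appeal of precision-based planning: no hypothesized effect size is needed.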
I don’t quite understand the point you are trying to make with the calibration simulation (“continuous learning”). To be specific:
a. The parameter-generating prior and the analysis prior are identical, which makes the calibration claim (i.e., posterior inference recovers the truth) mathematically valid. I understand that the true parameter could follow some true distribution. However, in real practice it is impossible to assign an analysis prior that matches the true distribution of the parameter. When the analysis prior and the true parameter distribution differ, we no longer have this calibration. Then why do we need this simulation?
b. How can we appreciate Bayesian continuous learning within a frequentist simulation framework (i.e., evaluation of operating characteristics)? You mentioned “Multiplicity comes from the chances (over study repetitions and data looks) you give data to be more extreme (if the null hypothesis holds), not from the chances you give an effect to be real.” It seems that whenever we do a frequentist simulation evaluation, we are always giving the data a chance to be extreme.
What do you think of operating-characteristic-based simulation (e.g., performing a Bayesian analysis/decision and evaluating its frequentist properties)? You mentioned many cons of frequentist thinking (e.g., evaluating operating characteristics); are there any pros?
If we don’t run simulations with respect to frequentist properties, could we perform other types of simulation for a Bayesian analysis? If so, what would be the purpose of those simulations?
Thank you very much for the reply!
I’ve worked with many non-statisticians who are unswayed by math. They tend to be swayed by simulation. Perhaps a more interesting simulation would be to have a discordance between the parameter simulation prior and the analysis prior. The results will be predictable, e.g., a simulation prior that is more conservative than the analysis prior will result in more extreme posterior probabilities.

But there is perhaps a more important reason for doing the simulations. They demonstrate with simple R code the vast difference in thinking between Bayesian and frequentist paradigms. In frequentism it is considered important to compute type I “error” probability, done by assuming zero effect and no harm of treatment (\theta=0). Then you see how often you would generate evidence for a non-zero effect, i.e., evidence for what \theta is not. The job of Bayesian posterior inference on the other hand is illustrated by generating clinical trials for all possible values of \theta as tilted by the prior, and trying to recover \theta no matter what \theta is. The main job of Bayes is to show that our evidence about \theta is calibrated.

As a side note, if \theta is exactly zero, you will logically sometimes conclude it is not exactly zero (get a type I “error” probability > 0) whether Bayesian or frequentist. But it is important to note that the probability this happens is not the probability of making a mistake in concluding that a treatment works.
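A minimal sketch of such a calibration simulation, in a hypothetical normal–normal conjugate setting (all settings illustrative): draw \theta from the prior, simulate a trial, compute the posterior probability that \theta > 0, and check that among trials whose posterior probability lands near 0.9, \theta really is positive about 90% of the time.

```python
import math
import random

random.seed(1)
mu0, tau = 0.0, 1.0       # prior for theta: Normal(mu0, tau^2)
sigma, n = 1.0, 50        # per-observation SD and trial size (hypothetical)
nsim = 20000

hits = total = 0
for _ in range(nsim):
    theta = random.gauss(mu0, tau)                 # simulation prior
    ybar = random.gauss(theta, sigma / math.sqrt(n))  # observed trial mean
    # standard conjugate normal posterior (analysis prior = simulation prior)
    post_var = 1 / (1 / tau**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau**2 + n * ybar / sigma**2)
    # P(theta > 0 | data) via the normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2))
    p_pos = 0.5 * math.erfc(-post_mean / math.sqrt(2 * post_var))
    # collect trials whose posterior probability lands near 0.9
    if 0.85 < p_pos < 0.95:
        total += 1
        hits += theta > 0

print(hits / total)   # close to 0.90 when the two priors agree
```

Replacing the simulation prior with a more optimistic or more conservative distribution than the analysis prior, as suggested above, breaks this agreement in a predictable direction.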
That’s the point. Think of it this way: in frequentism we compute probabilities about data, and in Bayes we compute probabilities about \theta. Even when data evolve, we’re still thinking about the unmoving target \theta with Bayes. Frequentism involves calculations about the evolution of data. Bayes computes probabilities about \theta given the current most complete up-to-date data. Posterior probabilities that you computed a few days earlier have been superseded and are obsolete; they do not affect how you interpret the current cumulative evidence.
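The point that earlier posteriors are fully superseded can be seen with conjugate updating. A sketch, assuming a hypothetical Beta-Binomial model with made-up interim looks: updating look by look gives exactly the same posterior as one update on the cumulative data, so interim posteriors carry no separate weight.

```python
# Beta(1, 1) prior; each interim look contributes (successes, trials).
a, b = 1, 1
looks = [(4, 10), (7, 15), (9, 20)]   # hypothetical interim results

for s, n in looks:                     # sequential update, look by look
    a, b = a + s, b + (n - s)

S = sum(s for s, _ in looks)           # cumulative successes
N = sum(n for _, n in looks)           # cumulative trials
print((a, b) == (1 + S, 1 + N - S))    # True: same as one all-at-once update
```

This is why there is no Bayesian analog of "spending" anything at interim looks: the current posterior is the analysis.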
To be interested in frequentist properties of Bayesian procedures is to not understand evidence for assertions being true, but instead to convert the question into one about the frequency with which assertions are made given a single magical value of \theta. So no thanks. But when you do compute frequentist properties of Bayesian procedures, the results are pretty darn good. In many cases Bayesian credible intervals are more accurate than frequentist confidence intervals in terms of frequentist coverage probability, because frequentist methods rely on a number of approximations when the log likelihood is not quadratic.
All permutations are possible. Bayesian power is the most interesting: what is the probability that the posterior probability will ever reach 0.95 as N gets larger, for example.
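A rough sketch of that Bayesian power calculation by simulation, assuming a hypothetical normal–normal model with a skeptical analysis prior and a look after every 25 observations (the design effect \theta = 0.3 and all other settings are made up for illustration):

```python
import math
import random

random.seed(2)
theta_true, sigma = 0.3, 1.0   # assumed design effect and SD (hypothetical)
mu0, tau = 0.0, 1.0            # skeptical Normal(0, 1) analysis prior
n_max, look_every, nsim = 400, 25, 2000

ever = 0
for _ in range(nsim):
    ssum = n = 0
    while n < n_max:
        for _ in range(look_every):
            ssum += random.gauss(theta_true, sigma)
        n += look_every
        # conjugate normal posterior given the cumulative data
        post_var = 1 / (1 / tau**2 + n / sigma**2)
        post_mean = post_var * (mu0 / tau**2 + ssum / sigma**2)
        p_pos = 0.5 * math.erfc(-post_mean / math.sqrt(2 * post_var))
        if p_pos > 0.95:        # posterior probability ever reaches 0.95
            ever += 1
            break

print(ever / nsim)   # estimated Bayesian power at theta = 0.3
```

The same loop, run with \theta drawn from a design prior instead of fixed at a single value, gives an assurance-type (prior-averaged) power.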
Thanks for the super questions.