The reason that Bayesian inference is more efficient for continuous learning is that it computes probabilities looking ahead: the forward-in-time probabilities that are needed for decision making. These are probabilities about unknowns, conditional on all of the data observed so far and not conditional on any unknowns. Importantly, there are no multiplicities to control. This is one of the least well understood aspects of Bayesian vs. frequentist analysis, and it follows from current probabilities superseding the probabilities that were computed earlier.
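As a minimal sketch of the "current probabilities supersede earlier ones" point (a toy binary-response example with invented numbers, not from any real trial): the posterior at each interim look conditions on all data so far, and the number of looks appears nowhere in the calculation.

```python
# Sketch: sequential Bayesian updating with a Beta-Binomial model.
# All counts are made up for illustration.
from scipy.stats import beta

a0, b0 = 1, 1                            # flat Beta(1, 1) prior on the response probability p
looks = [(12, 20), (27, 40), (44, 60)]   # cumulative (responders, patients) at each interim look

for successes, n in looks:
    a, b = a0 + successes, b0 + (n - successes)
    prob = beta.sf(0.5, a, b)            # P(p > 0.5 | all data so far)
    print(f"n={n:3d}  P(p > 0.5 | data) = {prob:.3f}")

# The probability at n=60 conditions on all 60 patients and simply replaces the
# earlier probabilities; nothing is adjusted for having looked before.
```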
…
Traditional statistics has multiplicity issues arising from giving more than one chance for data to be extreme (by taking more than one look at the data). It is the need for sampling multiplicity adjustments that makes traditional methods conservative from the standpoint of the decision maker, thus making continuous learning difficult and requiring larger sample sizes. The traditional approach limits the number of data looks and has a higher chance of waiting longer than needed to declare evidence sufficient. It also is likely to declare futility too late.
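A quick simulation of the "more than one chance for data to be extreme" point (my own sketch, with arbitrary look times): under a true null, the chance that at least one of five unadjusted interim z-tests crosses the usual 1.96 boundary is far above 5%, which is why sequential designs impose stricter per-look boundaries and hence need larger samples for the same power.

```python
# Sketch: familywise chance of an 'extreme' result when one null dataset is
# tested at 5 interim looks with an unadjusted two-sided 5% boundary.
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_max, looks = 20_000, 500, [100, 200, 300, 400, 500]

hits = 0
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n_max)      # data generated under the null (no effect)
    for n in looks:
        z = x[:n].mean() * np.sqrt(n)    # z-statistic at this look
        if abs(z) > 1.96:                # unadjusted 5% boundary
            hits += 1
            break

print(f"P(at least one 'significant' look | no effect) ≈ {hits / n_sims:.3f}")
# Typically around 0.14 rather than 0.05, hence the need for multiplicity
# adjustments, which make each look more conservative.
```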
It seems to me that one motivation for the longstanding emphasis on one (or a very few) ‘primary outcome(s)’ is to avoid problems of p-hacking, garden-of-forking-paths, etc. These worries seem quite similar to concerns about multiplicity. Does Bayesian analysis allow for infinitely many outcomes, just as it allows infinitely many looks at the data?
Great question David. I gave a partial answer here. The Bayesian philosophy could not be more different from frequentism in this arena. Its philosophy is that you assess current evidence about any assertion as a starting point, carry that through the analysis without modification as a pre-study prior, and then interpret the results using that original prior. The prior for one endpoint can be arbitrarily different from the prior for another clinical endpoint, and evidence about outcome 1 does not need to be modified by whatever evidence you have for outcome 2. You can see from this that the designation of “primary”, “secondary”, “co-primary”, etc. outcomes is neither needed nor helpful. The only thing I’d recommend in this setting is to prioritize the reporting order of outcomes, to give the reader comfort that cherry-picking has not happened.
Bayes is flexible but you need to pre-specify how it will be done.
Thanks, Frank. That ‘partial answer’ is truly helpful. The Example 2 section at the bottom, especially, now has me thinking that the frequentist-Bayesian schism may not actually be implicated after all in the matter of multiple outcomes. Both treatments of the 2nd outcome in your Example 2 seem parallel.
There are many reasons to pre-specify: it focuses your attention; e.g. you cannot triple-check every datum, but maybe you can if you limit yourself to a primary analysis; it may focus data collection, monitoring, and communication of results; and the power calculation is mostly there to fix some parameters in place for a budget, not as a scientific standard. And if you’ve worked with marketing you’ll see how important it is to rein in the imagination…
Great point! In my view the “primary outcome” is problematic. In many situations (or most?) there isn’t ONE outcome that is of overriding importance - there are usually a few that are really important, and some more that are less important (and “important” to whom? different people will have different views). Designating one primary outcome encourages an interpretation of the trial based just on that - hence trials are described as “positive” or “negative” (also terms we should bin) based on the statistical significance of one outcome. Obviously that doesn’t really make a lot of sense. One of the big advantages of a Bayesian approach is that it facilitates a more sensible way of interpreting a trial based on all of the outcomes that are important (as Frank and others have argued).
I’m more worried about a designated primary analysis than a primary outcome, because we often go for a ‘safe’ analysis as primary (as simple as possible, e.g. at a single time point, a familiar test, etc.) and the more sophisticated analyses, which may include a Bayesian analysis, end up with secondary status.
Just to be clear: from the “likelihood-principle” Bayesian perspective, you just don’t care about the things the frequentist is worried about. Such a thing as type I error rate control does not matter to you, if you take that perspective. I.e. just doing a Bayesian analysis (even sillier when you get essentially the same estimate and credible interval as the maximum likelihood estimate and confidence interval, due to having a lot of data and non-informative priors*) does not avoid type I error inflation due to multiple looks or multiple endpoints - it most definitely inflates the type I error, you just don’t care.
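To make the starred point concrete (my own toy numbers): with a flat prior on a normal mean with known σ, the posterior probability P(θ > 0 | data) is exactly one minus the one-sided p-value, so monitoring the posterior at every look is numerically the same procedure as monitoring unadjusted p-values.

```python
# Sketch: with a flat prior and known sigma, posterior P(theta > 0 | data)
# equals 1 - (one-sided p-value). Toy data with no true effect.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, n = 1.0, 50
x = rng.normal(0.0, sigma, n)                      # data generated with no true effect

xbar, se = x.mean(), sigma / np.sqrt(n)
post_prob = 1 - norm.cdf(0, loc=xbar, scale=se)    # flat-prior posterior P(theta > 0 | data)
p_one_sided = 1 - norm.cdf(xbar / se)              # one-sided p-value for H0: theta <= 0

print(f"posterior P(theta > 0 | data) = {post_prob:.4f}")
print(f"1 - one-sided p-value         = {1 - p_one_sided:.4f}")   # identical
```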
The other thing to notice (see * above) is that a sensible Bayesian prior would not be non-informative or treat the priors for multiple endpoints as independent, but rather reflect that a treatment that works is more likely to work on multiple endpoints, while a treatment that does not work is more likely to show some random blips on some random endpoints rather than consistent results across endpoints. You can achieve that e.g. with a sensible hierarchical model. Additionally, you’d probably want to reflect that very large treatment effects are rare, so e.g. something like a N(0, 1) prior on log-hazard, log-risk or log-odds ratios may be called for. Independent non-informative priors on separate endpoints should make any Bayesian shudder with disgust.
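A minimal sketch of the kind of hierarchical prior meant here (hyperparameter values are illustrative assumptions, not recommendations): a shared overall treatment effect plus endpoint-specific deviations, which induces positive prior correlation between endpoint effects, unlike independent non-informative priors.

```python
# Sketch: hierarchical prior over effects on several endpoints.
# A shared 'overall effect' plus endpoint-specific deviations.
import numpy as np

rng = np.random.default_rng(42)
n_draws, n_endpoints = 100_000, 3

mu = rng.normal(0.0, 1.0, size=(n_draws, 1))                     # shared effect, N(0, 1) on the log scale
theta = mu + rng.normal(0.0, 0.5, size=(n_draws, n_endpoints))   # endpoint-specific log-odds/log-hazard ratios

print("Prior correlation between endpoint effects:")
print(np.corrcoef(theta, rowvar=False).round(2))
# Positive prior correlation: a drug that works tends to work on several
# endpoints. Independent priors would give roughly zero correlation here.
```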
We can of course argue whether a frequentist perspective (“Lots of people want to sell their treatments or advertise their scientific discoveries, so we want a procedure that controls the rate of false claims of effective therapies in the long run.”) or a Bayesian perspective with proper, meaningful priors makes more sense (in fact, with truly meaningful priors the frequentist operating characteristics of a Bayesian approach will often be quite reasonable), but my impression is that too often people want to have their cake and eat it too. I.e. do the frequentist-equivalent analysis (non-informative independent priors) and claim to be able to look as much as they like. That’s a recipe for torturing some kind of confession out of any data à la Wansink’s pizza-gate.
This is a common misperception of what the type I assertion probability represents. It is not the rate of false claims; it is instead the probability of asserting an effect when there is no effect. Since the type I probability is not the probability of making a mistake, it was never relevant even to a frequentist decision maker. P(false claim) = P(drug doesn’t work | data), and the Bayesian P(effect | data) gives you exactly that (as its complement), whether or not you go on to assert an effect.
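A worked toy calculation of the distinction (all numbers invented for illustration): α is P(assert effect | no effect), whereas the probability that an assertion is actually a mistake is P(no effect | assert effect), which depends on the prior probability that the drug works.

```python
# Sketch: alpha vs. the probability that a claim of an effect is mistaken.
# Numbers are made up for illustration only.
alpha, power = 0.05, 0.80
p_works = 0.10                          # assumed prior probability that the drug works

p_assert = power * p_works + alpha * (1 - p_works)
p_mistake_given_assert = alpha * (1 - p_works) / p_assert

print(f"P(assert an effect)               = {p_assert:.3f}")
print(f"P(no effect | asserted an effect) = {p_mistake_given_assert:.3f}")   # ≈ 0.36, not 0.05
```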
A super point, but a hierarchical model will seldom do that (e.g. one endpoint may be binary and the other continuous) and does not give you marginal interpretations of treatment effect on any one endpoint. Copulas solve both of those problems, but they are complicated.
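For readers unfamiliar with the copula idea, here is a minimal sketch (a Gaussian copula with arbitrary parameter values, not a full modeling recipe): the copula supplies the dependence between a binary endpoint and a continuous endpoint while leaving each marginal distribution, and hence each marginal treatment-effect interpretation, untouched.

```python
# Sketch: Gaussian copula coupling a binary endpoint and a continuous endpoint
# while preserving both marginals. Parameter values are arbitrary.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n, rho = 200_000, 0.6
p_event, mu, sd = 0.30, 2.0, 1.5        # marginal parameters for the two endpoints

# Correlated standard normals define the copula (the dependence structure)
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)

y_binary = (z[:, 0] < norm.ppf(p_event)).astype(int)   # Bernoulli(p_event) marginal
y_cont = mu + sd * z[:, 1]                             # Normal(mu, sd) marginal

print(f"observed event rate      = {y_binary.mean():.3f}  (target {p_event})")
print(f"observed continuous mean = {y_cont.mean():.3f}  (target {mu})")
print(f"mean of continuous endpoint, event vs no event: "
      f"{y_cont[y_binary == 1].mean():.2f} vs {y_cont[y_binary == 0].mean():.2f}")
```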
Thank you @Bjoern and @f2harrell for this very rich further discussion. It’s taken me multiple reads over a few days to fully take it in.
If I understand Bjoern right, what liberates us here is the combination of Bayesian techniques (enabling use of substantive prior information) with Bayesian attitudes (in particular, likelihoodism), and that the latter may be more important than the former.
One less abstract technique than copulas is latent-variable models—and perhaps Bjoern intended his ‘hierarchical models’ to include these? One reason I favor latent-variable models is that they afford the opportunity to represent substantive, scientific theories (about latent constructs) directly within the model, and then to ‘confront’ our theorizing with data. Under this approach, the thing some Bayesians call ‘prior-data conflict’ (and regard as an ‘issue’ to be worked around), a falsificationist regards as a triumph—we learned something! (Learning you’re wrong is an ‘issue’ to an academic, but an opportunity for a scientist.)
Interestingly, I think this falsificationist outlook, although philosophically quite different from the likelihoodist attitude of the Bayesian as invoked by Bjoern, nevertheless has a similar effect. The falsificationist who is advancing a bold conjecture wants to make multiple predictions, such that any one of these creates a learning opportunity if it fails to pan out. The more predictions that can be made, the bolder (i.e., more substantive) we would judge the theory to be.