Hi Kert! I’m really glad you have joined datamethods.
I think there are two classes of Bayesian operating characteristics, and I think you have combined them in your description. The first is the correctness of decisions, in any direction (efficacy, harm, etc.). That is P(decision is correct | data, prior). The second is Bayesian power, i.e., the sensitivity of the Bayesian design to detect a relevant treatment effect: P(P(effect > epsilon | data, prior’) > 0.95), where prior’ is the subspace of the sampling (simulation) prior that corresponds to benefits not to miss. This will typically be the uncertainty distribution for the MCID. Dealing with Bayesian power is what prevents your N=1 example from happening.
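To make that nested probability concrete, here is a minimal simulation sketch. Everything in it is hypothetical: a normal outcome with known SD, a conjugate normal analysis prior, and made-up values for epsilon, the priors, and the sample size (it is not the design from any real trial).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Hypothetical design settings (for illustration only)
n, sigma = 400, 1.0     # sample size, known outcome SD
eps = 0.2               # triviality threshold epsilon
mu0, tau0 = 0.0, 0.5    # analysis prior: effect ~ N(mu0, tau0^2)

def post_prob_gt_eps(xbar):
    """Posterior P(effect > eps | data) under the conjugate normal analysis prior."""
    prec = 1 / tau0**2 + n / sigma**2
    m = (mu0 / tau0**2 + n * xbar / sigma**2) / prec
    return 1 - norm.cdf(eps, m, prec ** -0.5)

# Sampling prior restricted to benefits not to miss: a stand-in for the
# uncertainty distribution of the MCID (numbers invented for the sketch)
nsim = 5000
effects = rng.normal(0.4, 0.1, nsim)              # true effects per simulated trial
xbars = rng.normal(effects, sigma / np.sqrt(n))   # observed mean per trial

# Bayesian power: P( P(effect > eps | data, prior') > 0.95 )
bayes_power = np.mean(post_prob_gt_eps(xbars) > 0.95)
print(f"Bayesian power ~ {bayes_power:.2f}")
```

The outer probability is estimated by the fraction of simulated trials in which the inner posterior probability clears 0.95.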
My interpretation of your lottery ticket example is that you are emphasizing the role of tendencies when choosing a design. This is an excellent point because before any data are collected, and when we don’t have excellent data from a previous experiment, we must rely on tendencies. This is neither a Bayesian nor a frequentist idea, but the particular tendencies that you model will differ by paradigm. In your discussion about \alpha and \beta you are entertaining the notion that tendencies can be based on frequentist ideas, which rely heavily on unobservables. This is the most common approach but it is not one I adopt. We can instead consider tendencies of purely Bayesian quantities and avoid unobservables, as I describe below.
Perhaps a more compelling example than the lottery example, for just the present purpose, would be along the lines of this NFL problem which I’ll modify for illustration. Suppose that you had two goals: to be correct about your team winning and to not spend too much time watching the game. One person bases his decision on the score at the end of the half. A second person continuously assesses the score and the current probability of ultimate victory, all throughout the first half. Both persons will make accurate decisions. The second person will make the decision more quickly on average. Continuous assessment by person 2 does not modify any relevant operating characteristics except for the expected sample size.
A Bayesian can select an experimental design using strictly Bayesian operating characteristics, and obtain excellent operating characteristics that matter. An example here shows how all goals in a set of goals can be achieved simultaneously, including correctness of decisions, Bayesian power, precision, and low expected sample size.
I need to elaborate on P(decision is correct | data, prior). The actual posterior probability computed in this situation is the probability that the treatment doesn’t work, whether or not we conclude that it does. Writing it out fully, what we are trying to prevent is concluding that the treatment effect is more than trivial (with triviality threshold epsilon) when the treatment effect is really < epsilon. The posterior error probability is P(effect < epsilon | data, analysis prior), where the true effect is drawn from the sampling (simulation) prior, which may deviate from the analysis prior. Better notation is P(effect < epsilon | data, analysis prior, sampling prior), meaning that it is computed assuming the pre-specified analysis prior, while the universe of treatment effects we are sampling from is specified by the sampling prior. The purpose of this kind of fully pre-specified sensitivity analysis is to show that the study’s conclusions are not unduly sensitive to the choice of analysis prior when they shouldn’t be.
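A toy illustration of the analysis-prior/sampling-prior distinction, again with a hypothetical conjugate-normal setup and invented numbers: data are generated with the true effect drawn from the sampling prior, and the posterior error probability is then computed under several candidate analysis priors to see how much the conclusion moves.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical setup: true effect drawn from the sampling (simulation) prior
n, sigma, eps = 200, 1.0, 0.2
true_effect = rng.normal(0.3, 0.15)               # sampling prior (made up)
xbar = rng.normal(true_effect, sigma / np.sqrt(n))

def p_error(mu0, tau0):
    """Posterior error probability P(effect < eps | data, analysis prior N(mu0, tau0^2))."""
    prec = 1 / tau0**2 + n / sigma**2
    m = (mu0 / tau0**2 + n * xbar / sigma**2) / prec
    return norm.cdf(eps, m, prec ** -0.5)

# Sensitivity of the posterior error probability to the analysis prior:
# a vague prior, a skeptical prior, and one matching the sampling prior
priors = [(0.0, 1.0), (0.0, 0.25), (0.3, 0.15)]
for mu0, tau0 in priors:
    print(f"analysis prior N({mu0}, {tau0}^2): "
          f"P(effect < eps | data) = {p_error(mu0, tau0):.3f}")
```

If the three probabilities are close, the conclusion is robust to the analysis prior; if not, the pre-specified sensitivity analysis has done its job of flagging that.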
None of this is affected by multiple looks, other than that you need to carefully specify which conditions must hold with which probabilities, as shown in the example linked above. In that example, I put a minimum-N requirement in place before looking for evidence of non-trivial efficacy. This requirement essentially says that if we stop early with evidence for efficacy, we must be able to achieve reasonable precision (e.g., width of the posterior distribution) for the amount of treatment benefit. If we stop early for inefficacy, this design jettisons any ability to estimate well just how poorly the treatment performs.
Big picture: Excellent designs are important to Bayesians, and they dictate how often we look at the data, make adaptations, etc. The goal of having an excellent pre-specified design does not change which operating characteristics we need to use.
Bayesians do not profit from sampling distributions except possibly in one situation: simulating tendencies of specific experimental designs. For that we have to simulate repeated experiments (1,000–10,000 of them) and run the Bayesian procedure to compute the probabilities of correct decisions, Bayesian power, expected sample size, etc. How the simulations are done is markedly different from the frequentist approach because the analysis does not rely on unobservables. Bayes doesn’t ask for the probability of asserting an effect were the effect to magically be zero, i.e., when any assertion of an effect is by definition wrong. Instead, the sampling prior specifies the universe of effects and each simulated trial has its data generated from a different effect size. The Bayesian no-unobservables analysis is then run, and only afterward do we reveal the effect in play to judge the correctness of a decision based on the posterior probability.
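That simulation loop might be sketched as follows, once more with a toy conjugate-normal model and invented design parameters (the real analysis would be whatever the design pre-specifies). Each simulated trial gets its own true effect from the sampling prior; the Bayesian analysis never sees it, and the effect is revealed only at the end to judge correctness.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical design: normal outcome, known SD, conjugate analysis prior
n, sigma, eps = 150, 1.0, 0.2
mu0, tau0 = 0.0, 0.5                          # analysis prior (made up)

nsim = 10_000
effects = rng.normal(0.25, 0.25, nsim)        # sampling prior: universe of effects
xbars = rng.normal(effects, sigma / np.sqrt(n))

# Run the Bayesian analysis on each simulated trial (no unobservables used)
prec = 1 / tau0**2 + n / sigma**2
m = (mu0 / tau0**2 + n * xbars / sigma**2) / prec
p_gt = 1 - norm.cdf(eps, m, prec ** -0.5)     # posterior P(effect > eps) per trial

assert_eff = p_gt > 0.95                      # trials asserting non-trivial efficacy
# Only now reveal the effect in play to judge correctness of the decision
correct = np.mean(effects[assert_eff] > eps)
print(f"P(decision correct | efficacy asserted) ~ {correct:.3f}")
```

Note the contrast with a frequentist simulation: no trial here is generated under a point null; every dataset comes from an effect drawn from the sampling prior, and correctness is judged against the triviality threshold.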
Instead of testing null hypotheses, the goal of Bayesian posterior inference is to uncover the hidden truth that generated the data, no matter what that truth was. An excellent example of the Bayesian “uncover the data generating mechanism as much as the data and prior will allow” philosophy is the successful application of Bayes in neuroscience. It is easy to see which brain region is associated with a stimulus to the left big toe. To uncover which stimulus created a certain brain activation requires a reversal, with Bayes’ rule to the rescue.