You have made this point on a number of occasions for good reason! I’ve had to give it considerable thought over the past few months.

I have a copy of Westfall and Young’s book *Resampling-Based Multiple Testing*, where they make the following statement:

> Many commonly used frequency based multiple testing protocols are in fact based loosely on Bayesian considerations. (p. 22)

I’ve always wondered how capable advocates of frequentist methods (@sander, @Stephen, or Bradley Efron, for example) would reply to your challenge, as they do not seem quite so bothered by the issue.

I presume they would say something like:

> Yes, it is true that the Frequentist-Bayesian mapping is most easily demonstrated with a single data look, and attempts to extend it to multiple looks require care. But far from refuting frequentist procedures, the Bayesian point of view merely requires that the experimenter who adopts frequentist methods adjust his p-values in order to maintain the Frequentist-Bayes interpretation. There are situations where it is better to pay for the specification of the problem over time via p-value adjustment than to spend a lot of time modelling prior to data collection.

In this post, Stephen Senn goes into the early history of how Fisher advocated for “significance tests” and p-values *that could be interpreted in a Bayesian way* without being as sensitive to the prior.

> Now the interesting thing about all this is that if you choose between (1) H_0: \tau = 0 vs. H_1: \tau \neq 0 on the one hand, and (2) H_0: \tau \leq 0 vs. H_1: \tau \gt 0 or (3) H_0: \tau \geq 0 vs. H_1: \tau \lt 0 on the other, it makes remarkably little difference to the inference you make in a frequentist framework. You can see this as either a strength or a weakness; it is largely to do with the fact that the P-value is calculated under the null hypothesis, and that in (2) and (3) the most extreme value, which is used for the calculation, is the same as that in (1). However, if you try to express the situations covered by (1) on the one hand and (2) and (3) on the other in terms of prior distributions and proceed to a Bayesian analysis, then it can make a radical difference, basically because all the other values in H_0 in (2) and (3) have even less support than the value of H_0 in (1). This is the origin of the problem: there is a strong difference in results according to the Bayesian formulation. It is rather disingenuous to represent it as a problem with P-values per se.
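Senn’s contrast can be sketched numerically. Below is a minimal illustration, assuming a known-variance normal model with made-up numbers (the observed mean `xbar`, its standard error `s`, and the slab standard deviation `v` are all illustrative choices, not from the quote): with a flat prior, the one-sided Bayesian posterior probability P(τ ≤ 0 | data) reproduces the one-sided p-value exactly, whereas a spike-and-slab prior on the point null (1) gives a posterior P(H_0 | data) far larger than the two-sided p-value.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Observed effect estimate and its known standard error (illustrative).
xbar, s = 2.0, 1.0
z = xbar / s  # z = 2.0

p_one_sided = 1 - phi(z)             # tests (2)/(3): H_0: tau <= 0
p_two_sided = 2 * (1 - phi(abs(z)))  # test (1): H_0: tau = 0

# Bayesian analysis of (2) with a flat prior on tau:
# P(tau <= 0 | data) = Phi(-z), i.e. exactly the one-sided p-value.
post_tau_le_0 = phi(-z)

# Bayesian analysis of (1) with a spike-and-slab prior:
# P(H_0) = 1/2, and under H_1, tau ~ N(0, v^2) (v is an assumption;
# the posterior depends on it, which is part of Senn's point).
v = 2.0
bf01 = normal_pdf(xbar, s) / normal_pdf(xbar, math.sqrt(s**2 + v**2))
post_h0 = bf01 / (bf01 + 1)  # posterior P(H_0 | data) at prior odds 1:1

print(f"one-sided p        = {p_one_sided:.4f}")
print(f"two-sided p        = {p_two_sided:.4f}")
print(f"P(tau<=0 | data)   = {post_tau_le_0:.4f}  (flat prior; matches one-sided p)")
print(f"P(H_0 | data)      = {post_h0:.4f}  (spike-and-slab; much larger)")
```

With z = 2 this gives a two-sided p of about 0.046 but a point-null posterior probability of about 0.31: the frequentist answer barely moves between formulations (1)–(3), while the Bayesian answer changes radically depending on whether the null is a point mass or a half-line.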