Frequentist vs Bayesian debate applied to real-world data analysis: any philosophical insights?

> This means that if any prior evidence in favour of a harmful effect of smoking on Covid-19 severity was available with 95% CI between 1-11.1, the association would be credible because the posterior OR 95% credible interval would be >1.

That is also how I understand the technique. The interval is very wide, so a “non-significant” result shouldn’t be used as justification to stop further research that might narrow down the precision of the estimate.
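To make the arithmetic behind that concrete, here is a rough sketch of a normal-normal update on the log-OR scale. The prior interval of 1 to 11.1 is taken from the quote above; the “non-significant” study interval (OR 0.8 to 4.0) is a made-up placeholder, not the actual estimate from any paper.

```python
import numpy as np
from scipy import stats

def normal_from_ci(lower_or, upper_or):
    """Convert a 95% CI for an odds ratio into a normal (mean, sd) on the log-OR scale."""
    lo, hi = np.log(lower_or), np.log(upper_or)
    return (lo + hi) / 2, (hi - lo) / (2 * 1.96)

# Prior: the 95% interval of 1 to 11.1 mentioned in the quote
m0, s0 = normal_from_ci(1.0, 11.1)

# Likelihood: a hypothetical wide, "non-significant" study estimate (placeholder numbers)
m1, s1 = normal_from_ci(0.8, 4.0)

# Conjugate normal-normal update: precision-weighted average of prior and estimate
w0, w1 = 1 / s0**2, 1 / s1**2
post_mean = (w0 * m0 + w1 * m1) / (w0 + w1)
post_sd = np.sqrt(1 / (w0 + w1))

lo, hi = np.exp(post_mean + np.array([-1.96, 1.96]) * post_sd)
print(f"Posterior OR 95% credible interval: {lo:.2f} to {hi:.2f}")
print(f"P(OR > 1 | data, prior) = {1 - stats.norm.cdf(0, post_mean, post_sd):.3f}")
```

With these particular made-up inputs the posterior interval sits just above 1, which is the behaviour the quote describes: a prior that favours harm plus a wide, inconclusive estimate can still yield a “credible” association.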

> In this case, I find it harder to understand credibility with a non-significant result.

Context will have to provide the guide. If you had high prior odds that there was some effect, you might do another study and control for factors you hadn’t initially considered. If this was exploratory, and you had low prior odds on the effect being real, you might end your research process here.

(I’d caution taking the point estimate of a retrospective meta-analysis at face value, due to heterogeneity.)

This goes back to the issue of how frequentist surprise measures relate to Bayesian priors.

I think this is how Chernoff interpreted Fisher’s methods in a Bayesian way. Rather than specifying an entire prior distribution, pick a reference point as a default and calculate a statistic that is less sensitive to prior assumptions, but design the experiment (choosing N and \alpha) so that it gives you the posterior odds necessary for the question.
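One rough way to formalize that last sentence (my own sketch, not Chernoff’s): for the dichotomized event “p < \alpha”, the likelihood ratio in favour of a real effect is power/\alpha, so the design question becomes choosing N and \alpha so that prior odds \times power/\alpha reaches the posterior odds you need. The prior odds, effect size, and designs below are all hypothetical.

```python
from scipy import stats

def power_two_sided(effect, n_per_arm, alpha):
    """Approximate power of a two-sample z-test for a standardized effect size."""
    se = (2 / n_per_arm) ** 0.5
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return 1 - stats.norm.cdf(z_crit - effect / se) + stats.norm.cdf(-z_crit - effect / se)

def posterior_odds_given_significance(prior_odds, effect, n_per_arm, alpha):
    """Posterior odds of a real effect given only the event 'p < alpha':
    prior odds times the likelihood ratio power/alpha."""
    return prior_odds * power_two_sided(effect, n_per_arm, alpha) / alpha

# Hypothetical design question: which n and alpha give posterior odds near 10
# when the prior odds of a real effect are 1:4 and the plausible effect is 0.3 SD?
for n in (100, 200, 400):
    for alpha in (0.05, 0.005):
        odds = posterior_odds_given_significance(0.25, 0.3, n, alpha)
        print(f"n/arm={n:4d}  alpha={alpha:.3f}  posterior odds ~ {odds:5.1f}")
```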

IIRC, Fisher advised using the smallest sample that would answer the question at hand. This implicitly combines Neyman–Pearson (NP) power with the value of information.

Note the distinction between “plausible” and “dividing” hypotheses. At least in treatment contexts, we are looking at a “dividing” hypothesis. In other areas (e.g. genomics) there is a strong prior on a very narrow interval null, which is well approximated by using a point null as the default probability model.

As for “intrinsically credible” significant results, it seems statisticians are re-discovering ideas that Walter Shewhart, the early pioneer of statistical process control (SPC) in manufacturing, developed. It is unfortunate his ideas were not mentioned in the debates about \alpha levels a few years ago.

> Thus, when the process is being operated predictably, Shewhart’s generic three-sigma limits will result in values of P that are in the conservative zone of better than 97.5-percent coverage in virtually every case. When you find points outside these generic three-sigma limits, you can be sure that either a rare event has occurred or else the process is not being operated predictably. (It is exceedingly important to note that this is more conservative than the traditional 5-percent risk of a false alarm that is used in virtually every other statistical procedure.)

In the article, he compared how a very low p-value threshold would detect differences among large families of probability distributions. The 3-sigma limit has proved a useful rule and has been in use for close to 100 years now.

Here is a more formal presentation where a \pm 3-sigma limit is used to filter out random noise relative to many families of probability distributions.

I’ve always thought the SPC methods were a very reasonable use of basic statistics. Without any strong distributional assumptions, very low p-values / high sigma thresholds (i.e. mean \pm 3 SD) cover well over 97% of the values from a predictable, controlled process, so a point outside the limits is a reliable alert to a departure.
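A quick simulation (mine, not from the article) makes that coverage claim easy to check across a few very different distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# A few very different "in control" process distributions
families = {
    "normal":      rng.normal(size=n),
    "uniform":     rng.uniform(size=n),
    "exponential": rng.exponential(size=n),
    "lognormal":   rng.lognormal(sigma=0.5, size=n),
    "t (df=5)":    rng.standard_t(5, size=n),
}

for name, x in families.items():
    mu, sigma = x.mean(), x.std()
    coverage = np.mean(np.abs(x - mu) <= 3 * sigma)
    print(f"{name:12s}  P(|X - mean| <= 3 SD) = {coverage:.4f}")
```

For these families the empirical coverage comes out above the 97.5-percent figure in the passage quoted above, with the normal at the familiar 99.7%.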

The use of p-values in SPC is in agreement with Sander’s observation that p-values only alert you to a violation of one or more assumptions of the model used to calculate them. That is why I find the term “surprisal” so useful.

In SPC, the sigma threshold (calculated from observational historical data) is used as a guide on when to invest resources in research to improve the process. Perhaps we should think of other types of research that way.
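As a sketch of what that looks like in practice, here is a minimal individuals (XmR) chart: the limits come from historical baseline data, and only points outside them trigger an investigation. The numbers are invented.

```python
import numpy as np

def xmr_limits(baseline):
    """Shewhart individuals (XmR) chart limits from historical data.

    Sigma is estimated from the average moving range divided by d2 = 1.128,
    the usual convention for moving ranges of size 2."""
    baseline = np.asarray(baseline, dtype=float)
    center = baseline.mean()
    avg_moving_range = np.abs(np.diff(baseline)).mean()
    sigma_hat = avg_moving_range / 1.128
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

# Invented historical data from a predictable process, then some new observations
history = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.1]
new_points = [10.2, 9.9, 11.6, 10.0]

lcl, center, ucl = xmr_limits(history)
print(f"Limits: {lcl:.2f} to {ucl:.2f} (center {center:.2f})")
for i, x in enumerate(new_points):
    if not (lcl <= x <= ucl):
        print(f"Point {i}: {x} is outside the limits -> worth investigating")
```

As I understand it, the moving-range estimate of sigma is preferred over the global standard deviation because the latter gets inflated when the baseline itself is drifting, which would widen the limits and hide signals.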
