In another post about understanding p-values, ESMD made an important observation:
ESMD wrote:
I suspect that much of the confusion among physicians learning critical appraisal stems from the fact that our teachers are often (?usually) practising MDs, who might not understand the historical foundations of statistics well enough to address questions like this. As a result, these types of questions get glossed over.
In that thread, @Sander_Greenland mentioned Richard Royall, the biostatistician and author of an important philosophical text, Statistical Evidence: A Likelihood Paradigm (1997), who wrote in the Preface:
…Standard statistical methods regularly lead to the misinterpretation of scientific studies. The errors are usually quantitative, when the evidence is judged to be stronger (or weaker) than it really is. But sometimes they are qualitative – sometimes one hypothesis is judged to be supported over another when the opposite is true. These misinterpretations are not a consequence of scientists misusing statistics. They reflect instead a critical defect in current theories of statistics.
The subtle differences between the Neyman-Pearson and Fisherian interpretations of p-values that Sander describes are elaborated on in this article, which I highly recommend:
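As a rough illustration of that distinction (a toy example of my own, not taken from the article), the same test statistic can be read in the Neyman-Pearson way, as a binary decision against a pre-specified alpha, or in the Fisherian way, as a graded measure of evidence against the null:

```python
# Toy sketch: one-sample z-test read two ways.
# Assumptions (mine): known sigma = 1, H0: mu = 0, n = 25, observed mean = 0.42.
from scipy.stats import norm

n, xbar, sigma = 25, 0.42, 1.0
z = xbar / (sigma / n ** 0.5)           # test statistic
p = 2 * (1 - norm.cdf(abs(z)))          # two-sided p-value

# Neyman-Pearson reading: a pre-specified decision rule with fixed alpha.
# Only the accept/reject behaviour and its long-run error rates matter.
alpha = 0.05
decision = "reject H0" if p <= alpha else "do not reject H0"

# Fisherian reading: the exact p-value is reported as a graded measure of
# evidence against H0 for this particular dataset; no fixed alpha is required.
print(f"z = {z:.2f}, p = {p:.4f} -> NP decision at alpha = {alpha}: {decision}")
print(f"Fisherian report: p = {p:.4f} (smaller p, stronger evidence against H0)")
```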
Royall’s monograph was published only one year after the widely cited Sackett et al. paper on Evidence Based Medicine (EBM):
Evidence based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.
The later evolution of EBM elevated statistical fallacies into so-called “hierarchies of evidence” that are logically incoherent when examined carefully. Philosophers have had a field day tearing these simplistic hierarchies apart, which is how I ended up here way back in 2019!
I arrived at Stegenga’s argument independently, from thinking about “evidence” as closely related to the economic problems of Welfare and Social Choice Theory, where Kenneth Arrow’s famous impossibility result seemed applicable. Stegenga’s paper preceded my own efforts by a few years, and his impossibility proof was very close to my own reasoning.
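For readers unfamiliar with the analogy, here is a rough sketch (my paraphrase, not Stegenga’s exact formalization): treat each body of evidence as a “voter” that ranks hypotheses, and an evidence hierarchy or amalgamation rule as a social choice function over those rankings.

```latex
% Rough sketch of the analogy (my paraphrase, not Stegenga's exact axioms).
Let $H$ be a set of hypotheses with $|H| \ge 3$, and let each evidence source
$E_1, \dots, E_n$ induce a ranking $\succsim_i$ on $H$. An amalgamation rule is a map
\[
  A : (\succsim_1, \dots, \succsim_n) \;\mapsto\; \succsim .
\]
Arrow-style conditions one would want $A$ to satisfy:
\begin{itemize}
  \item[(U)] Unrestricted domain: $A$ is defined for every profile of rankings.
  \item[(P)] Weak Pareto: if every $E_i$ ranks $h$ above $h'$, so does $A$.
  \item[(I)] Independence of irrelevant alternatives: how $A$ ranks $h$ vs.\ $h'$
             depends only on how each $E_i$ ranks $h$ vs.\ $h'$.
  \item[(D)] Non-dictatorship: no single $E_i$ fixes the output regardless of the others.
\end{itemize}
Arrow's theorem says no $A$ can satisfy all four at once; that is the flavor of
impossibility result Stegenga develops for amalgamating evidence.
```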
A more “statistical” critique is found here:
The fundamental problem with EBM hierarchies and checklists is that they take frequentist pre-data design criteria and apply them in a post-data context. This violation of the likelihood principle is a feature of the Neyman-Pearson perspective, not the Fisherian one that Sander alluded to. Casella describes the problem in this paper:
Goutis, C., & Casella, G. (1995). Frequentist Post-Data Inference. International Statistical Review / Revue Internationale de Statistique, 63(3), 325–344. https://doi.org/10.2307/1403483
The end result of an experiment is an inference, which is typically made after the data have been seen (a post-data inference). Classical frequency theory has evolved around pre-data inferences, those that can be made in the planning stages of an experiment, before data are collected. Such pre-data inferences are often not reasonable as post-data inferences, leaving a frequentist with no inference conditional on the observed data. We review the various methodologies that have been suggested for frequentist post-data inference, and show how recent results have given us a very reasonable methodology. We also discuss how the pre-data/post-data distinction fits in with, and subsumes, the Bayesian/frequentist distinction.
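To make the pre-data/post-data conflict concrete, here is a toy simulation of my own (not taken from the paper), using the classic example of two observations from a Uniform(theta - 1/2, theta + 1/2) distribution: the interval from the smaller to the larger observation is a valid 50% confidence interval pre-data, yet once the data are in, the observed spread tells you far more than that blanket 50%.

```python
# Toy illustration (mine, not from Goutis & Casella) of why a pre-data coverage
# statement can be a poor post-data inference.
import numpy as np

rng = np.random.default_rng(0)
theta = 3.7                                     # arbitrary true value
x = rng.uniform(theta - 0.5, theta + 0.5, size=(1_000_000, 2))
lo, hi = x.min(axis=1), x.max(axis=1)           # interval [min, max]
covers = (lo <= theta) & (theta <= hi)
spread = hi - lo

print(f"Pre-data (unconditional) coverage: {covers.mean():.3f}")                 # ~0.50
print(f"Coverage when spread > 0.9:        {covers[spread > 0.9].mean():.3f}")   # ~1.00
print(f"Coverage when spread < 0.1:        {covers[spread < 0.1].mean():.3f}")   # ~0.05
# The 50% pre-data guarantee is correct on average, but conditional on the
# observed spread the realized interval is either nearly certain to cover or
# very unlikely to; this is the conflict the review above addresses.
```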
These critical defects have so deformed the “peer reviewed” literature (as well as the clinical practice guidelines based upon it) that large areas of scholarship are practicing cargo-cult science (Feynman’s term) or “pathological” science (in the sense used by psychologist and psychometrician Joel Michell).
A snarky Bayesian might describe it as “researchers pretending to know the difference between their priors and their posteriors.”
My quasi-formal analysis of this in the context of parametric vs. ordinal models was posted here:
Related Threads