# Significance tests, p-values, and falsificationism

This thread takes its inspiration from the recent discussions in social science and statistics about significance tests, what they’re good for, whether p-values should be banned, and what all of that has to do with general scientific methodology, particularly the Popperian one called falsificationism. See this Twitter thread for a random place to jump into the discussion. That place, however, was where @f2harrell suggested we create a thread over here.

I’ll start this thread off, if I may, with three clearly delineated but controversial theses on the topic that should provide more than enough material to get a fruitful discussion going. (That should mean that we’re all taking it as an opportunity to learn something. I think.)

1. Significance tests are useful to science only if you realise what aim of science they are meant to further and capable of furthering. That aim does not include showing that your hypothesis is true; “claiming a discovery”; or “the probability that there is a real effect”.

2. A single p-value doesn’t mean anything. It doesn’t even mean your study is or isn’t worthy of publication.

3. Falsificationism is not just compatible with significance tests (and vice versa), they are based on the same rationales. One of those is the asymmetry between verifiability and falsifiability of certain statements; another is the central aim of science: to learn from experience given that induction doesn’t work and that all our efforts are fallible.

I’ll try and expand on these theses in separate comments—and, of course, in answers to questions.

References that might be useful:
[1] Commentary on a post that claims to show “Why Falsificationism is False” explaining what it misunderstands about falsificationism and how.
[2] A few quotes by RA Fisher to illustrate what he actually thought about the tests that he was the principal proponent of. (Link to follow.)


To start off my replies, this claim:
“A single p-value doesn’t mean anything”
is false if taken literally, and embodies a confusion I find prevalent among P-value critics who fail to distinguish math objects from their various interpretations. A single P-value has several mathematical meanings, some of which can be interesting and even useful to statistical modelers. The text below is a brief loose description which may well contain some errors of haste; the general ideas are reviewed at greater length elsewhere, for example
Greenland S, 2019. Some misleading criticisms of P-values and their resolution with S-values. The American Statistician 73, supplement 1, 106-114, open access at
www.tandfonline.com/doi/pdf/10.1080/00031305.2018.1529625
Greenland S, Rafi Z, 2020. Technical issues in the interpretation of S-values and their relation to other information measures. arXiv:2008.12991

Most basically, shorn of the baggage of “significance” judgments and decisions and focused on the regular regression models that form the overwhelming majority of everyday analyses: A single P-value p is the quantile location of a directional measure of divergence t = t(y;M) of the data point y (usually, the vector in n-space formed by n individual observations) from a test model manifold M in the n-dimensional expectation space defined by the logical structure of the data generator (“experiment” or causal structure) that produced the data y. M is the subset of the Y-space into which the conjunction of the model constraints (assumptions) forces the data expectation, or predicts where y would be were there no ‘random’ variability. I also use M to denote the set of all the model constraints, as well as their conjunction.

With this logical set-up, the observed P-value is the quantile p for the observed value t of T = t(Y;M). This p is read off a reference distribution F = F(t;M) for T derived from M. This formulation is essentially that of the “value of P” appearing in Pearson’s seminal 1900 paper on goodness-of-fit tests. Notably, his famed chi-squared statistic is the squared Euclidean distance from y to M, with coordinates expressed in standard-deviation units derived from M.
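To make the set-up above concrete, here is a minimal Python sketch of Pearson's goodness-of-fit case. The numbers are illustrative assumptions and do not come from the posts: hypothetical counts y in three categories, and a test model M that fixes the cell probabilities at 1/4, 1/2, 1/4. The divergence t is the squared Euclidean distance from y to its expectation under M, with each coordinate scaled by its model-based SD (the square root of the expected count), and p is the upper-tail location of t in the chi-squared reference distribution. With 2 degrees of freedom that tail has the closed form exp(-t/2), so no statistics library is needed.

```python
import math

def pearson_divergence(y, probs):
    """Squared distance from counts y to their expectation under M, in SD units."""
    n = sum(y)
    expected = [n * q for q in probs]
    return sum((obs - e) ** 2 / e for obs, e in zip(y, expected))

def chi2_upper_tail_even_df(t, df):
    """P(T >= t) for a chi-squared reference distribution with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = t / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)

# Hypothetical data and model, chosen only for illustration.
y = [18, 55, 27]
model_probs = [0.25, 0.50, 0.25]

t_obs = pearson_divergence(y, model_probs)              # observed divergence t
p_obs = chi2_upper_tail_even_df(t_obs, df=len(y) - 1)   # its tail location p

print(f"t = {t_obs:.2f}, p = {p_obs:.3f}")
```

Nothing in the calculation says whether p is to be read as a sampling frequency or as a betting quotient; it only locates t in the distribution deduced from M.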

More broadly, the statistic T can be taken as a measure of divergence of a more general embedding or background model manifold A (which includes all ‘auxiliary’ assumptions) from a more restrictive model M, with the goodness-of-fit case taking A as a saturated model covering the entire observation space, and the more common “hypothesis testing” case taking M as the conjunction of an unsaturated A with a targeted ‘test’ constraint (or set of constraints) H. This H is logically independent of A and consistent with A, with M = H&A in logical terms, or M=H+A in set-theoretic terms with + being union (in particular, we assume no element in H is entailed or contradicted by A and no element in A is entailed or contradicted by H).

Different divergence measures lead to different T and P-values, e.g., contrast likelihood-ratio, score, Wald, and least-squares variants. In all the cases considered here however their definition can be framed as divergence from a data projection A(y) onto A to the same type of data projection M(y) onto M.
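As a rough illustration of how the choice of divergence measure changes T and p, the sketch below uses a deliberately tiny example that is my own assumption, not one from the posts: A is a binomial model for y successes in n trials, H is the constraint theta = theta0, and M = H&A. The projection of the data onto A is the observed proportion, and the projection onto M is theta0. The Wald, score, and likelihood-ratio statistics are each referred to a chi-squared reference distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(t/2)).

```python
import math

def chi2_1df_upper_tail(t):
    """P(T >= t) for a chi-squared reference distribution with 1 df."""
    return math.erfc(math.sqrt(t / 2.0))

def binomial_divergences(y, n, theta0):
    """Wald, score and likelihood-ratio divergences for H: theta = theta0.

    Requires 0 < y < n so that the logarithms are defined.
    """
    theta_hat = y / n                      # projection of the data onto A
    diff2 = (theta_hat - theta0) ** 2
    wald = diff2 / (theta_hat * (1 - theta_hat) / n)   # variance from the fit under A
    score = diff2 / (theta0 * (1 - theta0) / n)        # variance from M
    lr = 2 * (y * math.log(theta_hat / theta0)
              + (n - y) * math.log((1 - theta_hat) / (1 - theta0)))
    return {"wald": wald, "score": score, "likelihood_ratio": lr}

# Hypothetical data: 32 successes in 50 trials, tested against theta0 = 0.5.
stats = binomial_divergences(y=32, n=50, theta0=0.5)
p_values = {name: chi2_1df_upper_tail(t) for name, t in stats.items()}

for name in stats:
    print(f"{name:>16}: t = {stats[name]:.3f}, p = {p_values[name]:.4f}")
```

The three P-values are close but not identical, even though the data, A, and H are the same in each case.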

A crucial point: The reference distribution F is deduced from M but that does not dictate its interpretation or that of p. F is almost always taken as a repeated-sampling frequency distribution for T, with M and hence F supposedly derived from what the mechanics of the actual physical data generator would be if M were correct. But it could instead be taken as a function providing coherent previsions (bets) for T given M. Recognizing these and other betting interpretations avoids the endless, mostly pointless battle of “frequentists” vs “Bayesians” carried on by those who fail to see the precise mapping between the interpretations created by the underlying math of the quantities. Put another way, the stripped-down formalism of P-values has no knowledge of user interpretation or semantics for the random variables T and P or their observed counterparts t and p.

I recommend P-values be taught and used in the above descriptive mathematical way, albeit most people will need a much simpler, less technical version as covered for example in Rafi Z, Greenland S, 2020. Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20, 244.
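For readers who have not met the S-value discussed in the papers cited above, it is the binary surprisal s = -log2(p), measured in bits. The short Python sketch below computes it; the function name and the example P-values are illustrative choices of mine. One way to read the result: s bits is about as surprising as seeing s heads in a row from a coin you had assumed to be fair.

```python
import math

def s_value(p):
    """Binary surprisal (S-value) in bits for an observed P-value p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)

for p in (0.5, 0.25, 0.05, 0.005):
    print(f"p = {p:<6} ->  s = {s_value(p):.2f} bits")
```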


With that lengthy technical background out of the way perhaps some will see why I can’t respond properly on Twitter.

Now I can criticize these claims:
Significance tests are useful to science only if you realise what aim of science they are meant to further and capable of furthering.
-It is unclear what is meant by “significance test”. There is a jumble of related but distinct items and practices that are given that label. Some use the label to refer to any use of P-values, mostly unaware of the innocuous geometric data vs. model contrast I described above. Others add on the comparison of p to a fixed cutoff (alpha level), which degrades p into a dichotomy or submerges it in a decision procedure (even if the decision is only to use charged descriptions like “significant” or “not significant” for results that may deserve no such label).

“That aim does not include showing that your hypothesis is true; ‘claiming a discovery’; or ‘the probability that there is a real effect’.”
-Sorry, but that sounds like pure phil-sci dogmatism about science and its aims. As a moral code for research I happen to agree that aiming to show H is true is a source of bad science, but stating that code in such a dogmatic fashion makes it sound like it is a logical consequence of values every scientist and stakeholder shares. There is no such shared value in practice, as witnessed by the enormous number of researchers who clearly want to show H is true and so claim discovery (that includes many of high social position as recognized scientists). Just read the physics literature about the discovery of the Higgs Boson based on rejecting the null because p was less than an alpha equal to the 5 SD tail of the standard normal distribution in two experiments at the LHC. Their aim was discovery and p<alpha was among their criteria. Their statistically tested H was the null hypothesis of “No Higgs” (along with the rest of the Standard Model as the accepted background A) which apparently no one believed; their real preferred hypothesis was the Standard Model with Higgs. See also
The Jeffreys–Lindley paradox and discovery criteria in high energy physics | SpringerLink
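To give a sense of scale for the 5 SD criterion mentioned above, the following Python sketch computes the upper-tail area of the standard normal distribution at 5 SD. It assumes the one-sided tail, which I understand to be the usual particle-physics convention; that assumption and the function name are mine, not part of the post.

```python
import math

def upper_tail_standard_normal(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

alpha_5sd = upper_tail_standard_normal(5.0)
bits_5sd = -math.log2(alpha_5sd)           # same cutoff as a binary surprisal

# For comparison: the familiar one-sided 0.05 cutoff sits near 1.645 SD.
alpha_conventional = upper_tail_standard_normal(1.645)

print(f"5 SD tail area:     {alpha_5sd:.3g}  ({bits_5sd:.1f} bits)")
print(f"1.645 SD tail area: {alpha_conventional:.3g}")
```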

Many more researchers want to form a probability of an effect in a particular direction, or beyond some minimum size, and so on. Read the volumes of Bayesian statistical analyses aimed at building such probabilities. Some of that literature has even used P-values as bounds on posterior probabilities or has transformed them into posterior odds or Bayes-factor bounds, as reviewed for example in
Living with p values: resurrecting a Bayesian perspective on frequentist statistics - PubMed (nih.gov)
Living with statistics in observational research - PubMed (nih.gov)
Calibration of p Values for Testing Precise Null Hypotheses: The American Statistician: Vol 55, No 1 (tandfonline.com)
What is tragically overlooked (yet has been known since at least the late 1940s) is that Bayesian constructs for hypothesis probabilities can have superb performance as frequentist decision rules. The common-sense requirement to use them in this role is that the information entered in the prior distribution is free of harmful inaccuracies - a requirement no different in principle than that for data models.
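As one concrete instance of turning a P-value into a Bayes-factor bound, the calibration paper listed above (Sellke, Bayarri and Berger) gives -e p ln(p) as a lower bound on the Bayes factor in favour of a precise null, valid for p < 1/e under the conditions stated there. The Python sketch below implements that bound and the corresponding lower bound on the posterior probability of the null; the function names and the default prior probability of 1/2 are illustrative assumptions.

```python
import math

def bayes_factor_bound(p):
    """Lower bound -e * p * ln(p) on the Bayes factor for the null; needs p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)

def posterior_null_lower_bound(p, prior_prob_null=0.5):
    """Lower bound on the posterior probability of the null implied by the bound."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = prior_odds * bayes_factor_bound(p)
    return posterior_odds / (1.0 + posterior_odds)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} BF bound = {bayes_factor_bound(p):.3f}  "
          f"posterior bound = {posterior_null_lower_bound(p):.3f}")
```

With a prior probability of 1/2, p = 0.05 corresponds to a posterior probability for the null of at least about 0.29, which is one reason p near 0.05 is a weak basis for claiming a discovery.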

A single p-value doesn’t mean anything.
-As discussed above, that’s false. But I hope we can all agree that
It doesn’t even mean your study is or isn’t worthy of publication.
-I’d go much further and argue that criteria for judging worthiness of publication should exclude study outcomes and be based only on study background, methods, conduct, and baseline data.


Finally, re 3:
“Falsificationism is not just compatible with significance tests (and vice versa), they are based on the same rationales. One of those is the asymmetry between verifiability and falsifiability of certain statements; another is the central aim of science: to learn from experience given that induction doesn’t work and that all our efforts are fallible.”
-My only quarrel here is what I see as careless use of the word “induction” and ex cathedra pronouncements like “induction doesn’t work”, which are among the ways naive Popperians undermine their own pitch for Pope Popper. The word “induction” has many uses and meanings, ranging from purely deductive mathematical induction to probabilistic induction to Neyman’s “inductive behavior” (and induction of current as well). This is a semantic item I complained about (using illustrative examples) in
Induction versus Popper: substance versus semantics - PubMed (nih.gov)
Then too probabilistic induction can be defined in a purely formal manner that makes it a logically sound deductive rule,
Probability logic and probabilistic induction - PubMed (nih.gov)
one that Bayesians galore will argue is quite useful…as will any frequentist who recognizes the frequency-performance (calibration) value of Bayesian methods.

From that I would suggest that arguments for a falsificationist perspective would do well to get past the vagaries of the word “induction” as if it were a formal term with only one meaning to all readers (just like statistics would do well to get past “inference”, “significance” etc.), lest it continue to be seen as promoting either absurd statements or else attacking straw men in a glaring failure to study criticisms carefully. Otherwise, in a supreme irony, it may be seen as failing to recognize its own fallibilities and those of its idols.


It will be hard to improve on posts by @Sander, whose many papers have improved my understanding of foundational statistical concepts. I submit the following Data Methods threads as worthy of reflection and follow up of the links and citations.

Setting aside the issue of p-hacking, a major source of confusion involves conflating Neyman’s (pre-experimental) \alpha (error rate of the procedure) with Fisher’s evidential use of discrepancy measures to summarize information from (randomized) experiments. Readers of research reports grant too much epistemic weight to the experimenter’s (local) \alpha level.

Combining the many things I’ve learned from a number of papers, I used frequentist tools and Bayesian reasoning when reviewing a meta-analysis on predicting ACL injuries.


Well, let’s see how we can make sense of this. Firstly, I’d like to reiterate that I’m interested in a discussion that is about possibly learning something (hence the explicit invitation to ask questions), and that can’t really be achieved if we a) simply find one interpretation of what somebody else said that makes no sense and proceed to lambaste it and b) lengthily show off how clever we are without ever trying to find out what the other person actually means. Because then one ends up with heaps of text that have nothing to do with what that person is saying, which is an unfortunate state of affairs.

So, for those who are curious, what do I mean when I say: “A single p-value doesn’t mean anything”? This refers to Fisher’s repeated reminder that “a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result” (my emphasis). The fact that finding data at least as extreme as those I did find would be unlikely doesn’t by itself tell us anything about any hypothesis concerning a law-like behaviour of the world. Only an extended record of data can ever tell us anything of that sort, and then only in a modus tollens kind of way. And remember that neither Popper nor Fisher assumed that we would arrive at certain knowledge this way.

Which brings me to my first thesis. Why would it matter what you assume the aim(s) of science to be? Take, for example, the idea (often held only implicitly) that we are after certain knowledge and that anything deserving of the term ‘knowledge’ would have to imply certainty. This would have to mean that there either are sources of knowledge or methods of gaining it that are infallible. Unfortunately, nobody has ever been able to put forward even a remotely plausible argument for that. So what happens if, for the moment, we assume anything and everything we do to be fallible and certain knowledge to be impossible? That’s the question that Popper wrote a whole book about: Objective Knowledge. Then, what we can be after is “only” better knowledge (than we had previously); in fact ever better knowledge, because that process would never have to terminate. And the logic we can use for that kind of goal is not one that promises to allow valid conclusions from the truth of singular statements to the truth of universal statements (or their probability, which leaves the argument just as invalid), which is the actually well-specified thing that Popper referred to as “induction”. On the contrary, we can get by with just using deduction, because that, in contradistinction to induction, actually allows us to make logically valid choices, e.g., which theory of two (or more) is the better/best one (given certain other assumptions, of course).

This last point is where philosophy of science and the idea of significance tests converge: that what we do is, in Fisher’s words (echoing Popper), meant “to aid the process of learning by observational experience”.

And finally: please let’s stop calling something “dogmatic” for no other reason than that it was presented as a statement judged to be true. Otherwise, statements like “As discussed above, that’s false” would also have to be called dogmatic. Which would be silly and no way to conduct a good-faith discussion. (Not to mention somewhat juvenile sneers about “Pope Popper”, which we should all be above, even though those are at least interestingly ignorant.)


You called?

lengthily show off how clever we are without ever trying to find out what the other person actually means
-That comment is insulting and applies in reverse: It displays your profound ignorance of both logic and the viewpoint of some experienced applied statisticians who understand the need for multiple and precise perspectives on real, complex problems (which is to say all problems in health, social and medical sciences). Logical and mathematical skeletons play a central role in describing systems and forming judgments and decisions about them, even though those skeletons are far from sufficient for that purpose. That you fail to see the geometric pictures and symmetries that underpin the fundamental meaning of mathematical statistics like P-values, and instead denigrate an argument you fail to comprehend instead of asking for clarification, only underscores your lack of qualification to be lecturing or even debating on this forum.

‘As discussed above, that’s false’ would also have to be called dogmatic. Which would be silly and no way to conduct a good-faith discussion.
You dodged the point: I backed up my statement with a detailed explanation of why I wrote it. You offer instead a series of ipse dixit pronouncements as if I should take you seriously when you clearly can’t understand and grapple with a detailed logical explanation of what I meant - something you did not supply for your own statements.

(Not to mention somewhat juvenile sneers about “Pope Popper”, which we should all be above, even though those are at least interestingly ignorant.)
That you would call out that indirect comment when you commit the above juvenile evasions and direct insults is an act of remarkable hypocrisy, even for yet another self-proclaimed authority on philosophy of science with no more than a shallow grasp of statistical theory and applications.

I will say your reaction is consistent with much of what I see from those who promote philosophy over practicality, and go so far as to confuse the two. Most of all, it confirms what I suspected from your Twitter posts: You are not only unfamiliar with most of the literature relevant to foundational debates in statistics; you are also unable to acknowledge your massive knowledge gaps, or see how in those gaps there is a coherent counterpoint to the naive Popperianism you promote. Your evasion is understandable, as to realize what you are missing would reveal how far short you fall in your comprehension of real-world statistics and the debates that have raged in it for centuries.

As for Popper: He was far from naive, and his own writings are worth reading because he would follow his outrageous openers with detailed explanations of why he made them, displaying how they were provocations for pursuit rather than slogans for lesser minds to adopt as catechisms. The same for Feyerabend, DeFinetti, and others with a theatrical bent. Popper also attempted to grapple with logical technicalities head on, rather than evade them with insults and excuses. My comment about Pope Popper was instead directed at those like you, who simply repeat Popper’s words without his substance or depth.

You need to read much more widely and deeply with a mind unshackled to Popper or anyone, and find your own voice if it is to amount to anything other than evangelical diatribe masquerading as philosophy, or as mere pretension posing as depth.

Why not start by defining what “falsificationism” even is? I might be wrong here, but I don’t think that’s a term Popper has ever used (definitely not in The Logic of Scientific Discovery). People just use “falsificationism” to refer to different subsets of Popper’s ideas. Is it a normative claim about how science should be done? A descriptive claim about how science progresses? An epistemological thesis about what can be known? A mix of everything? Gelman thinks he’s a “falsificationist” and I’ve seen people describing de Finetti as a “falsificationist”, so I’d say the term is so vague that it’s not very useful.

Relatedly, it’s almost taken for granted in these debates that frequentists are “falsificationists”, but I’m not so sure. People forget that Fisher distinguished two main problems in statistics: estimation (when there are well-defined alternative hypotheses and calculable likelihoods) and testing (when there are no clear alternative hypotheses and no clear likelihood). Testing problems do sound “falsificationist” in nature, but Fisher was more than happy to make positive claims (“claiming a discovery”) in estimation problems. (Also, as the Higgs Boson example by Sander shows, even testing can be used to make discoveries). The goal of Fisherian estimation is not to falsify a whole model or anything else, but to find out what’s evidentially supported by the data. There doesn’t seem to be anything falsificationist about it, whatever that is. So it seems that Fisher thought that both falsification and confirmation were important, albeit in different contexts. If he is a “falsificationist” just because he thought falsification is important in some situations, then the term is not very descriptive. It would be just as misleading to call him a “verificationist”.


A plea to everyone: keep it civil. Criticize ideas, not people. Watch the adjectives when applied to persons. Lastly, I hope someone can summarize key points from all sides in simpler language than what I read above.


A good part of the 1982 Introduction to Realism and the Aim of Science is devoted to untangling “[a]n entire literature [that] rests on the failure to observe this distinction” between “the logical possibility of falsification in principle … [and] a conclusive practical experimental proof of falsity.” [xxii, emphasis in original] Part IV, starting on p. xxxi, begins like this [bold emphasis added]:

This may be the place to mention, and to refute, the legend that Thomas S. Kuhn, in his capacity as a historian of science, is the one who has shown that my views on science (sometimes, but not by me, called ‘falsificationism’) can be refuted by the facts; that is to say, by the history of science.

I do not think that Kuhn has even attempted to show this. In any case, he has done no such thing. Moreover, on the question of the significance of falsification in the history of science, Kuhn’s and my views coincide almost completely.

So I think we have there at least a clear statement on the status of ‘falsificationism’.


I think the questions being discussed are now more fruitfully addressed using the techniques of computational theory, as Paul Thagard discussed in A Computational Philosophy of Science. It is also discussed in this link on computational epistemology.

If we translate “falsification” to “computable” the “scientific method” becomes a decision procedure (aka. algorithm) that can be examined rigorously with mathematical methods. I like to think of “computational theory” as simply applied logic.

This is closely related to Sander’s attempt to bring the power of modern mathematical logic to the questions of statistical methods in his posts above.

In his introductory text Mathematical Logic, Stephen Cole Kleene addresses the paradox of using logic to study logic. The technique used is to compartmentalize the formal system being studied (the object language) from the logic used to study it (observer language or meta-language).

In statistics, there are at least 3 formal languages that can be used to discuss “scientific methods” (or “learning algorithms” if you prefer the computational philosophy):

1. Bayesian Theory
2. Frequentist Theory
3. Information Theory.

Some of the most important results in statistics are mapping the formal languages of Bayesian Theory to those of Frequentist theory (and vice versa).

Still awaiting exploration is the use of information theory as the meta-language to explore questions regarding the mappings between Bayesian and Frequentist theory.


I just have to mention that I recently encountered a lovely and very readable articulation of one view of the (proper) aims of science [1]. This in turn led me to read It made me wonder if Popper might be of little use to someone who has adopted (what Popper calls) an instrumentalist view of science, or of the role of their chosen discipline (statistics, say) within science. If we eschew an interest in theories (particularly bold ones) and in explanation, if there is no deeper reality than what is immediately presented in the data tables, then why should Popper’s ideas matter?

1. Popper KR. Ch 5: The Aim of Science. In: Objective Knowledge: An Evolutionary Approach. Rev. ed. Clarendon Press ; Oxford University Press; 1979:191-205. http://www.bretthall.org/uploads/3/1/2/9/31298571/karl_r.popper-_the_aim_of_science.pdf

[Response to Norris:] Excellent point to raise; if however I’m forced to label myself it would be as a perspectivalist or epistemic and methodologic pluralist. In doing statistics that plurality would indeed include foremost instrumentalism and its pragmatic relatives.

Here’s one reason why: Most of the so-called hypothesis and “significance” testing activity of a competent applied statistician in soft sciences concerns “theories” barely worth that label. Many hypotheses of enormous practical importance when laid out in a modern causal framework (like “do mRNA vaccines cause anaphylactic reactions?”) look embarrassingly trivial when framed in the “bold conjecture” discussion that it seems Popper is most often cited for. That’s because they do not challenge any legitimate existing (let alone “established”) theory; in fact they are often most plausible based on previous observations or theory.

The bold-conjecture framework suits truly startling breaks from received frameworks (e.g., relativity, continental drift, and jumping genes), where the primary challenge is in upending older and previously highly successful explanations that have ossified into dogmas or universal facts. In contrast, the challenge in testing philosophically trivial theories like medical side effects is a combination (often toxic) of highly technical challenges in study design and data analysis with human factors like statistical incompetence (including but not limited to my hobbyhorses of dichotomania, nullism and reification), conflicts of interest (fueled by liability concerns and amplified by our tort system), political biases, etc. Here it is the sociology of science as a scientific discipline that needs our utmost attention.

Philosophy of science can help if applied at appropriate levels of analysis, but can hinder when overemphasized at the narrow technical level needed to extract information from data (which I see as the focal purpose of statistical methods). I see that hindrance as exemplified in the logically absurd “philosophical” conflicts between avowed frequentists and Bayesians that plagued mid-20th century statistics and continued to plague “philosophy of statistics” to that century’s end. It took until the 1970s for many application-savvy statisticians (like Box, Cox, and Good) to realize that conflict was a distraction and endorse the value of having flexible perspectives and diverse tools (frequentist, Bayesian, likelihoodist, and more) which could even be used in tandem in the same problem.

If all that sounds a bit Kuhnian, that’s because it is (or at least neoKuhnian): As far as I have read, modern perspectivalism grew out of Kuhn’s SSR, which I think crystallized the view (anticipated by Hume* and later dramatized by Feyerabend) that epistemology and especially philosophy of science is itself subject to the human factors including dogmatism, evangelism, intolerance, pretension to universal truth etc. that characterize the worst of science and religion; this happens especially when it drifts into fields beyond the expertise of its “experts”, such as statistical theory, applied statistics, metaphysics, and the largely hidden influences of morals and values on science and its methodologies (note the plural).

*For today’s example see Syll’s blog:
On sophistry and illusion | LARS P. SYLL (wordpress.com)


The bold-conjecture framework suits truly startling breaks from received frameworks (e.g., relativity, continental drift, and jumping genes), where the primary challenge is in upending older and previously highly successful explanations that have ossified into dogmas or universal facts. In contrast, the challenge in testing philosophically trivial theories like medical side effects is a combination (often toxic) of highly technical challenges in study design and data analysis with human factors like statistical incompetence (including but not limited to my hobbyhorses of dichotomania, nullism and reification), conflicts of interest (fueled by liability concerns and amplified by our tort system), political biases, etc. Here it is the sociology of science as a scientific discipline that needs our utmost attention.

Yes, Popper was so fixated on revolutionary science that he forgot that normal science even exists.

This is a good summary of what’s wrong with this view (by David Papineau):

In retrospect, Popper’s falsificationism can be seen as an over-reaction to the demise of classical physics at the turn of this century. The replacement of Newton’s physics by Einstein’s was a great surprise, and showed that the evidence underpinning the classical edifice was far less firm than everybody had supposed. Popper’s mistake, however, was to condemn all inductive reasoning for this failure. Maybe inductive evidence will never suffice to lay bare the large-scale structure of space-time, or the other fundamental secrets of the cosmos. But this does not mean that it can never identify such more mundane facts as that cigarettes cause cancer.
(…)
Despite these manifest failings, Popper’s falsificationism is popular among practising scientists. The reason is probably that Popper’s story best fits science at the cutting edge of research. Most new ideas at the limits of knowledge do start life as pure speculations, and it is true that they are distinguished from the musings of madmen only by the precision which allows them to yield definite predictions. By focusing exclusively on this aspect of science, Popper creates the impression that all scientists, however workaday, are creative visionaries with minds of steel.

But speculative research is not the only kind of science, or even the most important kind. There would be no point to science unless its conjectures sometimes acquired enough inductive evidence to graduate to the status of established truths. This is the real reason for testing hypotheses against predictions. The aim is not to falsify them, but to identify those that can be turned into the kind of positive knowledge that enables us to build bridges and treat diseases.


1. An informational view of statistics goes back at least to Fisher’s landmark work on estimation. Many links between information theory and standard frequentist and Bayesian statistical theories were worked out in the mid-20th century soon after Shannon’s landmark 1948 paper, as covered in this 1959 book by one of the pioneers: Kullback S, Information Theory and Statistics (Dover Books on Mathematics), ISBN 9780486696843
-note that the Fisher information matrix is in the second-order expansion of the Kullback-Leibler information criterion (a divergence measure in information geometry which corresponds to using likelihood-ratio statistics in regular models).
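A quick numerical check of that second-order relation, as a sketch under my own choice of example: for a Bernoulli model with success probability theta, the Fisher information is 1/(theta(1-theta)), and the Kullback-Leibler divergence from theta to theta + delta should approach (1/2) delta^2 I(theta) as delta shrinks. The Python below compares the two for a few hypothetical values of delta.

```python
import math

def kl_bernoulli(theta, theta_alt):
    """Kullback-Leibler divergence from Bernoulli(theta) to Bernoulli(theta_alt)."""
    return (theta * math.log(theta / theta_alt)
            + (1 - theta) * math.log((1 - theta) / (1 - theta_alt)))

def fisher_information_bernoulli(theta):
    """Fisher information for one Bernoulli observation."""
    return 1.0 / (theta * (1 - theta))

theta = 0.3   # illustrative value
for delta in (0.1, 0.01, 0.001):
    exact = kl_bernoulli(theta, theta + delta)
    quadratic = 0.5 * fisher_information_bernoulli(theta) * delta ** 2
    print(f"delta = {delta:<6} KL = {exact:.3e}  "
          f"quadratic term = {quadratic:.3e}  ratio = {exact / quadratic:.4f}")
```

The ratio moves toward 1 as delta gets smaller, which is the scalar version of the matrix statement in the note above.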

Of course in the >60 years since there have been far more developments linking statistics and information concepts, including books such as Good IJ, Good Thinking: The Foundations of Probability and Its Applications (ISBN 9780816611416) and Jaynes ET, Probability Theory: The Logic of Science (ISBN 9780521592710), which takes a very Bayesian view compared to Kullback.
I confess I am in no position to recommend a recent one and would gratefully receive such a recommendation.

I find it a shame that information perspectives on statistics have been so neglected in the “soft sciences”, for I think theoretical and applied statistics have been at their best when devoted to extracting and summarizing information from data (whether for mere presentation or for input to decisions), rather than testing or placing bets (posterior probabilities) on statistical hypotheses (which are simplistic formalisms that seem inevitably confused with scientific hypotheses).

2. Thanks for bringing up computational epistemology! I have no expertise in the topic but have been following it for some years as a welcome attempt to move on from the 20th-century controversies and paradigms in which I was raised and educated. Kelly’s work cited in your link is mentioned in my 2017 article at https://academic.oup.com/aje/article/186/6/639/3886035 :
"nullism seems to reflect a basic human aversion to admitting ignorance and uncertainty: Rather than recognize and explain why available evidence is inconclusive, experts freely declare that ‘the scientific method’ treats the null as true until it is proven false, which is nothing more than a fallacy favoring those who benefit from belief in the null (29 = Greenland 2004, The need for critical appraisal of expert witnesses in epidemiology and statistics). Worse, this bias is often justified with wishful biological arguments (e.g., that we miraculously evolved toxicological defenses that can handle all modern chemical exposures) and basic epistemic mistakes - notably, thinking that parsimony is a property of nature when it is instead only an effective learning heuristic (30 = Kelly 2011, Simplicity, Truth, and Probability) or that refutationism involves believing hypotheses until they are falsified, when instead it involves never asserting a hypothesis is true (31 = Popper LSD 1959)."

4 Likes

Will it, really? One way comes readily to mind: actually engage with anything I’m talking about.

Absolutely. But let’s not both-sides this when there is exactly one person here that your plea applies to. That person thinks it’s okay to insinuate another’s “lack of qualification”, that they “clearly can’t understand”, that they have “no more than a shallow grasp of” an issue (which that person hadn’t even talked about), that they are “unable to acknowledge…massive knowledge gaps”, and that they are “unfamiliar with most of the literature”. And that’s while he has also not engaged with a single thing I’m actually talking about. So he’s 2 for 2 with respect to the forum rules.

So let me reiterate the invitation to discuss the actual ideas that I referenced in the OP. I’ll happily expand on what I said there and answer any questions regarding what they’re based on. And if any of the undoubtedly clever stuff (yes, I actually meant that) that’s already been said here should prove relevant to my points, then I’ll equally happily engage with that.

You cannot productively discuss statistical procedures without some comprehension of the mathematics behind them, which often requires algebra and maybe a bit of calculus.

Richard Feynman, in discussing the relationship of math to physics, quoted Euclid, who said “There is no royal road” (i.e., no easy way) to learn geometry or physics. There is no royal road to learning statistics without mathematics, either.

1 Like

That’s certainly a relevant question. There are two ways of answering it, one negative and one positive. The negative way (‘what falsificationism isn’t’) can be seen in my reference [1]. Misconceptions about the idea abound, and nobody should be talking about it without at least making very clear how their use of the term and the idea corresponds to what Popper’s notion actually amounts to. Because very often, the people using the term don’t, in fact, know that.

If I had to give a positive sort of definition of the term, I’d say what I said in the OP: that it’s a methodology for learning from experience given that induction doesn’t work and that all our efforts are fallible. This includes a clear-eyed view of the aforementioned asymmetry between verification and falsification: the insight that singular statements may be verified or falsified, but that universal statements can only be falsified (always assuming valid logic). It extends into the realm of what Popper called “methodological rules”, which necessarily have to complement the purely logical analysis. (Cf. §11 of LoSD)

Of course, that doesn’t restrict anyone’s ability to call themselves (or somebody else) a “falsificationist”. That usually (in practice) just means that somebody endorses one or another of the elements of falsificationism as I defined it above. In that sense, Fisher would be a falsificationist for underlining the mentioned logical asymmetry: “Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.”