Significance tests, p-values, and falsificationism

The basic argument is pretty much unchanged in the 40 years since Mulkay and Gilbert wrote “Putting Philosophy to Work” in the early 1980s (Philosophy of the Social Sciences 11(3): 389–407, 1981). Mulkay and Gilbert actually interviewed scientists and watched what they did in the context of Popper’s theories. A typical theme was that Popper’s work didn’t actually guide the science while it was being done, but might be discussed as a post hoc legitimization. Particular attention was paid to one scientist who claimed to be a Popperian and to use Popper in his daily scientific practice. But other scientists said that this scientist didn’t follow Popper at all, and when an experiment was published that falsified this scientist’s theories, he fought “tooth and nail”. As Mulkay and Gilbert put it, the scientist could “just as easily be seen as contravening Popper’s maxims as exemplifying them”.

This is the critique of Popper from the perspective of the sociology and the history of science: if you say “here is what science is and here is how you should do science”, it is somewhat of a problem if most scientists in contemporary practice and indeed, back through history, didn’t do much of that at all.

I’m with Frank here (except that History and Philosophy of Science was my major at Cambridge, so I am an interested observer who was trained in philosophy): falsification is an interesting idea in the abstract but isn’t much use in the actual doing of science (the same is true of reliabilism, by the way, seeing as David Papineau was mentioned on this thread). I run around calculating p-values for hypothesis tests all the time and it is hard to see how much of what I do fits in a falsificationist framework. Rejecting or not rejecting the null has superficial similarities to falsificationism, but they don’t go much deeper. For instance, I just looked up my last published paper: in brief, we were testing whether A predicted B. The p-value was 0.3. Our conclusion was that A probably did predict B, but that any effect size was too small to worry about, and that any patient who developed A should be reassured about B. I stand by that interpretation, but try linking it to Popper and you’ll just get dizzy.
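To make that kind of interpretation concrete with a toy sketch (all counts here are hypothetical and not from the paper in question): a two-proportion z-test can return a nonsignificant p-value while the effect estimate and its confidence interval, not any act of “falsification”, are what support the clinical reassurance.

```python
# Toy sketch (all counts hypothetical, not from the actual paper): a
# two-proportion z-test where p > 0.05, yet the interpretation rests on
# the effect size and confidence interval rather than on "falsification".
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

events_a, n_a = 30, 1000   # hypothetical outcome count among patients with A
events_b, n_b = 25, 1000   # hypothetical outcome count among patients without A

p_a, p_b = events_a / n_a, events_b / n_b
risk_diff = p_a - p_b

# z statistic with the pooled standard error under the null
p_pool = (events_a + events_b) / (n_a + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
p_value = 2 * (1 - norm_cdf(abs(risk_diff / se_pooled)))

# 95% CI for the risk difference (unpooled standard error)
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
ci = (risk_diff - 1.96 * se, risk_diff + 1.96 * se)

print(f"risk difference = {risk_diff:.3f}, p = {p_value:.2f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Not rejecting the null here “falsifies” nothing; the confidence interval bounds how large any real effect could plausibly be, which is what can justify reassuring a patient.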

In brief, I majored in philosophy at a world-renowned institution and I am now a practicing scientist. I’m pretty sure that the fact that I’m good at the latter is due to the former. But it is indirect. Philosophy made me a better thinker, and a different type of thinker, and that has made me a better scientist. I’m not a better scientist because I looked up the recipe for better science on page 283 of some philosophy textbook.


Just so that I understand properly what you’re saying: In that example of “Spencer”, it is precisely that he is sticking to a theory as long as possible (in what you might call a “dogmatic attitude”) that runs counter to his professed following of Popperian methodology. Is that correct so far?

I’m not saying anything about Spencer. I never met them. What I’m saying is that when we watch what scientists actually do (in my case, what I do and what my colleagues do), it doesn’t match up at all with what Popper claimed about the scientific method. This is even true of scientists who stated that they followed Popper and put his methods into daily practice.

No, of course. I just meant, since you referenced Mulkay & Gilbert, that you would at least agree that “Spencer”, although professing to follow Popper’s ideas, instead showed a pattern of (dogmatically, one might say) sticking to his theory as long as possible, and thereby undermining his professed adherence to Popperian methodology—“it doesn’t match up”, as you say.

Go read the original paper for the description of Spencer’s behavior: you don’t need my characterization of it over and above that it did not appear to follow his professed beliefs.

I just wanted to double-check that my understanding was correct that you agreed with the paper’s point that “tooth and nail” Spencer’s actions ran counter to his professed Popperian beliefs. Just wanted to be sure I didn’t get that wrong.

Peter: I’m sorry to say that I find your goading tone rather inappropriate. You are clearly trying to get me to agree with some extremely narrow point as a debating technique. As such, you are missing the larger point. Back in the day, philosophers of science would listen to scientists in order to understand more about science. You’ve had three of the world’s top statisticians tell you that falsificationism and significance testing have only superficial similarities. You don’t seem to want to listen. As a result, I actually don’t want to try explaining any more.


A tone troll? Didn’t they teach you in Cambridge that that is the best way to embarrass yourself as a scientist? If only you had made the effort to find out why I kept asking, Andrew. That would have shown some critical thinking.

Because here’s something I find inappropriate: People who don’t know what they’re talking about and instead keep boasting about their degrees from “world-renowned universities”. People who cite an article that is not just all-round laughable but that has, not as “some extremely narrow point” but as its centerpiece, someone who is reported to stick to his theories in an arguably dogmatic fashion, an attitude the authors portray as incompatible (or in your words: one that “doesn’t match up”) with Popperian thought—when that is blatantly false.

You see, unfortunately, neither the authors nor you knew enough about Popper (and didn’t think to ask an actual expert) to realise that the “dogmatic attitude of sticking to a theory as long as possible” is a direct Popper quote—an attitude that, he said, was “of considerable significance” (“What is Dialectic?”, in Conjectures and Refutations). Why? Because “without it we could never find out what is in a theory—we should give the theory up before we had a real opportunity of finding out its strength”. So Mulkay and Gilbert’s interpretation that Spencer is “contravening Popper’s maxims” is simply ignorant.

I could now condescendingly tell you that back in the day, scientists would listen to philosophers of science to understand more about it, and that you have had an actual Popper expert tell (and show) you that what you think you know about him just isn’t so. But then you don’t seem to want to listen…

This sums up my attitude regarding your understanding of mathematical logic and how it applies to both empirical science and mathematics.

If mathematics is the queen of the sciences, and we apply Popper’s ideas to mathematics itself, the TL;DR is that Popper has been falsified.

I posed a simple question above: why should one accept, as a matter of logic, Popper’s proposed asymmetry between \forall x and \exists x? Carnap asked this, and other scholars, as evidenced by the link I posted above, also found it odd that this was never answered.

Popper was a deductivist who claimed that only logical derivations can be trusted. That is another debatable philosophical position; Quine convincingly refuted it in “Two Dogmas of Empiricism”.

As a simple point of classical logic, the negation of one quantifier is synonymous with the other: \neg \forall x P(x) is the same statement as \exists x \neg P(x), and vice versa. So as a point of logic, Popper is wrong.
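That duality can be checked mechanically over any finite domain; a minimal sketch:

```python
# Minimal check of the quantifier duality over a finite domain:
# "not (for all x, P(x))" coincides with "there exists x with not P(x)".
def forall(domain, pred):
    return all(pred(x) for x in domain)

def exists(domain, pred):
    return any(pred(x) for x in domain)

domain = range(100)
predicates = [
    lambda x: x < 1000,    # holds for every element of the domain
    lambda x: x % 2 == 0,  # holds for some elements, fails for others
    lambda x: x < 0,       # fails for every element
]

duality_holds = all(
    (not forall(domain, p)) == exists(domain, lambda x, p=p: not p(x))
    for p in predicates
)
print(duality_holds)  # True
```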

In a particular context (i.e., when the free variables are bound to a model), perhaps Popper’s claim has some merit. The computer scientist Edsger Dijkstra correctly stated that testing a program and finding no errors does not demonstrate an absence of them.
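Dijkstra’s point can be illustrated with a toy case (the function and its test suite are invented for this sketch): a finite battery of tests that finds no errors in a routine that is nonetheless wrong.

```python
# Toy illustration (function and tests invented for this sketch): a finite
# test suite that finds no errors in a routine that is nonetheless wrong.
def is_prime_buggy(n):
    # BUG: only tries divisors up to 10, so composites whose smallest
    # factor exceeds 10 (e.g. 121 = 11 * 11) are misclassified as prime.
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, 11) if d < n)

def is_prime(n):
    # Correct reference implementation
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# Every test in this (finite) suite passes...
suite_passes = all(is_prime_buggy(n) == is_prime(n) for n in range(121))

# ...yet a counter-example exists just beyond it.
bug_exposed = is_prime_buggy(121) and not is_prime(121)
print(suite_passes, bug_exposed)  # True True
```

Passing every test in a finite suite verifies nothing universal; one unexamined input suffices to refute the program.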

A charitable interpretation of Popper leads me to see it as sufficiently close to Feynman’s notion of a scientist with integrity as to be interchangeable.

But there is one feature I notice that is generally missing in cargo cult science. That is the idea that we all hope you have learned in studying science in school–we never say explicitly what this is, but just hope that you catch on by all the examples of scientific investigation. It is interesting, therefore, to bring it out now and speak of it explicitly. It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty–a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid–not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked–to make sure the other fellow can tell they have been eliminated.

A strict interpretation, using the development of mathematical logic and analysis as an example, leads me to conclude Popper’s philosophy is false.

At the time “Testability and Meaning” was written, a number of fundamental results were discovered in mathematical logic, which I’m certain Carnap was aware of. Popper, by remaining silent, skirted the critical issue.

The incompleteness theorems of Gödel get all of the philosophical attention, but equally interesting are the consistency results of Ackermann and Gentzen from the late 1920s through the early 1940s. They independently used a similar method. Ackermann proved the consistency of Primitive Recursive Arithmetic (PRA), while Gentzen used PRA (with transfinite induction) to prove that Peano Arithmetic is consistent.

PRA (with transfinite induction) is a formal system that expresses the natural numbers but, unlike Peano Arithmetic, has no quantifiers. Yet the cardinality of the two languages is the same. If PRA is the meta-language and Peano Arithmetic is the object language, we can express meta-statements about Peano Arithmetic in PRA. This notion of “consistency” is a meta-notion of fundamental importance.
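As a loose, informal illustration of that quantifier-free character (this is Python mimicking the shape of primitive recursive definitions, not PRA itself): every definition is built from zero, successor, and recursion, and every statement the system makes is a concrete computation on particular numerals rather than a quantified claim.

```python
# Informal sketch only: Python stand-ins for primitive recursive
# definitions. There is no "for all n" inside the system; each identity
# is checked by running the recursion on particular numerals.
def succ(n):
    return n + 1

def add(m, n):
    # add(m, 0) = m ; add(m, succ(n)) = succ(add(m, n))
    result = m
    for _ in range(n):
        result = succ(result)
    return result

def mul(m, n):
    # mul(m, 0) = 0 ; mul(m, succ(n)) = add(mul(m, n), m)
    result = 0
    for _ in range(n):
        result = add(result, m)
    return result

print(add(2, 3), mul(4, 6))  # 5 24
```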

Gödel later produced a related consistency proof via his functional interpretation, known as the Dialectica interpretation.

In logic this is known as the “no counter-example interpretation” (due to Kreisel); W. W. Tait extended this notion and placed it in a game-theoretic context.

If we take a (Popperian) skeptic of mathematics, this proof compels him to admit that 1. he has the burden of proof to produce a counter-example, and 2. a counter-example does not exist (up to \epsilon_0, the least ordinal \alpha with \omega^\alpha = \alpha, which is still countable, i.e., of the same cardinality as the set of natural numbers).

Taking this game interpretation further, if we extend the propositional calculus from a binary truth value to a continuous one, we get a system that is consistent with probability theory (see Algebra of Probable Inference by R. T. Cox), with the semantic notion of betting on future observations – first described as an implication of Shannon’s communication theory by J. L. Kelly in A New Interpretation of Information Rate (pdf) and extended by the more recent work of Glenn Shafer and Vladimir Vovk on building up probability with game theory as the foundation.
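As a concrete instance of that betting semantics (all numbers illustrative), Kelly’s rule picks the stake fraction that maximizes expected log growth of wealth; for an even-money bet with assumed win probability p, the optimum is f* = 2p - 1.

```python
# Illustrative sketch of the betting semantics: the Kelly criterion picks
# the stake fraction f that maximizes expected log growth of wealth.
# For an even-money bet with (assumed) win probability p, f* = 2p - 1.
import math

def log_growth(f, p):
    # Expected log wealth growth per even-money bet at stake fraction f
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

p = 0.6                 # hypothetical win probability
kelly = 2 * p - 1       # closed-form optimum: 0.2

# Numerical check: the closed form matches a brute-force grid search
grid = [i / 1000 for i in range(999)]
best = max(grid, key=lambda f: log_growth(f, p))
print(f"Kelly fraction = {kelly:.2f}, grid optimum = {best:.3f}")
```

Overbetting is penalized: at p = 0.6, staking f = 0.5 already has negative expected log growth, which is the sense in which the game semantics disciplines probability claims with consequences for the bettor.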

The advantage of this approach is the ability to express Keynesian notions of imprecise probabilities, along with computational considerations, that get swept under the rug in applications that Sander justifiably finds important to mention.


So as a point of fact, you’re still too lazy to actually read Popper and possibly learn something. Got it.

There are only 2 interpretations of Popper’s claim that make sense – the variables are free, or they are bound (ie have a set of values, aka a model).

I’ve dispensed with the case where variables are free. In the bound case, there exist examples where it makes sense, but if you took a few moments to think, you would see that Popper’s idea implies probabilities on \forall x and \exists x, which undermines his denial of probability.

If you still wish to attempt a rehabilitation of Popper here, you will need to address my example of consistency proofs regarding arithmetic, and the development of mathematics more generally. Engaging in ad hominem and bluff won’t do.

So you are going to resort to gaslighting now? This is what you posted:

Falsificationism is not just compatible with significance tests (and vice versa), they are based on the same rationales. One of those is the asymmetry between verifiability and falsifiability of certain statements.

As to the question of the discovery of the Higgs boson:

They do indeed contain such inaccuracies, and quite a few. How do I know, since I am also not a professional particle physicist? Because I could actually be bothered to ask one. (Two, in fact.)

You wanted to dispute my claim that the results had been “predicted to exquisite precision”, which you subsumed under “misleadingly oversimplified descriptions of real scientific activities”, and then go on to say that the results were actually “only at about 0.1% accuracy”. That accuracy pertains to the mass of the Higgs, though, which unfortunately is the one parameter that was not actually predicted.

What was predicted were the properties of the Higgs: a massive boson, zero spin, no electrical charge, no colour charge, decay within a specified lifetime, decay into a well-defined distribution of lighter particles. That is what is encompassed by “exquisite precision”.

You then offered a supposed “correction” of the independence of the two experiments. “Any undetected defect [in the LHC] capable of reliably producing a ~125 GeV signal…would have affected both experiments”, you say. I asked NYU physics professor Kyle Cranmer whether that made sense; he replied:

I can’t see any conceivable way that a defect in the LHC beam or other common component to the two experiments could possibly mimic a bump that looks like the Higgs. And the way that would propagate through the two experiments would, in general, be different.

In a word: no. Why not, you ask? Because, as per my second source, “what was detected was not any characteristic of beam energy, but of specific particles produced. The collisions had to result in certain specific particles, not just particles with a certain energy.” Again: it’s all about the theory.

And that has been one of my main points, of course: all the statistical results, including the p-values, didn’t induce anyone to claim a discovery by themselves but only in the context of precise predictions flowing from an explanatory theory. That is one crucial point about Popper’s methodology that people, including in this thread, keep failing to grasp.
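For concreteness: the “5 sigma” convention under which the Higgs discovery was announced corresponds to a one-sided p-value of roughly 3 × 10^-7, a number that by itself decides nothing without the surrounding theory. A minimal calculation:

```python
# The particle-physics "5 sigma" discovery convention expressed as a tail
# probability of the standard normal distribution.
import math

def one_sided_p(z):
    # P(Z > z) for a standard normal Z
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_five_sigma = one_sided_p(5.0)
print(f"{p_five_sigma:.2e}")  # about 2.87e-07
```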

Borsboom, whom I quoted above, at least sees part of that problem when he says: “And that’s why psychology is so hyper-ultra-mega empirical. We never know how our interventions will pan out, because we have no theory that says how they will pan out”.

And instead of taking the trouble of actually asking someone, you go on to speculate. I don’t think that’s any way to conduct a critical discussion, really.

It’s funny how certain people will predictably accuse you of just that of which they themselves are guilty.

One of those is the asymmetry between verifiability and falsifiability of certain statements.

Which part of “certain statements” don’t you understand? And why have you still not read the passage (which I quoted from) where Popper explains precisely what that refers to?

Since you are just going to double down on evasion of things you previously posted, I’m going to take a different approach.

If humans are fallible and sense experience and history is not a reliable guide to the future, why should anyone trust the human edifice of mathematics?

The approach of actually reading up on Popper’s explanation of what you keep denying he explained? :upside_down_face:

That way of putting it equivocates (however inadvertently) on “reliable”. What Popper means by that word (as he explains at length) is the equivalent of certain knowledge, ie ‘reliable’ in the sense of ‘can’t go wrong’. That kind of absolute reliability cannot be had. What can be had, though, is the knowledge that A is a better explanation than B and should therefore rationally be preferred.

As I keep saying: If you ignore (deliberately or otherwise) that for Popper the aim of science is to find unifying explanatory theories that are highly falsifiable and that there is no such thing as certain (or even probable) knowledge but only knowledge that can forever be improved, you will not understand his philosophy.

So, did Popper really doubt the absolute validity of arithmetic and number theory? If not, why not? If so, why?

That’s a subtly different question, though. But an interesting one. If by it you mean that Popper eg thought that certain statements (his example was “2+2=4”) can be an instance of “absolute truth”, then that would be correct to say. But Popper was wrong about that. The truth of such a statement is obviously relative to certain assumptions: which numeral system we’re talking about; the definitions of our operators; the definition of eg natural numbers; etc.

To be fair to Popper, he almost always talked about “objective and absolute truth”, and he was right that there is such a thing as objective truth. A statement such as “2+2=4” is objectively true if and when certain sufficient assumptions are made.

The truth of such a statement is obviously relative to certain assumptions: which numeral system we’re talking about; the definitions of our operators; the definition of eg natural numbers; etc

All of these things have well established usages in the “game” of mathematics.

You seem to doubt that Popper was correct in classifying “2+2=4” as an “absolute truth.” Why? The “assumptions” or “axioms” of PA can be interpreted mechanically as imperatives.

I had no intention of posting back here, but am given no choice in light of Monnerjahn’s belated response to my last post a few weeks ago. My comment is not really in reply to him (who, given that he is clearly suffering from severe testosterone poisoning, is more to be pitied than debated), but rather my autopsy of our “exchange”. His new reply does deserve some comment regarding his representation of the Higgs experiments, representations which are (surprise!) slanted to make his opponent look completely wrong, ignorant, and foolish, and him look completely right, omniscient, and wise (it appears that one of his operational axioms is that he makes no mistakes, whether in fact or tone).

In particular, Monnerjahn’s reply further supports my earlier thesis that “some philosophers and popular accounts have (as above) technically misrepresented the Higgs/LHC experiments in ways which obscure some interesting issues”; and more generally that “the realities of the experiments and the science are not anywhere near as clear-cut about this as some philosophers have made it sound. That portrayal problem is characteristic of the misleadingly oversimplified descriptions of real scientific activities and results I see in much (not all) of the philosophy of science literature, especially in heroic accounts of bold conjectures and experiments in which the latter turn out to be a lot more muddy than presented (perhaps ‘cargo-cult philosophy of science’ would be a suitable label for that practice).” I’ll now give examples below.

When I made that last post here (now a few weeks ago) I sent along the exchange to a few philosophers of science (full professors at major universities, who corroborated my dim view of Popperians) and also to a physicist on the CMS team (which comprised thousands of physicists), one who specializes in statistical analysis: Robert Cousins, whose Synthese paper I cited earlier:

(Erratum: The Jeffreys–Lindley paradox and discovery criteria in high energy physics | SpringerLink)
Cousins has noted that there is some variation in opinions among the Higgs team and in the particle-physics research community. For example, most are content with or favor frequentist methods, but some have Bayesian leanings.
Regarding the ATLAS and CMS experiments, Cousins very generously supplied me with extensive comments and sources, amounting to a rather different impression than the select quotations from others that Monnerjahn offered above. Among them:

1. Possible sources of correlation of the ATLAS and CMS detectors: Cousins confirmed that the LHC source was extremely unlikely to be a source of dependence, so on that point I must concede that my speculation about how it might be one was indeed physically untenable. The auxiliary assumption that the LHC could not produce an artefactual signal for both is held with certainty, and so I was mistaken in questioning Monnerjahn’s claim of “complete independence” on that practical basis (it is a mere physical fact that both detectors depend on the same source).

On the other hand, Cousins reported that the results from the two detectors do indeed have sources of dependence, noting that “The papers on the measurements of the couplings went into great detail about what systematic uncertainties are not independent (typically in values from approximate theoretical calculations that are used deep in the data analysis of both experiments).” Methodologists will be quite familiar with analogous problems in research synthesis in which results of otherwise independent experiments can suffer from the same bias sources, e.g., small-sample or sparse-data bias in statistical approximations due to low event rates, as is common in meta-analyses of RCT data on rare adverse events. A crucial point is that independence involves not only physical independence but also independence in methods and assumptions. So, for the ATLAS/CMS experiments my conclusion is again that no, they were not (as Monnerjahn claimed categorically) “completely independent”; but contrary to my speculation (which I offered as such) the LHC source was of no practical concern; rather, the dependence comes from shared inputs to the final calculations.
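The point about shared inputs can be made vivid with a toy simulation (all numbers invented): two experiments whose statistical noise is genuinely independent still yield correlated results when both consume the same approximate input.

```python
# Toy simulation (all numbers invented): two experiments with independent
# statistical noise still produce correlated results when both rely on the
# same shared approximate input (e.g., a common theoretical calculation).
import random

random.seed(0)

def correlation(xs, ys):
    # Pearson correlation of two equal-length samples
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

true_value = 1.0
est1, est2 = [], []
for _ in range(5000):
    shared = random.gauss(0, 0.5)  # systematic input shared by both analyses
    est1.append(true_value + shared + random.gauss(0, 0.5))
    est2.append(true_value + shared + random.gauss(0, 0.5))

r = correlation(est1, est2)
print(f"correlation between the two 'independent' results = {r:.2f}")
```

With equal shared and independent variance components, the theoretical correlation is 0.5, even though neither experiment ever sees the other’s data.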

2. Regarding precision of predictions: I wrote that, contrary to Monnerjahn’s categorical claim, “the results were not predicted to exquisite precision”. Indeed, that’s obvious from the source reports. Here is what Cousins wrote (emphasis added):
“Too much of what is said or written publicly is over-simplified to the point of being misleading, I think. For example, the exact value of the mass had nothing to do with our belief that we found the Higgs boson. We would have been just as happy if it had been 115 or 135 instead of 125 GeV, since the mass was the last free parameter in the theory, only crudely inferred in advance from indirect measurements. (Measuring 113 would have been odd, since that was ruled out by earlier experiments.) It was the so-called spin/parity and couplings measured in subsequent measurements that clinched our belief that we found “a” Higgs boson (still hoping for another); we initially said ‘Higgs-like’ boson in July 2012.” That appears in good accord with what Monnerjahn quoted from Cranmer. Neither quote contradicts the fact that the Higgs mass was not predicted to exquisite precision.

Now Monnerjahn saves his claim of “exquisite precision” by redefining what he meant in terms of purely qualitative properties of the prediction. I find this an evasion, corroborating my impression that some of what passes for philosophy of science employs rhetorical caricatures of ambiguous realities to score decisive points. The goal seems to be promotion of the idea that oversimplified portrayals of complex physics experiments convey vital lessons for fields orders of magnitude more complex and orders of magnitude more poorly understood than physics, such as the social and medical sciences. Some philosophers of science are quite aware of this problem and recognize that folk-tale physics can conceal much that is embarrassing for the simplifier; again see Trust in expert testimony: Eddington's 1919 eclipse expedition and the British response to general relativity - ScienceDirect for an example, and more general discussions like Cartwright (1983), How the Laws of Physics Lie - Oxford Scholarship.

I do think there are some important general scientific lessons from physics, but those were pretty well understood before Popper came along; for example, see The Logic of Modern Physics (1927) by the physics Nobelist Percy Bridgman, which received considerable notice in science and philosophy before Popper’s heyday:
https://www.nature.com/articles/121086a0

In any event, those lessons as well as their reworkings and replacements in Popper, Kuhn, Lakatos, Giere, and many others, are not what some of us see as the major problems in health and medical science today.

Among the major problems are the continued warpage of study reports by statistical conventions (as I among many have written about at length in published articles), and its ongoing encouragement by perverse incentives in an academic environment shielded from consequences of bad practices. There is a certain irony here in that Monnerjahn wrote “the statistical results, including the P-values, didn’t induce anyone to claim a discovery by themselves but only in the context of precise predictions flowing from an explanatory theory. That is one crucial point about Popper’s methodology that people, including in this thread, keep failing to grasp.”
My writings go on at great length about the importance of context in understanding what statistics mean, and what P-values actually mean both inside and outside of that context. I am hardly the first to write about that: one can find Pearson confronting that issue in the early 1900s, and by the 1920s psychologists and biologists as well as statisticians were discussing it and what P-values don’t imply and shouldn’t induce. And as far as I can see, the other discussants in this blog appreciate all that quite well; it’s old news here. That the knowledge of the present discussants on this topic is much more extensive than Monnerjahn’s is one crucial point that Monnerjahn keeps failing to grasp.

3. Instead of attempting to answer my question about why the hypothesis H: “no Higgs” was tested, Monnerjahn writes “instead of taking the trouble of actually asking someone, you go on to speculate. I don’t think that’s any way to conduct a critical discussion, really.” Really? Critical discussion only involves asking questions and attacking people for speculating on answers? Especially in a forum where anyone could freely offer their own speculative answer?
As a matter of fact, I did ask Cousins that question; he responded at great length and pointed to sec. 5 of his Synthese paper, especially sec. 5.4, which all can read. Here is an interesting example from his response:
“There are indeed cases where the strong untested belief turned out to be wrong when it eventually got tested. The most famous is probably parity conservation violation in 1956. It turned out to be a few weeks’ work experimentally to show convincingly (in more than one type of experiment) that parity (mirror symmetry of fundamental forces of nature) was not conserved. But it had previously been so strongly believed that theorists Yang and Lee got a Nobel Prize for suggesting that it be tested (!) in the weak interaction.” My reading is that through hard experience the physics community became quite wary of dogmatic, inflexible belief in physical laws. They understood this in accord with events and analyses laid out well before Popper (such as the fall of classical mechanics in the early 1900s).
(end of numbering)

On the more general issue of physics as a model for science or philosophy of science, I’m sure we could each go on at book length and get nowhere with each other, so I’ll just focus on some items that came up here.

In some respects, the traditions of theoretical and experimental physics provide little guidance and can even mislead, precisely because their extent and precision are so far beyond anything attainable in soft sciences. But applied physics provides cautionary examples of how even the most refined, well-tested theory rarely provides more than rough guidance in tackling typical applications. Despite these cautions, we still see caricatures like the one from the Borsboom link that Monnerjahn cited approvingly (Open Science Collaboration Blog · Theoretical Amnesia): “In the more impressive cases, the predictions are so good that you can actually design the entire bridge on paper, then build it according to specifications (by systematically mapping empirical objects to theoretical terms), and then the bridge will do precisely what the theory says it should do. No surprises.” That should be seen as terrifyingly wrong by anyone who knows the history of structural failures. Reasonable safety assurances call for massive overbuilding and back-up and safety components that go well beyond the ideals of theoretical calculations. It’s not because the laws of physics (the grand theories) are wrong, but because the specific physical models used to operationalize those theories are always far simpler than the reality they are applied to, omitting unanticipated sources of failure. A classic example is Tacoma Narrows Bridge (1940) - Wikipedia. If we turn to mechanical and aeronautical engineering, the body count from modern products that passed all theoretical calculations and even made it into service should belie the fairy tale in the Borsboom quote; a classic example: de Havilland Comet - Wikipedia.
The point is, glorification of theory in practice can be fatal (recall the famed quote “In theory there is no difference between theory and practice; in practice there is,” which apparently dates from the 19th century In Theory There Is No Difference Between Theory and Practice, While In Practice There Is – Quote Investigator). Conversely, Roman engineers built spans that still stand thousands of years later, with no benefit of physics as we know it today.

In summary, there is a titanic literature on every topic that’s been raised here, literature that precedes Popper, came after Popper, and has gone far beyond Popper in its appreciation of context and uncertainty in soft sciences. Mounting application complexities continue to raise deep problems far beyond the understanding and capabilities of most researchers and many philosophers of science. Indeed, it is starting to dawn on some that many problems may now be beyond human understanding, which should be no surprise given the extent of modern social systems. In that case, the posturing and pretenses to complete systems for scientific conduct or rationality (whether frequentism, falsificationism, deductivism, Bayesianism, or something beyond them all) may be even more destructive than we heretofore imagined.

It is thus saddening to see how some academics fall prey to monomaniacal ideologies, and devote themselves to what must be the inevitably narrow views of one writer or philosophy or methodology, perhaps with some tweaks of their own. They seek to dismiss without comprehension hundreds of worthy readings on problems that are controversial (and in many cases fundamentally unresolvable), forgoing the richness of the universe of insights beyond their microworld. Such behavior recalls the religious fanatics who attempt to ridicule, harass and terrorize others into obedience to their scriptures and idols. Well, in the end we academics are human, all too human; perhaps we are even prone to subtle fanaticism and rhetorical deception because we can more easily get away with it, having been bestowed with the paper authority of credentials and the protections of academic freedom.


You don’t say.

I explained it in the bit you just quoted. The statement is true if certain well-established usages are assumed. It doesn’t get more explicitly non-absolute than when a conditional must be met in order to even be able to evaluate the truth status of a statement.