Significance tests, p-values, and falsificationism

Another word on induction:

It is, I think obviously, not the narrow definitions which can lead to absurd statements, because it is precisely within the “narrow” definition (if you want to call it that) that no absurdities appear—because there is no question that induction is logically invalid. (Unless some of you can think of some new and interesting arguments, which is always possible.)

Absurdities do appear, though, once you (carelessly) expand the definition of ‘induction’ so it applies to basically anything: a guess, a hunch, an extrapolation, anything that involves a reference to empirical evidence, anything that involves a reference to learning and so on and on. I have quoted Popper on this before: you can call anything you want ‘induction’ as long as you’re not misled—especially into believing that you’re talking about anything that resembles valid logic. Which is why he carefully defined how he used the term. And he never pretended to mean anything else by it either. So the idea that he (or I, for that matter) might have been attacking a strawman is … let’s say: seriously misguided.


And a word on one of the examples of science in practice:

Starting with the first sentence, “There is no such shared value in practice”, I guess I’ll have to repeat something that has been said many times before: Even if that were true, Popper’s methodology is normative, not descriptive. Any criticism along the lines of “But many people don’t do that” is, taken on its own, of no consequence to an appraisal of that methodology.

Having said that, Popper has gone to some lengths, in Realism and the Aim of Science, to show that his methodology is also in good accord with quite a few prominent examples of science in practice: he lists twenty such cases, if I’m not mistaken.

As to the Higgs Boson story: it is also neither here nor there whether scientist X is secretly (or not so secretly) rooting for “their” theory; the critical attitude isn’t expressed in this or that initial preference but in whether or not you are prepared to revise your ideas in the face of contradictions.

Then, there is a very instructive misunderstanding when you talk about how the LHC people wanted to “claim discovery”. Of course they did. Nobody has ever doubted that. The whole point is that they didn’t, as many people in the social sciences are wont to, claim discovery based on a single p-value. And that is for two reasons: 1. There were two completely independent experiments at the LHC—extremely expensive experiments. One result was obviously not deemed sufficient as a matter of principle—a principle worth billions of pounds. 2. The results, however many “sigmas” they showed, would have meant nothing (in particular) if they hadn’t been predicted to exquisite precision using a unifying explanatory theory. To present this story as if the discovery was made on the basis of a very low p-value alone would be to completely misunderstand what’s going on.

And that gives me an opportunity to refer to a blog post by Denny Borsboom, in which he (to some degree) acknowledges precisely this problem: that psychology is in trouble (and, by extension, the social sciences in crisis) because there is “no theory”.

That, I submit, is the key to the whole problem: if you ignore that only tests of predictions that are born of some theory have any scientific meaning, then the crisis will never end. What you’re doing then is, in effect, Cargo Cult Science.


The following comments are offered with no intent of finality or authority, but rather to (1) improve the accuracy of portrayal about what actually happened in the famous LHC Higgs experiments, and (2) point to questions the statistical set-up raised for those who use or defend use of P-values (such as myself) and those who use or defend NHST (not myself). I do not believe the realities of the experiments and the science are anywhere near as clear cut about this as some philosophers have made it sound.

That portrayal problem is characteristic of the misleadingly oversimplified descriptions of real scientific activities and results I see in much (not all) of the philosophy of science literature, especially in heroic accounts of bold conjectures and experiments in which the latter turn out to be a lot more muddy than presented (perhaps ‘cargo-cult philosophy of science’ would be a suitable label for that practice). The 1919 Eddington eclipse expedition test of general relativity is an example that seems obvious by today’s standards (see “Trust in expert testimony: Eddington’s 1919 eclipse expedition and the British response to general relativity” on ScienceDirect); from what I’ve read, the LHC experiments provide a more subtle, instructive illustration for the present controversies.

As I suspect no one here is a professional particle physicist or even close, I warn that my comments may contain inaccuracies about the LHC physics. But hopefully they correct some worse inaccuracies relevant to the present discussion. For a more extensive discussion of statistical issues in the Higgs/LHC experiments by a physicist, see e.g. the article by Cousins (who is a UCLA member of the CMS group) in the special issue of Synthese, Vol. 194, No. 2, February 2017 (available on JSTOR).

Part 1 - corrections and refinements. Some philosophers and popular accounts have (as above) technically misrepresented the Higgs/LHC experiments in ways which obscure some interesting issues. First, unfortunately the experiments were not completely independent: While they used different detectors, both of necessity had to use as their generating equipment the LHC. Any undetected defect in that equipment or in the calibration and programming of its subsidiary components capable of reliably producing a ~125 GeV signal that was non-Higgs would have affected both experiments. Having independent teams and detectors did ensure many (perhaps all the worrisome) sources of dependence were absent, but did not exclude every imaginable possibility. That they announced discovery anyway reflects their confidence that no such non-Higgs artefact was present; but that is an auxiliary assumption.

Second, the results were not predicted to exquisite precision; they were read off to rather modest precision (again, for Standard Model particle physics, which has gone beyond a dozen significant digits predictive accuracy at other points). They have been refined by repeated measurements since the landmark 2012 results, but are still only at about 0.1% accuracy (which is unheard-of for “soft” sciences but not so dramatic in particle physics).

Those are just the measurements though; from what I read the predictions were not even that accurate. Before the 2012 results there was quite a bit of uncertainty about the mass of the Higgs particle, as the standard-model predictions themselves depended on measurements whose uncertainties had to be propagated. This was a reason for two independent detectors: to get mass readings out of the LHC by two physically distinct means (ATLAS and CMS).
You can read about the realities at their own website: CMS measures Higgs boson’s mass with unprecedented precision.

Third, as a very small but essential detail in reading their results accurately, their statistical test results were presented as sigmas (Z-scores) corresponding to 1-tailed P-values. This is a mere transform, but the results were at times misreported in popular accounts as if the sigmas or subsidiary P-values were directly taken from a 2-sided test (as commonly done in “soft” sciences).
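To make the sigma-to-P-value transform concrete, here is a minimal sketch, assuming a standard normal reference distribution. The function names are illustrative only and are not taken from any ATLAS or CMS software. It shows how the same sigma maps to a 1-tailed P-value and to the 2-sided P-value that is twice as large.

```python
import math


def one_sided_p(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_p(z: float) -> float:
    """Two-sided probability P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Compare the two conventions at the commonly quoted 3 and 5 sigma levels.
for z in (3.0, 5.0):
    print(
        f"{z:.0f} sigma: 1-tailed p = {one_sided_p(z):.3g}, "
        f"2-sided p = {two_sided_p(z):.3g}"
    )
```

Under this normal assumption, 5 sigma corresponds to a 1-tailed P-value of roughly 2.9e-7, so reading the same sigma as a 2-sided result changes the P-value by a factor of two.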

Part 2 - questions (not independent of part 1): The hypothesis H the experiments statistically tested is often presented as “Standard Model Without Higgs”, with a remark that apparently no one believed this H could be correct. That raises the question of why this H was tested, and especially if its use represented anything other than the habit of using a null. One mundane answer is that H was conveniently precise, whereas the alternative was fuzzy or full of uncertainty (as often the case in “soft” sciences). Was then the null H just a heuristic default?

To go deeper into the issue of alternative H and their detection, it’s been said we need to ask about the counterfactuals (which were potential outcomes before the experiments): Suppose “nothing had been found” as happened with some other theoretical particles subjected to LHC detection efforts (detection failures which for some reason were not played up in the popular press but fortunately were published in physics journals). We’d then have to ask what that “failure” meant, which is an infinite variety of possibilities ranging from (say) two highly compatible measurements but only 3 sigma to two vastly incompatible measurements with one 2 and the other 5 sigma to both within a sigma or two of H. Among the possibilities I’d seen mentioned for the “most null” potential outcomes were that the Higgs existed but was beyond the detection limit of the experiments, and that the Higgs did not exist after all and the Standard Model (despite its successes) simply could not explain why certain particles have mass, just like it cannot explain some other phenomena (like neutrino oscillation and baryon asymmetry). My bet is that had they found the experiments in conflict they would have done well to examine the P-value function (or sigma graph) for the difference in experiments to aid in screening explanations for the difference, along with of course searching for mechanical problems in one or both experiments.


One common criticism of Popper is that he uses seemingly ordinary terms in a misleading technical fashion. That’s what gives some plausibility to his extreme skepticism. If I say I “verified” a statement, others will think I confirmed that the statement is true or very probably true. But that cannot be what Popper means by that term. He thinks (correctly, in my view) that we can’t have 100% certainty in any empirical statement; he’s a fallibilist. But he also doesn’t think we are ever justified in thinking empirical statements are “probably” true, for he rejects induction. So what could he possibly mean by “verification”?

For Popper, the scientific community “verifies” a basic statement by deciding to accept it (as noted in the quote you posted). He doesn’t think this decision is entirely arbitrary: there should be certain methodological criteria such as intersubjective testability (as I mentioned before). But this “decision” is not a reason to think the statement is true or probably true. It’s a convention, which is why Thornton calls Popper’s philosophy a “sophisticated form of conventionalism”.

Popper is very clear that “verification” is just a revisable decision, not a demonstration that something is true (as we’d ordinarily use that term). The page numbers are from The Logic of Scientific Discovery 2002:

Experiences can motivate a decision, and hence an acceptance or a rejection of a statement, but a basic statement cannot be justified by them—no more than by thumping the table. (87-88)

From a logical point of view, the testing of a theory depends upon basic statements whose acceptance or rejection, in its turn, depends upon our decisions. Thus it is decisions which settle the fate of theories. To this extent my answer to the question, ‘how do we select a theory?’ resembles that given by the conventionalist; and like him I say that this choice is in part determined by considerations of utility. But in spite of this, there is a vast difference between my views and his. For I hold that what characterizes the empirical method is just this: that the convention or decision does not immediately determine our acceptance of universal statements but that, on the contrary, it enters into our acceptance of the singular statements, that is, the basic statements. (91)

By its decision, the jury accepts, by agreement, a statement about a factual occurrence—a basic statement, as it were. (92)

The fact that the scientific community has accepted the basic statement does not mean the statement is true. The decision can be revised in light of new experience:

[T]he statement need not be true merely because the jury has accepted it. This (…) is acknowledged in the rule allowing a verdict to be quashed or revised. (92)

I differ from the positivist in holding that basic statements are not justifiable by our immediate experiences, but are, from the logical point of view, accepted by an act, by a free decision. (92)

The empirical basis of objective science has thus nothing ‘absolute’ about it. Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or ‘given’ base; and if we stop driving the piles deeper, it is not because we have reached firm ground. We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being. (93-94; emphasis mine)

This last quote would be entirely reasonable if it were just an expression of humility and fallibility. But it’s not just that: every scientific theory is, for Popper, based on ungrounded conjectures with no confirming evidence. We choose to accept these ungrounded conjectures; we don’t base them on evidence (even if these decisions are not arbitrary).

It follows that, just as scientific theories cannot be shown to be true, they cannot be shown to be false. We just decide to reject them because we decided to accept some basic statements. Both decisions can be revised, and we have no reason to think they are correct. If that sounds unreasonable, it’s because it is.

RE: Carnap’s counter-example to the notion of (complete) falsifiability.

I thought it was clear from the portion of the quote I placed in bold, but I will use Carnap’s own writing in Testability and Meaning. In brief, he makes a distinction between dichotomous verifiability and continuous confirmation, and finds that the confirmation of a universal statement is symmetric to that of an existential one.

Now a little reflection will lead us to the result that there is no fundamental difference between the universal sentence and a particular sentence with regard to verifiability, only degree …The number of such predictions from a sentence is infinite; and therefore the sentence can not be completely verified. [my emphasis] Therefore here no complete verification is possible but only increasing confirmation. We may, if we wish, call a sentence disconfirmed in a certain degree if its negation is confirmed in that degree.

You can see this in the graph of the p-value function (aka the “confidence distribution”) of a data set. As Sander has discussed, the quantity -log_2(p) is known as the Shannon information (or surprisal) of the p-value, and can be read as how many bits of information the sample statistic supplies against a hypothetical parameter value. This can be calculated for a set of possible parameter values. The maximum likelihood estimate would have a p of 1 (or a Shannon information of 0), indicating full compatibility of the data with the hypothesized parameter value.
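As a rough illustration of the p-value function and its Shannon information, here is a minimal sketch. It assumes a normally distributed estimate with a known standard error; the numbers `estimate = 1.0` and `se = 0.5` are hypothetical, chosen only for the example.

```python
import math


def two_sided_p(estimate: float, se: float, theta: float) -> float:
    """Two-sided p-value for H: parameter = theta (normal approximation)."""
    z = (estimate - theta) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def s_value(p: float) -> float:
    """Shannon information -log2(p), in bits, against the tested value."""
    return 0.0 - math.log2(p)


# Hypothetical point estimate and standard error (assumptions for this sketch).
estimate, se = 1.0, 0.5

# Evaluate the p-value function on a small grid of candidate parameter values.
for theta in (0.0, 0.5, 1.0, 1.5, 2.0):
    p = two_sided_p(estimate, se, theta)
    print(f"theta = {theta:.1f}: p = {p:.4f}, S = {s_value(p):.2f} bits")
```

On this grid the p-value peaks at 1 (0 bits) where theta equals the estimate and falls off symmetrically on either side. That gives a graded picture of compatibility rather than a dichotomous verdict.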

Fisher’s methods are often taken (naively) as a realization of Popper’s philosophy, when that is just not so.


Yup, just like one common criticism of Darwin was that he used the seemingly ordinary term “species” in a misleading technical fashion. Oh, and of course it isn’t actually misleading because it’s properly defined. That’s science for you. :)

Except Popper has never displayed “extreme skepticism”. Unless you mean towards induction, in which case you’d be right, as would he.

And what some unspecified, possibly uncritical, others will think is neither here nor there. What they should think depends entirely on how the term is defined. As in any other discipline.

You think? Seeing as Popper explicitly says that that isn’t what he means and equally explicitly says what he does mean, that’s not a complete surprise.

What do you mean, “but”? He indeed thinks that there is more to it than “just” a decision, as you misleadingly say in the following paragraph. Of course it’s a decision in the end, because (as per our assumptions) there is no such thing as certain truth; but it’s a decision that is based on critical reasoning, in other words: tests. And you’re equivocating on the word “true” when you say, “ But this ‘decision’ is not a reason to think the statement is true”, because that’s not what Popper says—and anyway, the decision is not a “reason” for anything; he says we assume the truth of the statement for the purposes of the argument in hand if and when it is well tested.

If one were to choose—disingenuously, I would say at this stage—to ignore Popper’s explicit and entirely reasonable definition, then yes. If you cling to the idea that “truth” must mean certain truth, then you will never understand what Popper is saying and will, indeed, find what he says absurd.

This, by the way, is the same misguided complaint that Kuhn made when he asked, “What is falsification if it is not conclusive disproof?” He failed to understand literally the first thing about Popper’s philosophy (i.e., fallibilism).

Since what you placed in bold wasn’t actually a quote, I’m afraid it wasn’t clear. :)

As to what Carnap says there, please see my reply to pedro just above. Carnap is talking about “absolute verification”; that completely misses Popper’s point. So I’d say it wouldn’t be unusual if Popper never felt he should reply to it.

In any case, since Carnap says there that there is “no fundamental difference between a universal sentence and a particular sentence with regard to verifiability”, would you think that amounts to saying that the logical relationships in verifiability and falsifiability show symmetry, in fact that universal and existential statements are constructed symmetrically?


in fact that universal and existential statements are constructed symmetrically?

To formally prove a universally quantified statement by contradiction, you would start by assuming its negation is true, which makes the initial assumption an existentially quantified one, and work until a contradiction is obtained. I can’t think of a more obvious symmetry than that.
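The duality invoked here, that negating “for all x, P(x)” yields “there exists x such that not P(x)”, can be checked mechanically on a finite domain. The sketch below is only an illustration under that finiteness assumption: the domain and predicates are hypothetical, and a finite check is not a proof for infinite domains or a claim about empirical generalizations.

```python
from typing import Callable, Iterable


def forall(domain: Iterable[int], pred: Callable[[int], bool]) -> bool:
    """True if pred holds for every element of the domain."""
    return all(pred(x) for x in domain)


def exists(domain: Iterable[int], pred: Callable[[int], bool]) -> bool:
    """True if pred holds for at least one element of the domain."""
    return any(pred(x) for x in domain)


# Hypothetical finite domain and predicates, chosen only for illustration.
domain = range(-5, 6)
predicates = {
    "x*x >= 0": lambda x: x * x >= 0,  # true everywhere on the domain
    "x > 0": lambda x: x > 0,          # true for some elements only
    "x != x": lambda x: x != x,        # true nowhere
}

results = {}
for name, pred in predicates.items():
    negated_universal = not forall(domain, pred)
    existential_of_negation = exists(domain, lambda x: not pred(x))
    results[name] = (negated_universal, existential_of_negation)
    print(f"{name}: not-forall = {negated_universal}, "
          f"exists-not = {existential_of_negation}")
```

In each case the two truth values agree, which is the logical symmetry described above.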

The issue that Carnap and others had with Popper is that his philosophy doesn’t describe any practical science, particularly the discovery of regularities that involve limits of variables. I think practical people have moved on from “falsification” and look at things in terms of decidability, which privileges neither negation nor confirmation.


So why did Popper never see that, do you think?

That is my question to the Popper proponents.

If I may ask: Which of Popper’s works have you read in order to possibly find an explanation for that?

It should not matter, but assume I am familiar with Logic of Scientific Discovery. A simple point of logic should not require extensive citations from numerous texts. Your absence of a response that addresses a direct question is curious.

+1 for Carnap’s original words. While I’m here, I might as well share something inflammatory …


But no +1 for Popper’s original words? I’m offended…


Would you put Platt’s (1964) Strong Inference piece into this category? (To me, it seems intensely practical in its orientation.)

It certainly is curious, but I must insist that it does matter. Because you’re not actually that familiar with LoSD.

You see, the thing you said was equivalent to what Carnap said (which you claimed was “fatal” for Popper) and that you would like to have explained why Popper never saw it—that’s a direct quote from LoSD.

Let me add, though, that while I do find it a little bit funny, I said all that in the hope that you (and maybe one or two others) would perhaps start to seriously consider the possibility that you know Popper and his ideas rather less well than you think.

Luke 15:32

I am not familiar with Platt, but I find much merit in Hintikka’s game-semantic model of logic, where the quantifiers of logical statements are interpreted as different players in a game. If we extend propositional logic to probability logic, that can be viewed as a gambling game, as in Shafer’s Testing by Betting.

A question is decidable if there exists a proof of A or not A. I’d prefer a constructive proof, but I’d accept a classical proof that uses reductio ad absurdum too.

I’m tired of replying, so I’ll try to be brief:

  • Popper can define his terms however he wants. What he cannot do is equivocate on his terms: use a word in his specialized technical sense at certain times and then fall back on the common definition when it’s convenient. Let me make a comparison. Suppose I say that Popper is wrong because universal generalizations can be confirmed. You ask, “how so?”. I answer, “well, to confirm a universal generalization is just to make a decision that it’s true based on pragmatic considerations x, y, z”. Sure, if that’s what I mean, then we can “confirm” theories. But then it’s misleading for me to say that Popper is wrong because of that.
  • Popper does not have a monopoly on fallibilism. Virtually everyone agrees that fallibilism is true. The problem is that, by rejecting induction, he thinks fallibilism can only mean “we can never justifiably think anything (even a basic statement) is probably true or probably reliable”, which is madness. Suppose I see a black swan. Normally, we’d say that we have evidence against the hypothesis that all swans are white. Of course, I can be wrong because I’m fallible. Perhaps I’m hallucinating; perhaps I didn’t really see a swan, but some other animal; etc. But still, that’s some evidence against the hypothesis: I have good reasons to believe the generalization isn’t true. If others see the swan, the evidence is even greater. But Popper cannot even say this. The scientific community accepts the statement after certain tests and then we hope for the best; the basic statement is still conjectural and ungrounded.

Which he doesn’t. As I explained above, including a crucially relevant quote. You could engage with that and spend your time and energy less tiresomely, you know? :)

The problem is that you misunderstand all of that. His rejection of induction has nothing to do with his rejection of justificationism, which founders on its own illogic. It also has nothing to do with his rejection of the possibility of gaining certain knowledge. And you misunderstand that because you ignore his definitions, again.