Necessary/recommended level of theory for developing statistical intuition

Knight (and possibly Keynes) viewed insurance as one area that may be manageable using a probability framework, according to Faulkner et al. (2021), from whom I quote:

“Knight argues that the first type of probability [resolvable uncertainty in the K^2 parlance] ‘practically never’ (Knight, 1921, p. 225) arises in business but allows that there are situations in which statistical probabilities can sometimes be estimated. These are the cases in which it is then possible to achieve a kind of certainty by grouping cases: ‘an uncertainty which can by any method be reduced to an objective, quantitatively determinate probability, can be reduced to complete certainty by grouping cases’. The obvious example here is insurance, which works by ‘dealing with groups of cases instead of individual cases’ and allows risks to be transferred to the insurer in a way that leaves both the insured and the insurer better off.”

Faulkner et al. (2021). F. H. Knight’s Risk, Uncertainty, and Profit and J. M. Keynes’ Treatise on Probability after 100 years. Cambridge Journal of Economics, 45: 857-882. (This is the introduction to a special issue celebrating the centenary of the Knight and Keynes books discussed above.)
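To make the "grouping cases" point concrete, here is a minimal simulation sketch (Python with numpy; the claim probability and claim size are invented purely for illustration): each individual policy's outcome is wildly uncertain, but the average loss across a large pool is nearly certain, which is what lets an insurer price the risk.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented illustration: each policy has a 1% chance of a $100,000 claim,
# so the expected loss is $1,000 per policy.
p_claim, claim_size = 0.01, 100_000

for n_policies in (1, 100, 10_000):
    # Simulate 100,000 hypothetical "years" of experience for a pool of this size
    total_losses = rng.binomial(n_policies, p_claim, size=100_000) * claim_size
    per_policy = total_losses / n_policies
    print(f"pool of {n_policies:>6} policies: mean loss/policy ≈ ${per_policy.mean():,.0f}, "
          f"SD ≈ ${per_policy.std():,.0f}")

# The mean stays near $1,000, but the SD of the per-policy loss shrinks roughly
# as 1/sqrt(n): the pool's aggregate experience becomes nearly "certain" even
# though each individual policy remains a gamble.
```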

I have Savage’s book. K^2 borrow the “small world” and “large world” (Savage used “grand world”) terms from Savage, but put their own spin on them. In contrast, K^2 provide a blistering criticism of Savage’s subjective expected utility model in economics. (K^2 do not just attack statistics but much of economics as well.)

I am not familiar with Jaynes’ book, but as a physicist, I have an awareness of (and perhaps some sympathy to) his maximum entropy framework, which seems worthy of consideration. However, I do not ultimately agree with him that probability should be “the” logic of science. At best, it is one of several frameworks that can be useful (and can also be misleading) in the scientific enterprise. There are simply too many successful examples from the history of physics itself where probability plays no role in scientific discovery and the interpretation of data.

I’ve read several reviews of Clayton but not the book itself. Criticisms of frequentism are entertaining and welcome (the Significance piece I coauthored with B. Gunter, linked above, is yet another anti-frequentist screed), but the solution is not Bayesianism or Likelihoodism. Both of the latter are founded on the Likelihood Principle, which requires (just as frequentism does) pre-specification of the statistical model (as Diaconis points out in his ‘magical thinking’ article). However, in exploratory work we need the ability to iteratively improve the fit of the model to the data, which invalidates the nominal probability properties of any resulting inference (whatever school of thought you use).
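As a toy illustration of that last point (Python with numpy/scipy; the data-generating numbers and the selection rule are invented for this sketch): if you let the data decide whether a predictor stays in the model and then quote the winner's nominal 95% interval, the realized coverage conditional on that choice is no longer 95%, whichever school of inference produced the interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, beta_true, n_sim = 30, 0.15, 20_000
selected, covered = 0, 0

for _ in range(n_sim):
    x = rng.normal(size=n)
    y = beta_true * x + rng.normal(size=n)
    slope, intercept, r, p, se = stats.linregress(x, y)
    # "Iterative model improvement": keep x in the model only if it looks useful
    if p < 0.05:
        selected += 1
        tcrit = stats.t.ppf(0.975, df=n - 2)
        covered += (slope - tcrit * se <= beta_true <= slope + tcrit * se)

print(f"x was kept in {selected / n_sim:.0%} of datasets")
print(f"coverage of the nominal 95% CI, given that x was kept: {covered / selected:.0%}")
# Conditional on the data-driven choice to keep x, the "95%" interval covers the
# true slope noticeably less often than 95% of the time.
```

The interval itself is the textbook-correct one for a pre-specified model; it is the data-driven model choice that breaks the advertised 95%.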

What’s common among all three books is that they do not attempt to move probability away from the heart of statistics. I’m trying to argue that most of the time we’re committing the ludic fallacy, and that we should take these stochastic models much less seriously. They are useful as tools to explore and summarize data, but not for drawing conclusions and making decisions, except in a very narrow set of conditions.

Thank you for starting this very stimulating thread.

3 Likes

I can attest to the greatness of the books by Jaynes, Ross (Probability Models), and Clayton.

1 Like

I enjoyed your American Statistician article much more than expected. I’d especially recommend reading this citation:

Nelder, J. A. (1986). Statistics, Science and Technology. Journal of the Royal Statistical Society: Series A (General), 149: 109-121. (JSTOR)

"There are simply too many successful examples from the history of physics itself where probability plays no role in scientific discovery and the interpretation of data."

I recall that in one of Richard Feynman’s Lectures on Physics, he pointed out that pushing the frontiers of physics is truly hard, because anything that worked historically has already been tried. That suggests to me not to draw too many conclusions from the early history of physics, but I might be mistaken about that.

"…but the solution is not Bayesianism or Likelihoodism. Both of the latter are founded on the Likelihood Principle, which requires (just as frequentism does) pre-specification of the statistical model (as Diaconis points out in his ‘magical thinking’ article). However, in exploratory work we need the ability to iteratively improve the fit of the model to the data, which invalidates the nominal probability properties of any resulting inference (whatever school of thought you use)."

This is a very subtle point that, despite having watched Feynman’s lectures on physics and read a number of his writings on scientific method, I appreciate much more deeply having read your article.

I think this is the point (which physical scientists seem to take for granted) that Feynman was attempting to express in his famous speech Cargo Cult Science.

The only other place where I saw it explicitly stated (but again missed out on the subtle implications) was in the classic Multiple Comparison Procedures by Hochberg and Tamhane. They emphasized that unless error probabilities are asserted before the data are seen, no error rates are defined, and, by implication, no information is communicated.
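A minimal simulation of that point (Python with numpy/scipy; the setup and numbers are invented for illustration): if the hypothesis to be tested is picked after looking at the data, here by testing only the most impressive of 20 truly null candidate variables, the quoted 5% error probability no longer describes the procedure that was actually performed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, m, n_sim = 50, 20, 5_000   # 20 candidate variables, all truly unrelated to y
hit = 0

for _ in range(n_sim):
    y = rng.normal(size=n)
    X = rng.normal(size=(n, m))
    pvals = []
    for j in range(m):
        r, p = stats.pearsonr(X[:, j], y)   # correlate y with candidate j
        pvals.append(p)
    hit += (min(pvals) < 0.05)              # report only the "best" candidate

print(f"P(smallest of {m} null p-values < 0.05) ≈ {hit / n_sim:.2f}")
print(f"pre-specified single test: 0.05;  analytic value 1 - 0.95**{m} = {1 - 0.95**m:.2f}")
```

With 20 independent null tests, the chance that the smallest p-value falls below 0.05 is about 64%, not 5%; the "5% error rate" was defined for a test chosen before the data, not after.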

There is a completely different way to look at this: iterative model improvements should carry along their uncertainties. If, for example, you forget that you allowed the data to be non-Gaussian at one point, the Bayesian uncertainty intervals will be too wide.

1 Like

Good that you brought up Feynman: he explicitly dismissed what we now call HARKing (hypothesizing after results are known) in his book The Meaning of It All, as discussed in Sec. 8 of Gerd Gigerenzer’s classic paper “Mindless Statistics”.

Having said that, transparently HARKing is a hallmark of good science because it leads you to do new research: https://journals.sagepub.com/doi/full/10.1177/0149206316679487

Regarding the history of physics, see the article by Seife (2000) I cited in an earlier post on this thread for many examples from the recent history of physics where “statistically significant” results “crumbled to dust” (as the author put it) due to the later discovery of systematic errors. That paper is 20 years old, and a number of additional examples of statistical methods leading researchers astray have arisen since then, such as the controversy over the proton mass, described very nicely in this piece from Physics World.

2 Likes

I might add that the recent imaging of black holes, which made the front pages of the newspapers, is an example where the physicists were explicitly anti-Bayesian: they went to extraordinary lengths not to let their prior expectations influence the analysis, and this lent greater credibility to the final result.

1 Like

In retrospect, the last two articles I posted, the proton mass article and the black hole article, are not about statistics but about groupthink. In the proton mass case, groupthink led to unwise data exclusion. The black hole article shows that there are formal procedures one can take to manage and mitigate, if not eliminate, the influence of groupthink, in that case from prior expectations or knowledge. John Tukey (1972) observed, “[T]he discovery of the irrelevance of past knowledge to the data before us can be one of the great triumphs of science.” (Quarterly of Applied Mathematics, 30: 51-65). In the black hole case, the prior expectations turned out to be “right”, despite the efforts to prevent them from biasing the analysis; in the proton mass case, the prior knowledge is looking very shaky.

While Bayesian priors are more obviously subject to the effects of groupthink, even frequentist analysis can be undermined by it (e.g., researcher degrees of freedom, forking paths, and all that). Statisticians have tools to help manage groupthink, mainly on the data production side (e.g., randomization, blinding, and concurrent control all help prevent investigators from fooling themselves), but on the analysis side, perhaps our statistical intuition is underdeveloped? Wagenmakers et al.’s proposal earlier this week, to have many independent statistical analysis teams analyze the same data set, mimics one of the procedures used by the black hole team.

https://www.nature.com/articles/d41586-022-01332-8

1 Like

"The black hole article shows that there are formal procedures one can take to manage and mitigate, if not eliminate, the influence of groupthink, in that case from prior expectations or knowledge."

While I have to study those examples more closely, I think the fundamental issue is the appropriate language for model uncertainty (i.e., probabilities about models that output probabilities).

For a Bayesian, it is probabilities all the way down, as Frank’s post shows. The philosopher Richard Jeffrey also held that view.

Others, like Arthur Dempster and Glenn Shafer, are not convinced that all uncertainty can be expressed by a single probability number. Their work led to Dempster-Shafer theory, also known as the “mathematical theory of evidence”, which is closely related to the notion of “imprecise” (aka interval) probabilities.

The work in this area is more likely to be published in symbolic logic journals than in applied statistics, but the tools developed there are likely (IMO) to productively resolve these philosophical disputes and lead to rigorous “statistical thinking”, in contrast to the all-too-common statistical rituals now at epidemic proportions.

Here are some interesting papers for the philosophically inclined. I’d start with the first one, and then the others for more formal development and justification.

  1. Crane, H. (2018) Imprecise probabilities as a semantics for intuitive probabilistic reasoning (link)
  2. Crane, H.; Wilhelm, I. (2018) Logic of Typicality (link)
  3. Crane, H. (2018) Logic of Probability and Conjecture (link)

For applications, the following are interesting; paper 2 is a mathematical formalization of the argument made by @Sander in numerous threads that Bayesian probabilities are too optimistic for us to accept the idea that these models capture all uncertainty. Martin uses the absence of any truly “non-informative” prior to prove that any additive system of representing beliefs runs the risk of false confidence. Formal discussion at the meta-level (i.e., model criticism) can be productively done in the realm of non-additive beliefs. (A small numeric sketch of the “probability dilution” phenomenon behind paper 4 appears after the list.)

  1. Martin, R. (2021) An imprecise-probabilistic characterization of frequentist statistical inference (link)
  2. Martin, R. (2019) False confidence, non-additive beliefs, and valid statistical inference (link)
  3. Martin, R. (2021) Valid and efficient imprecise-probabilistic inference across a spectrum of partial prior information (link)
  4. Balch, M.; Martin, R.; Ferson, S. (2019) Satellite conjunction analysis and the false confidence theorem (link)
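As promised above, here is a small one-dimensional numeric sketch of “probability dilution” (Python with scipy; the distances, hard-body radius, and error sizes are invented, and this is a deliberately crude caricature of the satellite conjunction problem in paper 4): the computed collision probability shrinks as the tracking data get worse, so a naive reading of that number says the satellites are “safer” precisely when we know less.

```python
from scipy import stats

# Invented 1-D toy: the best estimate puts two objects on a direct collision
# course (estimated miss distance 0 km); "collision" means the true miss
# distance ends up within a 1 km hard-body radius.
est_miss_km, hard_body_km = 0.0, 1.0

for sigma_km in (0.5, 2.0, 10.0, 50.0):      # tracking error: good -> terrible
    displacement = stats.norm(loc=est_miss_km, scale=sigma_km)
    p_collision = displacement.cdf(hard_body_km) - displacement.cdf(-hard_body_km)
    print(f"tracking error sigma = {sigma_km:>5.1f} km -> P(collision) = {p_collision:.3f}")

# The computed collision probability falls from ~0.95 to ~0.02 as the data get
# worse: the probability mass is diluted over a huge range of possible miss
# distances, so poor data manufacture high "confidence" in safety.
```

This is the trap the false confidence theorem formalizes: high assurance of “no collision” can be produced simply by having poor data.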

Christian P. Robert gives his opinion here:

1 Like

Thank you for these references, especially those to Crane (including from your earlier post) and the false confidence papers. I was not aware of this work, and I find I am sympathetic to much of Crane’s thinking (though I deviate sharply on a few points; I will need to start reading more of his corpus). Phenomena analogous to “probability dilution” are fairly routine in my work, and like Kay and King, and possibly Crane, I do not agree that it is the duty of statistics/computation/math to “solve” such problems by finding ways to quantify uncertainty, even if imprecisely. Neither my personal experience working with scientists for 20+ years nor my reading of the history of science is consistent with Martin’s assertion, “I contend that scientists seek to convert their data, posited statistical model, etc., into calibrated degrees of belief about quantities of interest.” No scientist I’ve ever worked with has behaved in such a manner, much less directly asked this of me.

Perhaps because I’ve worked on the applied side of science, my collaborators are instead interested in building an evidence base for making decisions, including experimentally evaluating alternatives, under limited resource constraints. My role has been to help them design studies, and to describe, summarize, and interpret the resulting data. I rarely find it warranted, necessary, or useful to include “probability as uncertainty” claims (including statistical inferences of any flavor) in advancing that goal, except in the narrow situations I allude to in my 2019 paper, which form a minority of what crosses my desk.

1 Like

After a study of the Carmichael and Williams exposition, and their references to Fieller’s theorem and the Gleser-Hwang theorem, the False Confidence result is less surprising.

Liseo, B. (2003). Bayesian and conditional frequentist analyses of the Fieller’s problem: a critical review. Metron - International Journal of Statistics, LXI: 133-150. (link)

After showing the limitations of frequentist and non-informative Bayesian methods, the author

"…adopts a robust Bayesian approach to show that it is nearly impossible to end up with a reasonable solution to the problem without introducing some prior information on the parameters."

Upon reflection, I think the philosophical claims are too strong. I see no reason why the notion of an imprecise probability cannot be embedded in a broader Bayesian framework. A decision-theoretic approach to the design of experiments implies uncertainty about the “true” probability distribution, and hence an imprecise probability, and imprecise probabilities are one way of conducting a robustness analysis of Bayesian methods. Still, the paper was very interesting.
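To make that “robust Bayesian” reading concrete, here is a tiny sketch (Python with scipy; the data and the class of Beta priors are invented for illustration, not taken from any of the papers above): instead of committing to one prior, sweep a family of priors and report the range of resulting posterior probabilities. The output is an interval of probabilities, which is exactly the kind of object the imprecise-probability formalisms manipulate directly.

```python
import itertools
import numpy as np
from scipy import stats

# Invented example: 7 successes in 10 trials; question: is theta > 0.5?
successes, n = 7, 10

# Robust Bayes: entertain a whole class of Beta(a, b) priors, not just one.
grid = np.linspace(0.5, 10.0, 25)
post_probs = [stats.beta(a + successes, b + n - successes).sf(0.5)
              for a, b in itertools.product(grid, grid)]

flat = stats.beta(1 + successes, 1 + n - successes).sf(0.5)
print(f"single flat Beta(1,1) prior:  P(theta > 0.5 | data) = {flat:.2f}")
print(f"across the whole prior class: [{min(post_probs):.2f}, {max(post_probs):.2f}]")
# One prior gives one number; the class of priors gives an interval of posterior
# probabilities -- an "imprecise" probability statement about the same question.
```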

To bring this discussion back to the original point about statistical intuition, Crane’s comment, in his “Naive Probabilism” article that you posted earlier, is crisp and emphatic:

“Probability calculations are very precise, and for that reason alone they are also of very limited use. Outside of gambling, financial applications, and some physical and engineering problems – and even these are limited – mathematical probability is of little direct use for reasoning with uncertainty.”

I think Kay & King (2020) are basically making the same point in a more long-winded (but more compelling) way. In my view, an important part of statistical intuition is the quality of judgment about whether a probability calculation could be both valid and useful. Statisticians tend to overdo it; frankly, so do many physicists, perhaps because physics has so many cases where probability calculations are valid and useful. In his 1995 autobiography, Gen. Colin Powell wrote that “experts often have more data than judgment,” and he was right.

Could not disagree more strongly with “little direct use”.

1 Like

Crane’s (and Martin’s) full position is elaborated in the 2018 preprint Is Statistics Meeting the Needs of Science? (See my post above for the link.)

"To be clear, our lack of conviction about whether statistics is meeting the challenge ought not be construed as skepticism about whether it can meet the challenge [of modern science]."

After re-reading his critique of a new p-value threshold (mentioned here), I came to understand, and ultimately agree with, your position that estimation should be emphasized over testing. Tests are too easy to misuse for the information they provide.

Ultimately, I think everyone would come to appreciate it if authors were encouraged to publish p-value curves and “confidence” distributions (i.e., the set of all 1-\alpha interval estimates, with \alpha ranging over 0 \lt \alpha \lt 1) as statistical summaries, regardless of whether they later use Bayesian or likelihood methods for analysis.
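A minimal sketch of what such a summary could look like for a simple one-sample mean (Python with scipy; the data are simulated purely for illustration): the “confidence distribution” is traced out by computing the 1-\alpha interval for every \alpha, which is equivalent to reporting the two-sided p-value function over candidate parameter values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(loc=0.8, scale=2.0, size=25)          # simulated data, illustration only
ybar, se, df = y.mean(), y.std(ddof=1) / np.sqrt(len(y)), len(y) - 1

# Confidence curve: the 1 - alpha interval for every alpha (a few shown here)
for alpha in (0.5, 0.2, 0.1, 0.05, 0.01):
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    print(f"{1 - alpha:>4.0%} interval: ({ybar - tcrit * se:.2f}, {ybar + tcrit * se:.2f})")

# Equivalent p-value function: the two-sided p-value at every candidate mean mu0
for mu0 in (0.0, 0.5, 1.0, 1.5):
    p = 2 * stats.t.sf(abs((ybar - mu0) / se), df)
    print(f"H0: mu = {mu0:.1f}  ->  two-sided p = {p:.3f}")
```

Either table lets a reader recover any interval, or the p-value for any null value, from a single summary, which is the practical appeal of publishing the whole curve rather than one test at one threshold.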

IMHO statistics took a wrong turn when it began favoring inference over decision making. The two are very different, and the latter is relevant.

3 Likes

My 2019 paper attempted to make a similar case: that statistics could offer more to science than it currently does with its exaggerated focus on inference. However, I was too optimistic. Much of what constitutes ‘statistical thinking’ is probabilistic thinking, which has its place for sure, but a much smaller one than statisticians demand. I’m not saying to get rid of probability, just that it must take its place as merely one of several frameworks for understanding the use of data in the scientific enterprise, and that it shouldn’t have a monopoly.

I’d like to attempt a summary of the “divergence” from the original post (which I actually nudged along) and then redirect the discussion with a quote that, in part, captures the kind of thing I was attempting to convey in starting this topic. However, I am not opposed to going back to the current discussion (I have enjoyed it greatly and have a long reading list thanks to the posters). I was just hoping to satisfy my initial question, even if the only “real” answer is along the lines of “go back and get a graduate degree in math and/or statistics”.

Summary

It seems that there is controversy in the prob/stats world regarding even a “simple” question such as “what is probability?”, which strikes me as a philosophical question at heart; see, for example, this forum topic. I don’t know whether the arguments among Frequentism, Bayesianism, and Likelihoodism all essentially stem from different “views” of how one should define “probability”, or whether there are purely methodological differences, but it seems academic to me (at times even dogmatic). There are even disagreements within “camps” (for example, see the topic @R_cubed linked to on whether p-values should be abolished or the “threshold” changed).

Coincidentally, there are several reviews of Deborah Mayo’s book (Statistical Inference as Severe Testing) on Gelman’s blog (also typed up in LaTeX and posted to arXiv if you prefer that format). In one of her responses she states that

The disagreements often grow out of hidden assumptions about the nature of scientific inference and the roles of probability in inference. Many of these are philosophical.

[I can’t comment on whether this is a true representation of the field, nor did I mention her book in support of her views as I only just learned of its existence.]

I think I agree with @f2harrell’s sentiment in his post on that topic on this forum: it is all interesting intellectually but not very useful in the real world. I am, ultimately, interested in understanding more, and some excellent references have been shared already; I intend to read much of what was shared, and I appreciate everyone’s contributions. But beyond the intrinsic reward of accumulating knowledge and intellectual discussion, I don’t have much use for philosophy currently.

In any case, I still believe that a strong foundation in math (and to some extent theory) is required to properly appreciate such issues, even if it all boils down to differences in “philosophy”. I currently don’t have that foundation; think the Dunning-Kruger effect (I don’t even know all that I don’t know about the subject).

Back to Intuition

I recently came across an interesting historical comment by the late Sir David R. Cox recounting several “pioneers in modern statistical theory”. The quote that stood out to me was about John Tukey (Section 13 of the article):

Some 20 or more years later, he unexpectedly came to see me at home in London one Sunday. What was I working on? I told him; it was something that he was highly unlikely to have known about. After a few moments of thought he made ten suggestions. Six I had already considered, two would clearly not work, and the other two were strong ideas which had not occurred to me even after long thought on the topic. This small incident illustrates one of his many strengths: the ability to comment searchingly and swiftly on a very wide range of issues.

Although Sir David gives no indication as to what he was working on, the passage highlights what is, to me, the “ultimate goal”: the ability to give a thoughtful (perhaps critical) appraisal of a problem based only on the basics and a bit of thought (and probably some questions about the problem/data/goal). Is this kind of ability even achievable for a “non-Tukey” (Tukey, by all accounts I’ve seen, was regarded as a brilliant man)? Or does it require years of experience, supported by a strong foundation in mathematics?

P.S. Gelman’s blog

The Gelman blog post linked above also points to some papers that I haven’t had a chance to read yet but that appear to be extremely interesting. There is an additional review by Christian Robert at another blog.

The first paper is a “monograph” by Robert Cousins. It looks quite interesting based on my brief reading of a few of the sections.

Another is a paper by Gelman and Christian Hennig (including extensive discussion after the article): Gelman & Hennig (2017). Beyond subjective and objective in statistics. J R Statist Soc A. I haven’t had a chance to go through it all yet.

He also links to another blog post of his discussing his article with Cosma Shalizi (Philosophy and the practice of Bayesian statistics) (along with responses).

P.P.S.

I also stumbled upon these lecture notes by Adam Caulton that look interesting. (I actually found them when I was googling “savage ramsey finetti jeffreys jaynes” :grin:)

Apologies for how long this ended up!

2 Likes

Richard Hamming, himself an important figure in applied math/computer science, was humbled by Tukey’s knowledge and productivity. He said:

“I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode’s office and said, ‘How can anybody my age know as much as John Tukey does?’ He leaned back in his chair, put his hands behind his head, grinned slightly, and said, ‘You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.’ I simply slunk out of the office!”

Source:

4 Likes

@ChristopherTong I don’t know how I failed to remember Richard Hamming, as he wrote a book in line with the theme of this thread: The Art of Probability, which is much closer to the information-theoretic perspective I find helpful.

Philosophy of science and the foundations of statistics can be good in small doses. The philosophical debates over the nature of “probability” don’t really matter in the context of data analysis.

In the context of messy, ambiguous data analysis, @Sander is always worth reading. In these articles he demonstrates how to use frequentist software to conduct an approximate Bayesian analysis.
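One simple flavor of that idea is information-weighted averaging of a prior estimate with the study estimate on the log odds ratio scale. Here is a hedged sketch (Python; the odds ratios and intervals are invented, and this is only an approximation in the spirit of those articles, not a reproduction of their exact procedures):

```python
import numpy as np

# Invented numbers: a study reports OR = 3.0 (95% CI 1.2 to 7.5); prior opinion
# centers the OR near 1.5 with a 95% prior interval of 0.5 to 4.5.
est_or, ci_lo, ci_hi = 3.0, 1.2, 7.5
prior_or, pr_lo, pr_hi = 1.5, 0.5, 4.5

# Work on the log odds ratio scale, where both pieces are roughly normal.
b, se = np.log(est_or), (np.log(ci_hi) - np.log(ci_lo)) / (2 * 1.96)
m, s = np.log(prior_or), (np.log(pr_hi) - np.log(pr_lo)) / (2 * 1.96)

# Approximate posterior: inverse-variance (information) weighted average.
w_data, w_prior = 1 / se**2, 1 / s**2
post_mean = (w_data * b + w_prior * m) / (w_data + w_prior)
post_se = (1 / (w_data + w_prior)) ** 0.5

lo, hi = np.exp(post_mean - 1.96 * post_se), np.exp(post_mean + 1.96 * post_se)
print(f"approximate posterior OR = {np.exp(post_mean):.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

I believe the articles go further and represent the prior as extra “prior data” records, so that ordinary frequentist regression software performs this combination automatically; the sketch above is just the back-of-the-envelope version.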

3 Likes

Thanks for sharing; I greatly enjoyed his talk! It is fortunate that it was recorded.

While I suspected that hard work, perseverance, diligence, etc. are a major part of it, I do believe there are some “true geniuses” (and I know a few people whom I consider such, incidentally physicists) who seem to possess some kind of natural ability that lets them understand complex/abstract topics more easily. Hamming actually mentions Feynman in the Q&A in this regard, saying that he knew Feynman would win a Nobel for something. See also Oppenheimer’s recommendation letter for Feynman to Berkeley:

Of these there is one who is in every way so outstanding and so clearly recognized as such, that I think it appropriate to call his name to your attention, with the urgent request that you consider him for a position in the department at the earliest time that it is possible. You may remember the name because he once applied for a fellowship in Berkeley: it is Richard Feynman. He is by all odds the most brilliant young physicist here, and everyone knows this.

I may give you two quotations of men with whom he has worked. Bethe has said that he would rather lose any two other men than Feynman from this present job, and Wigner said, “He is a second Dirac, only this time human.”

Or, for a historical example in the “arts”, there are those “prodigies” such as Mozart; while he was privileged and by all accounts an incredibly hard worker, there seemed to be something “special” about him. In creative areas, others similarly describe “getting” or “receiving” their “content” (Tolkien with invented languages, and current young prodigy Alma Deutscher when discussing melodies). Then again, I am a big fan of Mozart to begin with!

But I digress; Hamming gives good advice even if one’s goal is not to do “Nobel quality work”, and even (or especially) if one is not a “prodigy”.