Necessary/recommended level of theory for developing statistical intuition

I’d like to attempt a summary of the “divergence” from the original post (which I actually had nudged along) and then redirect the discussion with a quote that, in part, captures the kind of thing I was attempting to convey in starting this topic. However, I am not opposed to going back to the current discussion (I have enjoyed them greatly and have a long reading list thanks to the posters). I was just hoping to satisfy my initial question, even if the only “real” answer is along the lines of “go back and get a graduate degree in math and/or statistics”.

Summary

It seems that there is controversy in the prob/stats world regarding even a “simple” question such as “what is probability?”, which strikes me as a philosophical question at heart; see for example this forum topic. I don’t know whether the arguments between Frequentism, Bayesianism, and Likelihoodism all essentially stem from different “views” regarding how one should define “probability”, or whether there are purely methodological differences, but it all seems academic to me (at times even dogmatic). There are even disagreements within “camps” (for example, see the topic @R_cubed linked to on whether P-values should be abolished or the “threshold” changed).

Coincidentally, there are several reviews of Deborah Mayo’s book (Statistical Inference as Severe Testing) on Gelman’s blog (also typed up in LaTeX and posted to arXiv if you prefer that format). In one of her responses she states that

The disagreements often grow out of hidden assumptions about the nature of scientific inference and the roles of probability in inference. Many of these are philosophical.

[I can’t comment on whether this is a true representation of the field, nor did I mention her book in support of her views as I only just learned of its existence.]

I think I agree with @f2harrell’s sentiment in his post on the topic on this forum: it is all interesting intellectually but not very useful in the real world. I am, ultimately, interested in understanding more and some excellent references have been shared already; I intend to read much of what was shared and appreciate everyone’s contributions. But beyond the intrinsic reward of accumulating knowledge and intellectual discussions, I don’t have much use for philosophy currently.

In any case, I still believe that a strong foundation in math (and to some extent theory) is required to properly appreciate such issues, even if it all boils down to differences in “philosophy”. I currently don’t have that foundation; think the Dunning-Kruger effect (I don’t even know all that I don’t know about the subject).

Back to Intuition

I recently came across an interesting historical comment by the late Sir David R. Cox recounting several “pioneers in modern statistical theory”. The quote that stood out to me was about John Tukey (Section 13 of the article):

Some 20 or more years later, he unexpectedly came to see me at home in London one Sunday. What was I working on? I told him; it was something that he was highly unlikely to have known about. After a few moments of thought he made ten suggestions. Six I had already considered, two would clearly not work, and the other two were strong ideas which had not occurred to me even after long thought on the topic. This small incident illustrates one of his many strengths: the ability to comment searchingly and swiftly on a very wide range of issues.

Although Sir David gives no indication as to what he was working on, the passage highlights what is, to me, the “ultimate goal”. That is, the ability to give a thoughtful (perhaps critical) appraisal of a problem based only on the basics and a bit of thought (and probably some questions about the problem/data/goal). Is this kind of ability even achievable for a “non-Tukey” (who, by all accounts I’ve seen, was regarded as a brilliant man)? Or does it require years of experience, supported by a strong foundation in mathematics?

P.S. Gelman’s blog

The Gelman blog post linked above points to some papers that I haven’t had a chance to read yet but that appear extremely interesting. There is an additional review by Christian Robert at another blog.

The first paper is a “monograph” by Robert Cousins. It looks quite interesting based on my brief reading of a few of the sections.

Another is a paper by Gelman and Christian Hennig (including extensive discussion after the article): Gelman & Hennig (2017). Beyond subjective and objective in statistics. J R Statist Soc A. I haven’t had a chance to go through it all yet.

He also links to another blog post of his discussing his article with Cosma Shalizi (Philosophy and the practice of Bayesian statistics), along with responses.

P.P.S.

I also stumbled upon these lecture notes by Adam Caulton that look interesting. (I actually found them when I was googling “savage ramsey finetti jeffreys jaynes” :grin:)

Apologies for how long this ended up!


Richard Hamming, himself an important figure in applied math/computer science, was humbled by Tukey’s knowledge and productivity. He said:

“I worked for ten years with John Tukey at Bell Labs. He had tremendous drive. One day about three or four years after I joined, I discovered that John Tukey was slightly younger than I was. John was a genius and I clearly was not. Well I went storming into Bode’s office and said, ‘How can anybody my age know as much as John Tukey does?’ He leaned back in his chair, put his hands behind his head, grinned slightly, and said, ‘You would be surprised Hamming, how much you would know if you worked as hard as he did that many years.’ I simply slunk out of the office!”

Source:


@ChristopherTong I don’t know how I failed to remember Richard Hamming, as he wrote a book in line with the theme of this thread: The Art of Probability, which is much closer to the information-theoretic perspective I find helpful.

Philosophy of science and the foundations of statistics can be good in small doses. The philosophical debates over the nature of “probability” don’t really matter in the context of data analysis.

In the context of messy, ambiguous data analysis, @Sander is always worth reading. In these articles he demonstrates how to use frequentist software to conduct an approximate Bayesian analysis.


Thanks for sharing, I greatly enjoyed his talk! It is fortunate that it was recorded.

While I suspected that hard work, perseverance, diligence, etc. are a major part of it, I do believe there are some “true geniuses” (and I know a few people whom I consider such, incidentally physicists) who seem to possess some kind of natural ability enabling them to more easily understand complex/abstract topics. Hamming actually mentions Feynman in this regard in the Q&A; he knew Feynman would win a Nobel for something. See also Oppenheimer’s recommendation letter for Feynman to Berkeley:

Of these there is one who is in every way so outstanding and so clearly recognized as such, that I think it appropriate to call his name to your attention, with the urgent request that you consider him for a position in the department at the earliest time that it is possible. You may remember the name because he once applied for a fellowship in Berkeley: it is Richard Feynman. He is by all odds the most brilliant young physicist here, and everyone knows this.

I may give you two quotations of men with whom he has worked. Bethe has said that he would rather lose any two other men than Feynman from this present job, and Wigner said, “He is a second Dirac, only this time human.”

Or, for a historical example in the “arts”, there are those “prodigies” such as Mozart; while he was privileged and by all accounts an incredibly hard worker, there seemed to be something “special” about him. In creative areas, others similarly describe “getting” or “receiving” their “content” (Tolkien with invented languages, and current young prodigy Alma Deutscher when discussing melodies). Then again, I am a big fan of Mozart to begin with!

But I digress; Hamming gives good advice even if one’s goal is not to do “Nobel quality work”, and even (or especially) if one is not a “prodigy”.

I came across this important article by Philip Stark, which I had previously read, on the use of statistical models in practice. He echoes the concerns expressed by @ChristopherTong:

But not all uncertainties can be represented as probabilities.

I think this assertion requires a bit more elaboration. It seems to smuggle in the idea that one must have some physical model before invoking “probability” as a tool. Jaynes would call this the “mind projection fallacy”; he took substantial effort to ground probability as an expression of prior knowledge (not mere subjective belief).

I still highly recommend reading (and re-reading) this from time to time. I think his observation about mathematical models being used to “persuade and intimidate” rather than predict is close enough to the mark to give him credit for a bullseye hit on the target.


A physical model is not mandatory for the use of probability modeling and reasoning, though having one provides welcome additional insight.

Regarding “not all uncertainties can be represented as probabilities”, this is easily shown. An example of a quantitative but non-stochastic uncertainty is the bound on the approximation error of a Taylor series approximation of a function. In some applications this bound is treated as an uncertainty, though it is purely deterministic. There are other examples in approximation theory; I concede that some theorems in that field are probabilistic rather than deterministic, but certainly not all of them (Taylor’s isn’t).
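
As a worked instance of such a deterministic bound (my own illustration, not from the post above): for exp on [0, 1], the Lagrange form of the remainder guarantees |R_n(x)| ≤ e · x^(n+1) / (n+1)!, with no probability anywhere.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of exp about 0, evaluated at x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def lagrange_bound(x, n):
    """Lagrange remainder bound for exp on [0, 1]:
    |R_n(x)| = e^xi * x^(n+1) / (n+1)! <= e * x^(n+1) / (n+1)!"""
    return math.e * x ** (n + 1) / math.factorial(n + 1)

# The actual error never exceeds the bound: a deterministic "uncertainty".
for x in (0.25, 0.5, 1.0):
    assert abs(math.exp(x) - taylor_exp(x, 4)) <= lagrange_bound(x, 4)
```

The bound is an interval guarantee, not a probability statement, which is exactly the distinction being drawn here.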

More interesting are examples of uncertainties that cannot be quantified, which Kay and King’s book Radical Uncertainty (discussed above) gives many examples of. I cannot hope to summarize their arguments here, but a few quotes might give a flavor of where they are coming from.

The appeal of probability theory is understandable. But we suspect the reason that such mathematics was, as we shall see, not developed until the seventeenth century is that few real-world problems can properly be represented in this way. The most compelling extension of probabilistic reasoning is to situations where the possible outcomes are well defined, the underlying processes which give rise to them change little over time, and there is a wealth of historic information.

And

Resolvable uncertainty is uncertainty which can be removed by looking something up (I am uncertain which city is the capital of Pennsylvania) or which can be represented by a known probability distribution of outcomes (the spin of a roulette wheel). With radical uncertainty, however, there is no similar means of resolving the uncertainty – we simply do not know. Radical uncertainty has many dimensions: obscurity; ignorance; vagueness; ambiguity; ill-defined problems; and a lack of information that in some cases but not all we might hope to rectify at a future date. Those aspects of uncertainty are the stuff of everyday experience.

Radical uncertainty cannot be described in the probabilistic terms applicable to a game of chance. It is not just that we do not know what will happen. We often do not even know the kinds of things that might happen. When we describe radical uncertainty we are not talking about ‘long tails’ – imaginable and well-defined events whose probability can be estimated, such as a long losing streak at roulette. And we are not only talking about the ‘black swans’ identified by Nassim Nicholas Taleb – surprising events which no one could have anticipated until they happen, although these ‘black swans’ are examples of radical uncertainty. We are emphasizing the vast range of possibilities that lie in between the world of unlikely events which can nevertheless be described with the aid of probability distributions, and the world of the unimaginable. This is a world of uncertain futures and unpredictable consequences, about which there is necessary speculation and inevitable disagreement – disagreement which often will never be resolved. And it is that world which we mostly encounter.

I won’t provide a list of their examples, but I gave one of my own on another thread (discussion of the London Metal Exchange trading of Nickel in March of this year).

David A. Freedman’s rejoinder to the discussants of his classic shoe leather paper contains a similar assertion to Stark’s (Stark and Freedman were collaborators, so no surprise).

For thirty years, I have found Bayesian statistics to be a rich source of mathematical questions. However, I no longer see it as the preferred way to do applied statistics, because I find that uncertainty can rarely be quantified as probability.

Source: Freedman (1991): A rejoinder to Berk, Blalock, and Mason. Sociological Methodology, 21: 353-358.

RE: Freedman’s quote.

For thirty years, I have found Bayesian statistics to be a rich source of mathematical questions. However, I no longer see it as the preferred way to do applied statistics, because I find that uncertainty can rarely be quantified as probability.

My view: In Feynman’s lectures, he described the scientific method simply:

  1. Guess at a formulation of a natural law
  2. Compute the consequences
  3. Design an experiment and compare the results to experience.

Why is there such an aversion to educated “guessing” of the form of the prior? It is a starting point on the path of inquiry.

(I’ll concede things are more complicated when real risk of loss is involved).

I’m not sure why this isn’t done more often, but an initial experiment can be done using frequentist theory, and subsequent experiments can be guided by a prior derived from the I.J. Good/Robert Matthews “reverse Bayes” technique.
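
A minimal sketch of the reverse-Bayes idea, assuming a normal-normal model on the log odds-ratio scale (the function names and the example interval are mine; Matthews’ “Analysis of Credibility” papers give the full treatment): given a just-significant 95% CI, find the most skeptical prior centred on no effect that still leaves the posterior excluding no effect.

```python
import math

Z = 1.959964  # two-sided 95% normal quantile

def skepticism_limit(lower, upper):
    """For a 'significant' 95% CI (lower, upper) on a ratio with lower > 1,
    return S such that a normal prior centred on no effect with 95% interval
    (1/S, S) makes the posterior just touch no effect (normal-normal model
    on the log scale). Any narrower, more skeptical prior would render the
    result non-credible."""
    lU, lL = math.log(upper), math.log(lower)
    return math.exp((lU - lL) ** 2 / (4 * math.sqrt(lU * lL)))

def posterior_interval(lower, upper, prior_sd):
    """95% posterior interval on the ratio scale after combining the data CI
    with a normal prior centred at log(1) = 0 with sd prior_sd."""
    d = (math.log(upper) + math.log(lower)) / 2         # data log-estimate
    sd = (math.log(upper) - math.log(lower)) / (2 * Z)  # data log-sd
    w = prior_sd**2 / (prior_sd**2 + sd**2)             # shrinkage weight
    m, s = w * d, math.sqrt(w) * sd                     # posterior mean, sd
    return math.exp(m - Z * s), math.exp(m + Z * s)

# Hypothetical trial reporting an odds ratio with 95% CI (1.2, 3.0):
S = skepticism_limit(1.2, 3.0)          # about 1.6
post_lo, post_hi = posterior_interval(1.2, 3.0, math.log(S) / Z)
assert abs(post_lo - 1.0) < 1e-9        # at the limit, the posterior touches 1
```

If your honest prior is more skeptical than (1/S, S), the reported result should not move you to belief in an effect; otherwise it plausibly should.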


As Richard Feynman said, “The first principle is that you must not fool yourself, and you are the easiest person to fool.”

As I already mentioned above, the team that produced the first image of a black hole knew this, and took extraordinary measures to stop their prior expectations from influencing the analysis of the data, resulting in increased credibility of the result. As the Wall Street Journal’s Sam Walker reported,

The chief threat to the EHT imaging team was the consensus that black holes look like rings. To build an algorithm that predicts what data might “look” like, they would have to make hundreds of assumptions. If they harbored any prejudice, even subconsciously, they might corrupt the formulas to produce nothing but rings.

For details, see Walker’s piece (which I cited above):

I take this aversion to probability and decision theory as letting the perfect be the enemy of the good, much like the nihilist who argues that because there is the possibility of error, no knowledge is possible.

The challenge of applying probability in the context of economics remains an active area of research. Notions like “incomplete markets” (PDF) and computational decision analysis are interesting…


Nobody on this thread has taken what @R_cubed characterizes as a “nihilist” position. Kay & King outline the arena in which probability modeling can fruitfully be used, as I quoted above and will repeat in part here:

The most compelling extension of probabilistic reasoning is to situations where the possible outcomes are well defined, the underlying processes which give rise to them change little over time, and there is a wealth of historic information.

My own post from May 23 above bears excerpting as well:

Much of what constitutes ‘statistical thinking’ is probabilistic thinking, which has its place for sure, but a much smaller one than statisticians demand. I’m not saying to get rid of probability, just that it must take its place as merely one of several frameworks with which to understand the use of data in the scientific enterprise, and shouldn’t have a monopoly on it.

This is a call for humility and a broader perspective, not nihilism.

This is still understating the scope of probability IMHO. Even though probability doesn’t apply to everything, it applies to most things.

The scope of probability described by Kay and King (see quote above) was called the class of “small world” problems by them, and everything else belongs to the class of “large world” problems (or “grand world” in Savage’s parlance). There are numerous successful applications of probability in small world problems, such as the kinetic theory of gases, quantum theory, modeling of fiber optic and electronic communications signals, certain types of bacterial growth models, and of course games of chance. Would someone like to offer an example or two of a large world problem where probabilistic modeling/reasoning has been successful? Perhaps I can learn something from your examples.


interesting information

  1. Given that too many areas of science are plagued by improper use and misunderstanding of frequentist procedures, and have been for close to 100 years now, I think there remains a large scope for what you describe as “small world” applications of probability, rephrased in the language of information theory. In economic terms, large areas of scientific inquiry sit well below the “efficient frontier” of information synthesis. [1]

  2. I fail to see why a large class of problems described by the term “Knightian uncertainty” isn’t covered by the frequentist theory of minimax decision rules. [2][3]

  3. Some problems might require conservative extensions to probability theory. In mathematical logic, the program of Reverse Mathematics takes the theorems of classical mathematics as given and searches for the axioms needed to prove them; a number of subsystems of varying logical strength have been discovered. Likewise, there are a number of proposals for extending probability theory, e.g., I.J. Good’s Dynamic Probability or Richard Jeffrey’s probability kinematics. [4]

  4. I expect the R.T. Cox/E.T. Jaynes approach to Bayesian analysis to advance to the point where it can handle non-parametric problems as easily as frequentist methods do. Combining Cox/Jaynes with the Keynes/Good/Jeffreys/Walley notion of interval probabilities leads to Robust Bayesian Inference, as noted in [5].
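
On point 2, a toy illustration may help (the loss numbers are invented): Wald’s minimax rule needs no probabilities over the states of the world at all, which is why it is a natural candidate under Knightian uncertainty.

```python
# States of the world carry no probabilities: Knightian uncertainty.
# losses[action][state] = loss incurred if we take `action` and `state` obtains.
losses = {
    "act":      {"boom": 1.0, "bust": 9.0},
    "hedge":    {"boom": 3.0, "bust": 4.0},
    "withdraw": {"boom": 6.0, "bust": 2.0},
}

def minimax_action(losses):
    """Wald's rule: choose the action whose worst-case loss is smallest."""
    return min(losses, key=lambda a: max(losses[a].values()))

print(minimax_action(losses))  # prints "hedge": worst case 4.0 beats 9.0 and 6.0
```

The rule is conservative by construction; whether that conservatism is appropriate is exactly the debate taken up in [2] and [3].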

Further Reading

  1. Anything by @Sander and colleagues on the misinterpretation of P values.
  2. Kuzmics, Christoph, Abraham Wald’s Complete Class Theorem and Knightian Uncertainty (June 27, 2017). Available at SSRN here. Also peer reviewed and published here.
  3. David R. Bickel “Controlling the degree of caution in statistical inference with the Bayesian and frequentist approaches as opposite extremes,” Electronic Journal of Statistics, Electron. J. Statist. 6(none), 686-709, (2012) link
  4. Good, I. J. “The Interface Between Statistics and Philosophy of Science.” Statistical Science 3, no. 4 (1988): 386–97. link
  5. Stefan Arnborg, “Robust Bayesian analysis in partially ordered plausibility calculi,” International Journal of Approximate Reasoning, Volume 78, 2016, Pages 1-14, ISSN 0888-613X. link

Fascinating discussion!
A related blog post on Ramsey vs Keynes is at Syll’s blog:

which links to previous critiques of Bayes there, including most recently


With regard to @R_cubed’s comment on information-theoretic formulations of probability being deployed in “small world” problems: as I mentioned above (5/18/22 post), I agree that the “maximum entropy” approach is intriguing and worthy of consideration, and I would be open to seeing more (and judicious) attempts to use it for “small world” problems.
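
As a small illustration of maximum entropy at work (the classic Brandeis dice problem; the code is my sketch, not from any post above): given only the constraint that a die’s mean roll is 4.5, maximum entropy yields p_i ∝ exp(λi), with λ chosen to match the mean.

```python
import math

def maxent_die(target_mean, faces=6, tol=1e-12):
    """Maximum entropy distribution over faces 1..faces subject to a mean
    constraint: p_i proportional to exp(lam * i). Solve for lam by bisection
    (the constrained mean is strictly increasing in lam)."""
    def mean(lam):
        w = [math.exp(lam * i) for i in range(1, faces + 1)]
        return sum(i * wi for i, wi in zip(range(1, faces + 1), w)) / sum(w)

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(lam * i) for i in range(1, faces + 1)]
    z = sum(w)
    return [wi / z for wi in w]

p = maxent_die(4.5)
# The constraint is satisfied, and mass tilts toward the high faces.
assert abs(sum(i * pi for i, pi in enumerate(p, start=1)) - 4.5) < 1e-9
```

Nothing in the data is “random” here; the distribution simply encodes the stated constraint and nothing else, which is the Jaynesian point.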

I would still like to see some concrete “large world” examples of successful probabilistic reasoning introduced to this thread. As @f2harrell said, “probability doesn’t apply to everything” but “it applies to most things”, so it shouldn’t be difficult to locate examples (though my personal biases may be interfering with my ability to think of one). These examples may help to show in what ways the views I offered above are half-baked or just plain wrong.

Thanks to @Sander_Greenland for some thought-provoking links.


I’m afraid I may be misunderstood here. I’m prepared to accept that my views, enunciated above, are too pessimistic, but only via discussion of concrete examples, not conjecture. I’ve discussed several real examples on this thread that support my perspective, but naturally those have been “cherry picked” (the proton mass, the black hole image, the London Metal Exchange trading of nickel in March 2022). Where are the examples that contradict my pessimism?

RE: LME example. It isn’t clear what you are looking for. I can’t think of anyone other than a high-level LME member who would have enough information to have a posterior probability of default over 0.5.

The LME has been in existence for just over 100 years. This isn’t the first time the exchange has closed. It is now owned by a Chinese firm, Hong Kong Exchanges and Clearing.

I’ve never worked in risk management, but I have a very basic idea of how they think. Shrewd risk managers might not have actually predicted a closure, but by monitoring prices of various securities and world news, they would still have been able to act by diverting trades to other exchanges, stopping trading on that exchange, or even taking short positions in LME securities (e.g., put options on a more stable exchange, a synthetic short, an OTC credit default swap (CDS), etc.).

Your fundamental premise is debatable, i.e., that good examples of probabilistic reasoning exist in the public domain. Any real-world use almost certainly takes place in a zero-sum market context, where information that is useful but not widely known is valuable.

I’ll post some more thoughts on this in the thread I created on Stark’s paper.


I say for a second time (note 1), a fundamental point has been missed. Canceling 8 hours of legitimate trades is unprecedented. Period. This is a public domain example with a century of historical data, as @R_cubed noted. Yet the inconceivable still happened. What other inconceivable events might have happened but didn’t, and what priors should be assigned to them? How large should the probability space be to accommodate the infinite number of “moon made of green cheese” outcomes that must therefore be considered, even if unlikely? I submit it as an example of what Kay and King call a “large world” problem, where probabilistic reasoning would have been helpless.

This is not one of their examples, as it occurred two years after their book was published. However, their examples (e.g., their discussion of the financial crisis of 2008) include ones that played out in public view. If “probability applies to most things” (as observed by @f2harrell), it should apply to most public domain decisions too.

(1) See

I say for a second time (note 1), a fundamental point has been missed. Canceling 8 hours of legitimate trades is unprecedented. Period.

It is certainly rare, but not unprecedented. Assessing counter-party risk is part of the business. All that glitters isn’t always gold.

Contracts were also cancelled in 1985, during the “Tin crisis.”

The London Metal Exchange is owned by a Chinese business entity. War has restricted the supply of nickel, driving up the price. Circumstances have the U.S. and Chinese governments as adversaries. That alone should indicate to anyone sensible that:

Past performance is no guarantee of future results.

I do recall a time not all that long ago when the U.S. government banned short sales of financial institutions during a crisis.

So much for “price discovery” when the insiders don’t want such things “discovered.”
