Necessary/recommended level of theory for developing statistical intuition

  1. Given that too many areas of science are plagued by improper use and understanding of frequentist procedures, and have been for close to 100 years now, I think there remains large scope for what you describe as “small world” applications of probability rephrased in the language of information theory. In economic terms, large areas of scientific inquiry are well below the “efficient frontier” in terms of information synthesis. [1]

  2. I fail to see why a large class of problems described by the term “Knightian uncertainty” isn’t covered by the frequentist theory of minimax decision rules. [2][3]

  3. Some problems might require conservative extensions to probability theory. In mathematical logic, the program of Reverse Mathematics takes the theorems of classical mathematics as given and searches for the axioms needed to prove them; a number of sub-systems of varying logical strength have been discovered. Likewise, there are a number of proposals for extending probability theory, e.g., I. J. Good’s Dynamic Probability or Richard Jeffrey’s probability kinematics. [4]

  4. I expect the R. T. Cox/E. T. Jaynes approach to Bayesian analysis to advance once it can handle non-parametric problems as easily as frequentist methods do. Combining Cox-Jaynes with the Keynes/Good/Jeffreys/Walley notion of interval probabilities leads to Robust Bayesian Inference, as noted in [5]. A minimal sketch of the interval idea follows this list.
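To make the interval-probability idea concrete, here is a minimal sketch (my own illustration, not taken from [5]): instead of committing to a single prior, posit a whole class of Beta priors for a binomial parameter and report the range of posterior probabilities the class induces. All numbers are hypothetical.

```python
import numpy as np
from scipy import stats

# Interval ("robust") Bayesian sketch: instead of one prior, posit a
# class of Beta(a, b) priors for a binomial parameter theta and report
# the range of posterior probabilities the class induces.

successes, trials = 7, 20                 # hypothetical data

posterior_probs = []
for a in np.linspace(0.5, 5.0, 10):       # prior class: Beta(a, b)
    for b in np.linspace(0.5, 5.0, 10):   # with a, b in [0.5, 5]
        posterior = stats.beta(a + successes, b + trials - successes)
        posterior_probs.append(posterior.cdf(0.5))  # P(theta < 0.5 | data)

# The interval's width measures how much the answer depends on the prior.
print(f"P(theta < 0.5) in [{min(posterior_probs):.3f}, {max(posterior_probs):.3f}]")
```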

Further Reading

  1. Anything by @Sander and colleagues on the misinterpretation of P values.
  2. Kuzmics, Christoph, Abraham Wald’s Complete Class Theorem and Knightian Uncertainty (June 27, 2017). Available at SSRN here. Also peer reviewed and published here.
  3. Bickel, David R. “Controlling the degree of caution in statistical inference with the Bayesian and frequentist approaches as opposite extremes.” Electronic Journal of Statistics 6 (2012): 686–709. link
  4. Good, I. J. “The Interface Between Statistics and Philosophy of Science.” Statistical Science 3, no. 4 (1988): 386–97. link
  5. Arnborg, Stefan. “Robust Bayesian analysis in partially ordered plausibility calculi.” International Journal of Approximate Reasoning 78 (2016): 1–14. link

Fascinating discussion!
A related blog post on Ramsey vs Keynes is at Syll’s blog:

which links to previous critiques of Bayes there, including most recently


With regard to @R_cubed’s comment on information-theoretic formulations of probability being deployed in “small world” problems: as I mentioned above (5/18/22 post), I agree that the “maximum entropy” approach is intriguing and worthy of consideration, and I would be open to seeing more (and judicious) attempts to use it for “small world” problems.

I would still like to see some concrete “large world” examples of successful probabilistic reasoning introduced to this thread. As @f2harrell said, “probability doesn’t apply to everything” but “it applies to most things”, so it shouldn’t be difficult to locate examples (though my personal biases may be interfering with my ability to think of one). These examples may help to show in what ways the views I offered above are poorly-baked or just plain wrong.

Thanks to @Sander_Greenland for some thought-provoking links.


I’m afraid I may be misunderstood here. I’m prepared to accept that my views, enunciated above, are too pessimistic, but only via discussion of concrete examples, not conjecture. I’ve discussed several real examples on this thread that further my perspective, but naturally those have been “cherry picked” (the proton mass, the black hole image, London Metal Exchange nickel trading in March 2022). Where are the examples that contradict my pessimism?

RE: LME example. It isn’t clear what you are looking for. I can’t think of anyone other than a high-level LME member who would have had enough information to hold a posterior probability of default over 0.5.

The LME has been in existence for just over 100 years. This isn’t the first time the exchange has closed. It is now owned by a Chinese firm, Hong Kong Exchanges and Clearing.

I’ve never worked in risk management, but I have a basic idea of how risk managers think. Shrewd risk managers might not have actually predicted a closure, but by monitoring prices of various securities and world news, they would still have been able to act: diverting trades to other exchanges, halting trading on the LME, or even taking short positions in LME-linked securities (e.g., put options on a more stable exchange, a synthetic short, an OTC credit default swap (CDS), etc.). A toy illustration of the synthetic short is sketched below.
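For readers unfamiliar with the construction, here is a toy payoff check (hypothetical numbers, not tied to any actual LME contract): a synthetic short at strike K combines a long put with a short call, replicating the payoff of shorting the underlying at K.

```python
import numpy as np

# Toy payoff check: a "synthetic short" at strike K = long put + short
# call, replicating a short position in the underlying at K.
# All numbers are hypothetical.

K = 100.0
prices = np.linspace(50, 150, 5)            # possible terminal prices

long_put = np.maximum(K - prices, 0.0)      # long put payoff
short_call = -np.maximum(prices - K, 0.0)   # short call payoff
synthetic = long_put + short_call

assert np.allclose(synthetic, K - prices)   # identical to shorting at K
print(dict(zip(prices, synthetic)))
```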

Your fundamental premise is debatable: namely, that good examples of probabilistic reasoning exist in the public domain. Any real-world use almost certainly takes place in a zero-sum market context, where information that is useful but not widely known is valuable.

I’ll post some more thoughts on this in the thread I created on Stark’s paper.


I say for a second time (note 1), a fundamental point has been missed. Canceling 8 hours of legitimate trades is unprecedented. Period. This is a public domain example with a century of historical data, as @R_cubed noted. Yet the inconceivable still happened. What other inconceivable events might have happened but didn’t, and what priors should be assigned to them? How large should the probability space be to accommodate the infinite number of “moon made of green cheese” outcomes that must therefore be considered, even if unlikely? I submit it as an example of what Kay and King call a “large world” problem, where probability reasoning would have been helpless.

This is not one of their examples, as it occurred 2 years after their book was published. However, their examples (e.g., their discussion of the financial crisis of 2008) include ones that played out in public view. If “probability applies to most things” (as observed by @f2harrell ), it should apply to most public domain decisions too.

(1) See

I say for a second time (note 1), a fundamental point has been missed. Canceling 8 hours of legitimate trades is unprecedented. Period.

It is certainly rare, but not unprecedented. Assessing counterparty risk is part of the business. All that glitters is not gold.

Contracts were cancelled in 1985 during the “Tin crisis.”

The London Metal Exchange is owned by a Chinese business entity. War has restricted the supply of nickel, driving up the price. Current circumstances place the U.S. and Chinese governments as adversaries. That alone should indicate to anyone sensible that:

Past performance is no guarantee of future results.

I do recall a time, not all that long ago, when the U.S. government banned short sales of shares of financial institutions during a crisis.

So much for “price discovery” when the insiders don’t want such things “discovered.”


Excellent! I was not aware of the tin crisis. However, the article cited states:

Open contracts struck at the high prices quoted on the LME at the time trading was suspended were settled with reference to the much-lower prices that were soon seen afterward in the physical market.

This is quite different from simply canceling 8 hours of trades. My reading of the above is that open contracts at the time of the halt were re-priced. Thus, I stand by my assertion that the events of March 2022 in the LME nickel market were unprecedented. The market participants quoted in the article clearly did not see it coming. And “past performance is no guarantee of future results” is one facet of the general critique being made by K^2, as this dictum applies to “large world” problems. Market participants can try to anticipate unprecedented events using knowledge of context and global market conditions, as @R_cubed rightly notes. The only question is: do they use a probabilistic framework, or are they acting more heuristically? (And in this case it seems clear that, whatever framework they used, the actual events of that day caught many of them completely off guard.)

The U.S. futures markets use a sophisticated, quantitative risk assessment system known as SPAN (Standard Portfolio Analysis of Risk). U.S. stock markets use different guidelines.

Historical data (variance estimates and inter-market correlations) are used to estimate the possible worst-case loss for a portfolio (usually over a 1-day look-ahead) in order to calculate margin requirements. So those who get paid to keep exchanges open use heavily quantitative methods; a toy sketch of the idea follows.
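The actual SPAN risk arrays are set by each exchange and are far more elaborate than anything shown here; the following is only a toy sketch of the scenario-scan idea, with every number purely illustrative.

```python
import numpy as np

# Toy sketch of a SPAN-style scenario scan. Real SPAN risk arrays are
# set by each exchange; every number here is purely illustrative.
# Margin = worst portfolio loss over a grid of price/volatility shocks.

position_value = 1_000_000.0                 # hypothetical futures position
price_shocks = np.linspace(-0.06, 0.06, 7)   # scan +/- 6% price moves
vol_shocks = (-0.01, 0.0, 0.01)              # vol down / flat / up

worst_loss = 0.0
for dp in price_shocks:
    for dv in vol_shocks:
        # crude P&L model: linear price term plus a small vega-like term
        pnl = position_value * dp + position_value * 0.1 * dv
        worst_loss = min(worst_loss, pnl)

print(f"margin requirement = {-worst_loss:,.0f}")
```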

SPAN always struck me as a well thought out system that was good for traders but protected the market.

Superb, this discussion is taking a very productive turn!

According to the CME, SPAN is a Value at Risk (VaR) based system. https://www.sfu.ca/~poitras/span-methodology.pdf

While I am not immediately aware of failures of SPAN in US options trading (though I will now start looking for them), the use of VaR was implicated in the financial crisis of 2008. For instance, supposedly the failure of Northern Rock shouldn’t have happened based on a VaR analysis (this is an example discussed by Kay and King). K^2 write:

Although value at risk models may be of some use in enabling banks to monitor their day-to-day risk exposures, they are incapable of dealing with the ‘off-model’ events which are the typical cause of financial crises.

Riccardo Rebonato (I thank @R_cubed for pointing me to this author) states that VaR analysis is an “interesting and important measure” of risk “when nothing extraordinary happens”. But of course in our lifetimes markets have experienced “extraordinary” events on multiple occasions…because financial markets are actually “large world” arenas. And as we’ve seen, regulators in 2008 took a very improvisational approach to handling such crises…as far as I know, they did not rigidly follow the prescriptions of some probability-based framework (and thus their behavior could not be modeled by other market participants).

It seems that K^2 and Rebonato are saying that VaR has its honored place in risk modeling, but cannot be depended upon to fully assess risk when markets turn rapidly non-stationary, which is precisely a “large world” issue. VaR works best when the “large world” system is behaving (temporarily) in a “small world” fashion. A minimal sketch below shows why.
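To make that limitation concrete, here is a minimal sketch of 1-day historical-simulation VaR, the simplest member of the VaR family (the data are simulated, not real). The estimate is bounded by the sample: an “off-model” crash contributes nothing to it until the crash is already in the window.

```python
import numpy as np

# Minimal 1-day historical-simulation VaR on simulated (not real) data.
# The weakness at issue: the estimate is bounded by the sample, so an
# "off-model" crash is invisible until it is already in the window.

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 500)      # hypothetical daily returns

portfolio = 10_000_000.0
var_99 = -np.quantile(returns, 0.01) * portfolio   # 99% 1-day VaR
print(f"1-day 99% VaR: {var_99:,.0f}")

# A -20% crash tomorrow would lose 2,000,000, roughly ten times this
# estimate, without ever having registered in it beforehand.
```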

I don’t know of anyone who thinks financial markets are perfectly modeled by simple probability distributions, or that VaR is a complete methodology.

It is fundamental to me that markets are incomplete in the Arrow-Debreu sense: there exists more risk than there are securities to hedge it. A toy illustration is sketched below.
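Here is a toy illustration of incompleteness (numbers invented for the example): with 3 future states but only 2 securities, the payoff matrix cannot span every contingent claim, so some risks are unhedgeable.

```python
import numpy as np

# Toy incomplete market: 3 future states, only 2 securities (numbers
# invented). The payoff matrix cannot span every contingent claim.

payoffs = np.array([[1.0, 1.0, 1.0],    # bond: pays 1 in every state
                    [0.5, 1.0, 2.0]])   # stock: state-dependent payoff

target = np.array([0.0, 0.0, 1.0])      # claim paying 1 only in state 3

# Least-squares replication: a nonzero residual means the claim, and
# hence its risk, cannot be hedged with these two securities.
weights, residual, rank, _ = np.linalg.lstsq(payoffs.T, target, rcond=None)
print(f"rank {rank} < 3 states; replication error = {residual[0]:.3f}")
```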

It is also an interesting question whether these financial “innovations” end up increasing risk rather than simply transferring it (i.e., moral hazard).

A very thorough (open-source) text on VaR methods (with utility in other areas of applied statistics) can be found here:

My biggest objection (which I will elaborate on later) is their recommendation to compare and contrast different “narratives”. This entire notion of “narratives” has created a cargo cult of “experts” who advise representatives of constituents, who do not communicate information to the people they are supposed to represent, but manipulate them in the fashion of Thaler’s “nudges.”

A detailed critique can be found here:

I essentially agree with this clip from an FT review:

The authors of this book belong to the elite of British economic policymaking …Their alternative to probability models seems to be, roughly, experienced judgment informed by credible and consistent “narratives” in a collaborative process. They say little about how those exercising such judgment would be held to account.

This is a fascinating discussion. I would like for us to return more to the original topic.


Frank: I think the discussion between Tong and R-cubed is most relevant for the thread topic of the necessary level of theory. They are discussing how theory as conceived in statistics is a very narrow window on reality, constrained as it is by formal mathematical (probability) conceptual models. In health-med sci this theory is often (I’d say typically) oversimplified in a fashion that makes study results appear far more precise than they should, stoking grossly overconfident inferences in the form of understated error rates and overstated posterior probabilities. The question is then how to moderate overemphasis on mechanical mathematics with coverage of heuristics and judgment, an issue raised by those like Tukey and Hamming some 50-60 years ago.

Like misuse of NHST, this theory/reality tension remains largely unaddressed because so many of those tasked with stat teaching know little about how formal theory enters into heuristics and judgment, having themselves been forced to focus on formal theory to have a successful academic career in statistics. Aggravating that is time pressure: Formal theory being so algorithmic makes it far easier to implement in teaching and examinations; whereas good coverage of judgment and its pitfalls requires knowledge of real research areas and historical facts, as raised by Tong and R-cubed, and has far fewer instructor-friendly sources. The question is then: How do we go beyond toy examples to integrate real case studies into teaching, so that students can develop sound intuitions about the proper uses and shortcomings of formal statistics theory and its computer outputs?


At the end of this post, I will (attempt to) tie the following comments to the original topic.

Kay and King are skeptical of behavioral economists like Thaler (whom they specifically critique), though this is not obvious from Boissonnet’s book review, which I hadn’t seen until now. Despite this, I recommend Boissonnet’s piece as a better summary of K^2 than I have given on this thread (kudos to @R_cubed for sharing it here). Boissonnet’s principal criticism seems to be that K^2 ignore volumes of academic literature that purport to address the flaws in statistical and economic reasoning that K^2 critique. Boissonnet’s perspective would be more compelling if some of those academic theories he cites had been deployed successfully in the field. This is why I have been asking for concrete examples.

Note that no one expects models to be perfect: George Box said “all models are wrong, but some are useful”. Let’s find these useful examples in “large world” problems, as they may help us sharpen our statistical intuition when it’s our turn to cope with such problems. VaR is a compelling example, but it has inherent limits, as I’ve argued above, and such limits may lead to serious trouble (e.g., unanticipated bank failure). Developing statistical intuition should include the ability to judge when, or to what extent, probability reasoning/theory is even applicable to a given problem. Even if probability applies to “most things,” you may not always choose to use it, just as a physicist does not always choose a quantum description of a system, even though such a description would “always” be applicable in principle.


I’ve been asking for clarification of what “success” means in this context. What gets dismissed as failure or incompetence I’ve come to see as misdirection.

I can tell you (having lived through 3 boom-and-bust cycles since the mid-’90s) that a significant number of people in the financial markets knew there was a tech bubble (circa 1998–2002) and a housing bubble (circa 2005–2008), despite media, Federal Reserve, and government declarations to the contrary at the time.

If you looked at the options series on a number of financial stocks (circa 2006–2007), you saw a price-implied probability density factoring in larger negative price moves than anyone in a position of authority accepted as possible. This was especially apparent in the regional banks and subprime lenders.

If you then looked at the financial statements, you saw earnings being propped up by a reduction in loss reserves on a geographically concentrated real estate portfolio, often of questionable credit quality.

Had you pointed this out to “professionals,” you got dismissive, defensive rebuttals. Yet all of those companies went bankrupt.

Does that count as “success”?
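For anyone who wants to see the mechanism, here is a stylized sketch of the Breeden-Litzenberger relation behind implied densities (treated in the Mizrach chapter below): the risk-neutral density is the discounted second derivative of the call price with respect to strike. The option prices here are synthetic Black-Scholes values with made-up parameters, purely to keep the example self-contained.

```python
import numpy as np
from scipy.stats import norm

# Stylized Breeden-Litzenberger sketch: the risk-neutral density is the
# discounted second derivative of the call price in the strike,
#   f(K) = exp(r*T) * d^2 C / dK^2.
# Call prices below are synthetic Black-Scholes values with made-up
# parameters, purely so the example is self-contained.

S, r, T, sigma = 100.0, 0.02, 0.25, 0.4
strikes = np.linspace(50.0, 150.0, 201)

d1 = (np.log(S / strikes) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
calls = S * norm.cdf(d1) - strikes * np.exp(-r * T) * norm.cdf(d2)

# second derivative in K by finite differences, then discount
density = np.exp(r * T) * np.gradient(np.gradient(calls, strikes), strikes)

mask = strikes <= 80.0
print(f"implied P(S_T < 80) ~ {np.trapz(density[mask], strikes[mask]):.3f}")
```

Applied to real quotes rather than synthetic prices, the same second-difference of market call prices across strikes is what reveals the fat left tail the “professionals” were dismissing.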

Further Reading

Mizrach, Bruce (2010). “Estimating implied probabilities from option prices and the underlying.” Handbook of quantitative finance and risk management. Springer, Boston, MA, 515-529. (PDF)

At the end of the prologue of Michael Lewis’ The Big Short, he writes:

It was then late 2008. By then there was a long and growing list of pundits who claimed they predicted the catastrophe, but a far shorter list of people who actually did. Of those, even fewer had the nerve to bet on their vision. It’s not easy to stand apart from mass hysteria–to believe that most of what’s in the financial news is wrong, to believe that most important financial people are either lying or deluded–without being insane.

The success here is by those who may have been tipped off by a probability model (and others who didn’t need a model to raise their suspicion) but then, as @R_cubed notes, proceeded to gather additional information to understand and interpret what the model was claiming, and to find out if it made sense (dare I say…a narrative?). K^2 call this process figuring out “what is going on here”. You don’t simply turn a mathematical/computational crank: you have to understand and interpret the data and the model output (either of which could be misleading), which often requires gathering more and better data, along multiple lines of evidence. The same process is described by David Freedman in his famous “shoe leather” paper, which is heavily excerpted in this longer and very insightful essay (in which many of the ‘good’ examples do not use probability or statistical inference at all):

As we’ve already discussed, there were other failures during the housing bubble crisis that were not evident from the probability models (VaR and Northern Rock’s failure being an example given by K^2). The successful users of probability models should consider them simply as fallible tools that help them think of more questions and provide directions for seeking more information and data, not as turn-the-crank machines that provide risk management recipes free of shoe leather and critical thinking. In this process, heuristics and narrative economics have their role. The discussion on this thread has convinced me that the large world/small world heuristic given by K^2 is indeed a valuable one for understanding the limits and hazards of probability reasoning, but there may be others.


I just realized I had recommended the Freedman paper back in my May 11 post above; another paper mentioned there that is relevant to the immediate points made above is
https://www.nature.com/articles/d41586-018-01023-3

I will merely point out that this quote from Freedman (which I take as representative of your views)

For thirty years, I have found Bayesian statistics to be a rich source of mathematical questions. However, I no longer see it as the preferred way to do applied statistics, because I find that uncertainty can rarely be quantified as probability.

is different from this:

The successful users of probability models should consider them simply as fallible tools to help them think of more questions and provide directions for seeking more information and data, not turn-the-crank machines that provide risk management recipes free of shoe leather and critical thinking.

No one (especially not me) argued for the latter. To do so would be engaging in the practice of “persuasion and intimidation” that Stark properly condemns. Unfortunately, I find much of “evidence based” rhetoric accurately described by his language.

I believe I’ve shown, via the example of extracting implied probabilities from options prices, that even decision makers at the highest levels, in a regime of incomplete markets, find valuable information in a subjective, betting-odds interpretation of probability that has no physical meaning.

Indeed, I consider precise probability models merely as a point of reference for computing the width of the range of interval probabilities (aka covering priors) worth considering.

Related Thread


The cited quotes are different but 100% compatible. Freedman’s statement is not simply a critique of Bayesian probability but of all probability. In Sir David Cox’s reply to Leo Breiman’s “Two Cultures” paper (2001), he wrote:

I think tentatively that the following quite commonly applies. Formal models are useful and often almost, if not quite, essential for incisive thinking. Descriptively appealing and transparent methods with a firm model base are the ideal. Notions of significance tests, confidence intervals, posterior intervals and all the formal apparatus of inference are valuable tools to be used as guides, but not in a mechanical way; they indicate the uncertainty that would apply under somewhat idealized, maybe very idealized, conditions and as such are often lower bounds to real uncertainty. Analyses and model development are at least partly exploratory.

Cox may be too optimistic in light of the issues I raised in my 2019 paper (which many others have raised elsewhere and earlier than me), as well as situations where one attempts to use models that work only in “small world” scenarios just as the non-stationary aspects of the “large world” arena start to manifest. These are examples of when formal models can become very disconnected from reality and potentially more misleading than other approaches. On the other hand, numerous examples from the history of science show how data can lead to scientific discovery without calling on a probability model at all, e.g.,
https://journals.sagepub.com/doi/abs/10.1177/0149206314547522

See also the Freedman paper I linked above, and my discussion of Max Planck’s work on so-called blackbody radiation in my 2017 paper with B. Gunter, cited above (I repeat the link here) https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2017.01057.x

I withhold judgment on the implied probabilities method until I’ve looked further into it, but thanks to @R_cubed for providing concrete examples such as this one.


I propose a test of the implied probability method. If such things as nickel options exist, simulate the performance of the implied probability method on nickel options for the weeks leading up to March 8, 2022…and then for the immediately subsequent weeks.