Necessary/recommended level of theory for developing statistical intuition

I brought this up after Day 1’s lecture but I thought it might warrant further discussion/elaboration. (Apologies for the length!)

I am of the opinion that a “strong” background/foundation in theory is important to becoming a good applied statistician. I leave “strong” in quotation marks to be purposefully vague, as I don’t think it can really be quantified. The concept may be related to “mathematical maturity” or “developing statistical intuition”. (I welcome contrasting viewpoints of course!)

Reasoning (aside)

I came to this opinion when I had to plan a study (design, sample size, etc.) for work. A statistical consultant had given a very basic assessment without even looking at the small amount of data we had, recommending paired t-tests and using some standard sample size formula. However, I looked at the data and noticed a large proportion of 0 values with the positive values fairly strongly right-skewed. This led me to learn about zero-inflation, hurdle models, and the Tweedie family. Furthermore, a paired t-test was not appropriate as there were repeated measures per subject and experimental condition, so I started to read about generalized linear mixed models (GLMM). I recommended a pilot study with what I considered to be an appropriate design and used that data to write simulations for power/sample size purposes (and for some exploratory analysis). Along the way, I ran into many issues and read a lot on StackExchange and attempted to understand Douglas Bates’ unpublished lme4 text. While I could write code to analyze and interpret data, I felt that I was just “following instructions” (or best practices), without really knowing what I was doing. For example, I found variance components, conditional modes or best linear unbiased predictors (BLUPs), nested/crossed effects, etc. to be much more difficult than working with GLMs.
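(For context, the kind of thing I ended up writing looked roughly like the sketch below: simulate data from a hurdle-type process with a subject random effect, fit a two-part GLMM, and estimate power by Monte Carlo. The effect sizes and variance components here are made-up placeholders, not my actual study.)

```r
## Rough sketch of simulation-based power for a hurdle-type outcome with
## repeated measures per subject; all numbers below are placeholders.
library(lme4)

simulate_trial <- function(n_subj = 30, n_rep = 4,
                           p_zero = c(ctrl = 0.40, trt = 0.25),  # P(Y = 0)
                           mu_pos = c(ctrl = 1.0, trt = 1.4),    # log mean of positives
                           sd_subj = 0.5, shape = 2) {
  d <- expand.grid(subject = factor(seq_len(n_subj)),
                   rep = seq_len(n_rep),
                   arm = c("ctrl", "trt"))
  arm <- as.character(d$arm)
  b <- rnorm(n_subj, 0, sd_subj)                        # subject random effect
  is_pos <- rbinom(nrow(d), 1, 1 - p_zero[arm])
  mu <- exp(mu_pos[arm] + b[d$subject])
  d$y <- is_pos * rgamma(nrow(d), shape = shape, rate = shape / mu)
  d
}

one_run <- function(...) {
  d <- simulate_trial(...)
  ## positive part of the hurdle: Gamma GLMM on the non-zero observations
  fit <- glmer(y ~ arm + (1 | subject), family = Gamma(link = "log"),
               data = subset(d, y > 0))
  coef(summary(fit))["armtrt", "Pr(>|z|)"] < 0.05
}

set.seed(1)
mean(replicate(200, one_run()))   # Monte Carlo power; increase reps for real use
```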

Main

I asked @f2harrell if he thought understanding a text such as Casella & Berger (2001): Statistical Inference is a necessary or beneficial foundation. I used this text as an example because it is usually used for an “intermediate level” first-year graduate course in Statistics programs and is more challenging than texts such as Wackerly, Mendenhall, & Scheaffer (2008): Mathematical Statistics with Applications, Hogg, McKean, & Craig (2018): Introduction to Mathematical Statistics, or Larsen & Marx (2017): An Introduction to Mathematical Statistics and Its Applications (which I have read in its entirety). None of the above require knowledge of measure theory.

Question

To @f2harrell, @Drew_Levy, and any other working statisticians: is my opinion/view too restrictive? For example, while I think a strong foundation in linear algebra is a great help when you get to (generalized) linear models, we are of course not writing our own code to do QR decompositions and so forth. Additionally, are there any textbooks you would recommend for developing statistical understanding beyond a “cookbook level” (i.e., just following steps/best practices)? Bayes-focused texts are welcome.
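(As a toy illustration of the QR point, and nothing more: this is the kind of thing I mean by understanding what the software is doing, even though nobody writes it in practice. The data are simulated.)

```r
## lm() is, at bottom, a QR-based least-squares solve; seeing that once is
## arguably the payoff of the linear algebra, even if we never code it ourselves.
set.seed(1)
X <- cbind(1, rnorm(50), rnorm(50))          # design matrix with intercept
y <- X %*% c(2, 1, -0.5) + rnorm(50)         # simulated response
coef(lm(y ~ X[, 2] + X[, 3]))                # the usual modeling call
qr.solve(X, y)                               # same estimates via the QR decomposition
```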

P.S.
A newer introductory/intermediate text with more focus on Bayes and computer methods is Efron & Hastie (2016): Computer Age Statistical Inference. I have heard good things about Jaynes (2003): Probability Theory: The Logic of Science (PDF warning) for Bayesian understanding, and Cosma Shalizi’s Advanced Data Analysis from an Elementary Point of View for anyone doing data analysis.

9 Likes

Wonderful questions Christopher, and I hope we get input from a diverse group. For serious students, linear algebra is a very important subject to become moderately good at. Before that, Richard McElreath’s book Statistical Rethinking is in my view the first stat book one should be exposed to. It is one of the few that does not involve any frequentist ideas, and because it is built on Bayesian thinking, the level of intuition that McElreath brings is unsurpassed in the statistical text world. This also allows him to solve real-world problems and deal with more complexity (e.g., missing data and measurement error) than you would ever see in a first course in statistics that is frequentist-based.

Another dimension to the problem is that classical statistics relies far too heavily on normality, large-sample theory, and the central limit theorem. IMHO we should be teaching semiparametric models early and often. They can handle extreme clumping at zero, bimodality, and ceiling effects, and they are invariant to how you transform Y. Classical stat teaching assumes that we already know how to transform Y before we do any modeling.
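For concreteness, a minimal sketch of the transformation-invariance point, using simulated data and orm() from the rms package (one semiparametric option; any cumulative probability/ordinal model behaves this way):

```r
## A semiparametric (cumulative probability) model uses only the ranks of Y,
## so a monotone transformation of Y leaves the regression coefficient alone.
library(rms)
set.seed(2)
n <- 200
x <- rnorm(n)
y <- exp(1 + 0.5 * x + rnorm(n))   # right-skewed outcome
y[sample(n, 50)] <- 0              # extreme clumping at zero

f1 <- orm(y ~ x)                   # model the raw outcome
f2 <- orm(sqrt(y) ~ x)             # any monotone transformation of Y
c(coef(f1)["x"], coef(f2)["x"])    # identical slopes
```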

10 Likes

I agree that linear algebra is fundamental. However, in my view, measure theory and much of higher-level mathematical statistics is of peripheral value for doing applied statistics. As Frank noted, much of this theory is based on ideal assumptions that are not realized in much of what we see in real life (I’ll add to his list: IID or very simple covariance structures, linearity, and the very existence of a well-defined probability space on which you are doing inference). Skimming through Casella and Berger (or an even shorter book such as Silvey’s ‘Statistical Inference’) would familiarize you with some of the overarching concepts of statistical inference, even if you did not try to prove all the theorems. Make a similar effort for books on linear models, experimental design, and more specialized topics (like the repeated measures you mention). It is important to understand the methods you use, and to have an awareness of the methods that are available, so I do agree that you should go well beyond a cookbook level of practice, but this does not mean you have to personally prove every textbook theorem about every method you intend to use. (More power to you if you do, of course!)

The Efron/Hastie book you mention may be a good resource, as they don’t prove theorems but introduce you to a large number of contemporary methods. However, their statement on the bottom of page 3, “It is a surprising, and crucial, aspect of statistical theory that the same data that supplies an estimate can also assess its accuracy”, is seemingly refuted by their own Chapters 12 & 20.

What is poorly covered in the ‘standard’ texts of our field is how to actually use data to learn and make decisions - an important component of ‘statistical intuition’. (Frank’s RMS is a notable attempt to address this gap.) Reading the history of science, engineering, or medicine will give tremendous insight into how discoveries were made and theories accepted (and rejected) based on data, and many of the historical examples did not depend on “statistical methods” at all. Tukey recommended that “it might be well if statisticians looked to see how data was actually analyzed by many sorts of people.” If you do so, you’ll find many success stories where “statistical methods” are wholly absent. I might also recommend a few articles (which will be faster to read than tackling a massive statistics textbook) such as:

H. Quinn, 2009: What is science? Physics Today, 62 (7): 8-9.

P. Diaconis, 1985: Theories of data analysis: From magical thinking through classical statistics. In Exploring Data Tables, Trends, and Shapes, ed. by D. C. Hoaglin, F. Mosteller, and J. W. Tukey (New York: Wiley), 1-36.

D. A. Freedman, 1999: From association to causation: some remarks on the history of statistics. Statistical Science, 14: 243-258.

A. S. C. Ehrenberg and J. A. Bound, 1993: Predictability and prediction. Journal of the Royal Statistical Society, Series A, 156: 167-206.

G. E. P. Box, 1976: Science and statistics. Journal of the American Statistical Association, 71: 791-799.

G. E. P. Box, 2001: Statistics for discovery. Journal of Applied Statistics, 28: 285-299.

C. Mallows, 1998: The zeroth problem. The American Statistician, 52: 1-9.

M. R. Munafo and G. Davey Smith, 2018: Robust research needs many lines of evidence. Nature, 553: 399-401.

Excellent case studies:

C. Seife, 2000: CERN’s gamble shows perils, rewards of playing the odds. Science, 289: 2260-2262.

J. Couzin-Frankel, 2013: When mice mislead. Science, 342: 922-925.

G. J. Lithgow, M. Driscoll, and P. Phillips, 2017: A long journey to reproducible results. Nature, 548: 387-388.

(This is just a starter list subsetted from a longer one that I’ve been developing for years.)

8 Likes

I think it’s crucial for a deeper understanding, and the details of derivations etc. in my MSc course notes are not easily found in the books (I had a stats colleague who looked at them with envy). As an applied statistician I have had to derive, e.g., sample size calculations from first principles. In 2003 I had to design a cross-over in incomplete blocks with a binary outcome; there was nothing in the literature for that at the time. Senn recently wrote about his code for powering a cross-over trial: “It is a shock to realise that the second edition of my book Cross-over Trials in Clinical Research appeared 20 years ago. It includes a rather complicated SAS program for sample size determination for an AB/BA cross-over design.” Power to the difference; it’s funny because I reached out to him at the time and he provided some code, and it might have been this. There were not so many active message boards etc. 20 years ago: you had to write an email to a professor, and one of them told me there was nothing in the literature for the scenario I described and “the topic is yours if you want it”. For anything out of the ordinary like this you had to write your own code, in Matlab or whatever, and that old-school habit is coming back as people move away from SAS and talk about open source. I was just looking at a package for mixed modelling in Julia, JuliaStats/MixedModels.jl on GitHub. It’s worth writing the code just to acquire the deeper understanding you gain from it, and to contribute something to the community, but then you need to know numerical shortcuts etc.; there is a good book for that: http://numerical.recipes/
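By way of illustration only (not the incomplete-blocks design above, which needed bespoke code), the simplest “from first principles” calculation is only a few lines in R: a two-group comparison of proportions via the normal approximation. Anything non-standard is built up from pieces like this.

```r
## Standard normal-approximation sample size for comparing two proportions.
n_per_group <- function(p1, p2, alpha = 0.05, power = 0.80) {
  za <- qnorm(1 - alpha / 2)
  zb <- qnorm(power)
  (za + zb)^2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2)^2
}
ceiling(n_per_group(0.30, 0.45))   # roughly 160 per group
```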

4 Likes

A random thought: Of all the intuitive books I’ve read, Probability Models by Ross is one of the top ones. You really understand conditional probability after reading that book, and how to use conditioning to solve a lot of interesting problems.

2 Likes

I think this video by Michael I. Jordan is extremely helpful for understanding how Bayes and Frequentist methods complement each other, with the key tool being decision theory. (slides for video)

“Decision theory is an extremely useful perspective in thinking about fundamentals of statistical inference.” (10:20-10:44)

This video with Jim Berger, Sander Greenland, and Robert Matthews provided me with the information needed to correct a lot of misconceptions I had about the relation between Frequentist and Bayesian methods. Matthews’ resurrection of reverse-Bayes methods creates a great opportunity to combine the insights of Bayesians with those of the Frequentists.

Finally, more recent work on so-called “fiducial inference” via the notion of confidence distributions (a generalization of confidence intervals) can show relationships between Bayesian posteriors, bootstrap distributions, and fiducial distributions. A whole collection of papers on this unifying perspective (known as BFF: Bayes, Frequentist, Fiducial → Best Friends Forever) can be found here (most require “mathematical maturity”):

https://stat.rutgers.edu/home/mxie/SelectedPapers.htm

A good intro to confidence distributions can be found in this paper by @COOLSerdash, as well as by searching past threads.

https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.8293
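As a minimal numerical illustration of the idea (my own toy example, a one-sample normal mean): the confidence distribution built from the t-pivot contains every confidence interval at once, and a bootstrap distribution of the mean approximates the same object.

```r
## Toy example: the confidence distribution for a normal mean via the t-pivot.
set.seed(5)
x <- rnorm(30, mean = 10, sd = 2)
n <- length(x); xbar <- mean(x); s <- sd(x)

cd <- function(mu) pt((mu - xbar) / (s / sqrt(n)), df = n - 1)

## Inverting the CD gives every confidence interval at once, e.g. the 95% CI:
ci <- xbar + qt(c(0.025, 0.975), n - 1) * s / sqrt(n)
cd(ci)                                     # 0.025 and 0.975, by construction

## A bootstrap distribution of the mean approximates the same object:
boot_means <- replicate(4000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))      # close to ci
```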

Related thread:

7 Likes

Greenland’s contribution in the second video you linked was, for me, the most valuable. He expands on those thoughts in this two-part lecture.

S. Greenland, 2021: Advancing statistics reform: cognition and causation before probability and inference. 21 and 26 May 2021.
- Part 1: University of Bristol IEU Seminar, 24 May 2021: “Advancing Statistics Reform, Part I” (YouTube)
- Part 2: University of Geneva, 27 May 2021: “Prof. Sander Greenland on Statistics Reform”

5 Likes

This is consistent with my experience. One other example: after fully digesting McGraw & Wong’s classic (1996) paper on intraclass correlation coefficients, I was able to show that a SAS program floating around my then-organization for calculating ICCs was incorrect. To appreciate that paper requires a more than superficial understanding of linear models/ANOVA.
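To make the connection concrete, here is a minimal sketch (simulated data, R/lme4) of one of the McGraw & Wong coefficients written directly in terms of mixed-model variance components. The paper’s ANOVA-based estimators differ slightly from these REML components, but the definition is the same.

```r
## ICC(A,1): two-way random effects, absolute agreement, single measurement,
## expressed as variance components (subject, rater, residual).
library(lme4)
set.seed(4)
n_subj <- 40; n_rater <- 3
d <- expand.grid(subject = factor(1:n_subj), rater = factor(1:n_rater))
d$y <- rnorm(n_subj)[d$subject] +            # subject effect
       rnorm(n_rater, sd = 0.3)[d$rater] +   # rater effect
       rnorm(nrow(d), sd = 0.7)              # measurement error

fit <- lmer(y ~ 1 + (1 | subject) + (1 | rater), data = d)
vc  <- as.data.frame(VarCorr(fit))
v   <- setNames(vc$vcov, vc$grp)
unname(v["subject"] / (v["subject"] + v["rater"] + v["Residual"]))
```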

1 Like

Yes, it’s alarming how common such things are. Maybe there’s an ethical argument then for the statistician to know the theory (a cluster randomised trial with unequal clusters and some other complication like non-inferiority could be over- or under-powered if the calculation isn’t available in standard software). ICH E9 says “all statistical work associated with clinical trials will lie with an appropriately qualified and experienced statistician”. Twenty years ago everyone took this to mean an MSc in stats, but I saw a survey of companies more recently suggesting this is now interpreted as PhD level.

As a corollary to @f2harrell’s emphasis on the conceptual clarity offered by Bayesianism, I’d like to offer a quote from this wonderful Scientific American essay by Emily Riehl of Johns Hopkins, about what seems to me a similar phenomenon in mathematics:

How is it that mathematicians can quickly teach every new generation of undergraduates discoveries that astonished the previous generation’s experts? Part of the answer has to do with recent developments in mathematics that provide a “birds-eye view” of the field through ever increasing levels of abstraction. Category theory is a branch of mathematics that explains how distinct mathematical objects can be considered “the same.” Its fundamental theorem tells us that any mathematical object, no matter how complex, is entirely determined by its relationships to similar objects. Through category theory, we teach young mathematicians the latest ideas by using general rules that apply broadly to categories across mathematics rather than drilling down to individual laws that apply only in a single area.

2 Likes

Could anyone make a recommendation for texts which would help to develop that ‘moderate’ level of (applied) skill in linear algebra? I’ve found my lack of knowledge of linear algebra to be a hindrance, especially when working with Stan.

Check out the American Institute of Mathematics list of textbooks available under a permissive license. Dover Publications has a textbook on linear algebra specifically for statistics.

4 Likes

There are several good ones, the usefulness of which will depend on one’s experience.

  • Mike X. Cohen: Linear Algebra: Theory, Intuition, Code (2021): I think this one would be pretty good for self-study but I found it a little basic (only in the sense that there isn’t much depth). I really liked the author’s explanations for matrix multiplication, though. He also includes Matlab and Python code for various problems. This might be best for an absolute beginner or if you just need to quickly re-learn things. It is a very quick read, and quite cheap; definitely worth the money in my opinion.
  • Gilbert Strang: Introduction to Linear Algebra, 5th Edition (2016): I haven’t read the whole thing but you probably can’t go wrong with Strang’s book. This one also goes along with MIT OCW courses (18.06 is the standard undergrad course; 18.700 is the theory/proof-based one). He has a few different books so I don’t know which is the most recent, or what the differences are among them. This might be the one you want if you learn best in a classroom-like atmosphere; i.e., if you decide to complete the course through OCW/edX. All lectures are posted on YouTube.
  • David C. Lay et al.: Linear Algebra and its Applications, 5th Edition (2014): I also haven’t read this but I always see this listed on Reddit or StackExchange in threads asking for recommendations.
  • Jim Hefferon: Linear Algebra: this is a free online textbook that I haven’t read but I’ve heard good things about it. The link to Leanpub actually lets you download the whole book when you click on “Sample”, but the author also accepts money for it.
  • James Gentle: Matrix Algebra: Theory, Computations and Applications in Statistics, 2nd Edition (2017): this is also not a (standard) textbook, nor is it introductory. But it is quite good as an intermediate-level reference. The author also has a whole Part dealing with numerical methods. I really liked this one but I don’t think it’s good for beginners, and perhaps not the best for self study.
  • Boyd & Vandenberghe: Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares (2018), which I think is more advanced (perhaps intermediate). It is freely available at the above link. Of interest, Gilbert Strang’s review says: “This book explains the least squares method and the linear algebra it depends on - and the authors do it right!”. But some reviews mention that it isn’t great for self study.

And if you’d like another intermediate text with excellent reviews, you could check out Axler’s Linear Algebra Done Right, which is very popular among mathematicians, although from what I understand it is not meant for someone interested primarily in applications (e.g., he doesn’t “do” matrix computations). I own it but haven’t gotten around to going through it yet. [Incidentally, MIT now uses it for their proof-based linear algebra course mentioned above.]

Finally, you could always search for threads on Reddit or StackExchange, such as this one which explicitly asks about statistics.

5 Likes

Thanks! The Mike X Cohen text seems perfect for my purposes, as does the Basilevsky text recommended above. (I took some linear algebra classes in undergrad years ago but I am effectively starting from scratch.)

He has an accompanying free video course for the book too.

3 Likes

Thank you for responding and I agree with you here. I suppose I should have clarified that I wasn’t implying a PhD-level of theory background is necessary (unless measure theory is typically required for MS students as well; in that case perhaps it would be useful, but certainly not necessary for applied work).

I again agree completely with what you wrote. It is one of the reasons I enjoy the RMS text so much. In my view, it is at least as much a “how to do X” text as it is a “when not to do A, B, and C (and why)” text (although @f2harrell can refute this view :slightly_smiling_face:). The latter – knowing, understanding, and being able to identify pitfalls/issues with certain approaches – is where I believe a strong foundation in (some of) the theory comes in. It is also, I think, at least as important as knowing which procedures/methods to use; e.g., in my earlier days as a reviewer my critical appraisals of manuscripts’ Statistical Methods sections weren’t as good as they are now.

And thank you for sharing those articles. I again agree that reading about the history of science is important, and I generally try to follow the path of probability’s development from its use in gambling to the current state of affairs.

A good retelling of probability’s development from its 17th-century gambling origins to modern statistical analysis is given by Herbert Weisberg’s Willful Ignorance: The Mismeasure of Uncertainty (Wiley, 2014). Weisberg is a winner of the ASA Statistical Consulting award, but the title of his book makes clear that his perspective is very critical.

An even stronger case against what I call the Bernoulli-Fisher ideology (the belief/wish that empirical data can usefully be modeled using the mathematics of games of chance) is given by John Kay and Mervyn King, Radical Uncertainty: Decision-Making Beyond the Numbers (W. W. Norton, 2020). Both Weisberg and K^2 remind us that 100 years ago, the economists Frank Knight and John Maynard Keynes emphasized the distinction between what K^2 call “resolvable” uncertainty and “radical” uncertainty. Borrowing terms from Jimmie Savage, they argue that in “small world” problems - those that can be usefully modeled using the stochastic methods of statisticians - most of the uncertainty is resolvable. These are problems where the probability space can be defined - in principle, there is a set of all possible outcomes, and a CDF can be imagined for them, even if its parameters or even its shape are unknown. In “large world” problems, the part of the uncertainty captured by statistical model inferences is only a fraction (of unknown magnitude) of the total uncertainty, which cannot be quantified. And most of us live in (and analyze data from) the “large world”. I can’t make the argument as eloquently as they can, but the bottom line is that much of the time “we” (statisticians and users of their methods) are committing Nassim Nicholas Taleb’s “ludic fallacy”. For example, consider time series forecasting of the price of nickel on the London Metal Exchange. In March of this year, the LME halted nickel trading for over a week - and canceled the previous 8 hours of trading! Perhaps you could have forecast the halt (though no such halt appears in the training data - the LME has been criticized for lacking the guardrails that other financial markets have) but not the cancellation of $3.9 billion of uncoerced trading.

A wholly different line of attack on the Bernoulli-Fisher ideology is given by Sander Greenland’s lectures that I linked above, and I won’t try to recapitulate his ideas here. (Incidentally, my made-up term “Bernoulli-Fisher ideology” is deeply unfair to both Bernoulli and Fisher, who never foresaw the excesses that the profession that honors them has committed in their wake.)

These arguments apply equally to frequentist, Bayesian, and likelihoodist inferential frameworks. (Remember, the likelihood principle has an important caveat: “given the model”.) Thus the eternal disputes among the schools of inference are, in my view, a massive and pointless distraction from the real decision - is it even reasonable to entertain stochastic modeling for a given data set? In most cases my personal answer is that such models are useful as descriptive and exploratory tools, but the inferences derived from such models are delusional. And thus, to cycle back to the original question, statistical theory is not as useful to me as knowledge of the history of science, engineering, and medicine (especially the examples that did not use “statistics” - a few of which are discussed in the links below).

Some of my more polished views on several of these points appear elsewhere:
https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2017.01057.x

4 Likes

As I prepare to take one of the entry-level actuarial exams later in the year (i.e., Probability), I wonder where these professionals would fit in. A significant amount of effort is devoted to modelling those uncertainties via extreme value theory, for example. Because the organizations that employ them bear the cost of errors when setting prices, the silliness seen in typical social science applications is absent.

They would epitomize what Taleb and Harry Crane describe as having “skin in the game.” Crane in particular has proposed the radical notion that reported probabilities from models aren’t “real” unless the person making the claim is willing to make a wager at the implied odds. Looked at from an information-theoretic standpoint, this is only taking J. L. Kelly’s 1956 paper “A New Interpretation of Information Rate” literally. I find substantial value in that point of view.
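As a tiny illustration of what “taking it literally” would mean (my own toy calculation): the Kelly fraction of bankroll one would stake on a binary event at net fractional odds b, if one really believed the stated probability p.

```r
## Kelly (1956) fraction for a binary bet: f* = p - (1 - p) / b,
## truncated at zero when the bet isn't worth taking at those odds.
kelly_fraction <- function(p, b) pmax(0, p - (1 - p) / b)
kelly_fraction(p = 0.6, b = 1)   # even-money odds: stake 20% of bankroll
```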

Crane, Harry (2018) The Fundamental Principle of Probability (preprint).

Crane, Harry (2018) Is Statistics Meeting the Needs of Science (preprint)

Crane, Harry (2020) Naive Probabilism (preprint)

Shafer, Glenn (2021) Testing by betting: A strategy for statistical and scientific communication (link)

1 Like

Thanks for replying and for suggesting Weisberg’s book; it looks quite interesting. Speaking of probability/statistics texts, might you (or any other readers, of course) have any thoughts on some of these:

  • Jaynes’ probability text, which I mentioned in the first post. From what I have seen it is quite a good book but some consider it controversial.
  • Savage: The Foundations of Statistics (1972) – your mention of Savage reminded me again to look into this book. It seems to be well regarded, and being a Dover book is quite affordable.
  • Clayton: Bernoulli’s Fallacy (2021) – I don’t recall where I learned of this book (perhaps The American Mathematical Monthly). If you haven’t already, you can read E.J. Wagenmakers’ review which, by itself, may have convinced me to purchase this text.

There are many other books that look interesting to me but I would end up spending too much time trying to find books instead of actually reading and studying probability & statistics!

P.S. I had previously read the paper you posted from The American Statistician but shall revisit it (along with other articles in the special issue). Very thought-provoking, and your list of references is impressive. There are many papers you cite that look quite worthwhile. Thanks for your helpful posts!

1 Like

Thanks for the suggestion (I meant to reply sooner); I have heard good things about Ross’s texts. Are you referring to Introduction to Probability Models, specifically? He also has a textbook called A First Course in Probability (and Introduction to Probability and Statistics for Engineers and Scientists, although I assume you weren’t thinking of this one).