I brought this up after Day 1’s lecture but I thought it might warrant further discussion/elaboration. (Apologies for the length!)
I am of the opinion that a “strong” background/foundation in theory is important to becoming a good applied statistician. I leave “strong” in quotation marks to be purposefully vague, as I don’t think it can really be quantified. The concept may be related to “mathematical maturity” or “developing statistical intuition”. (I welcome contrasting viewpoints of course!)
I came to this opinion when I had to plan a study (design, sample size, etc.) for work. A statistical consultant had given a very basic assessment without even looking at the small amount of data we had, recommending paired t-tests and using some standard sample size formula. However, I looked at the data and noticed a large proportion of 0 values with the positive values fairly strongly right-skewed. This led me to learn about zero-inflation, hurdle models, and the Tweedie family. Furthermore, a paired t-test was not appropriate as there were repeated measures per subject and experimental condition, so I started to read about generalized linear mixed models (GLMM). I recommended a pilot study with what I considered to be an appropriate design and used that data to write simulations for power/sample size purposes (and for some exploratory analysis). Along the way, I ran into many issues and read a lot on StackExchange and attempted to understand Douglas Bates’ unpublished lme4 text. While I could write code to analyze and interpret data, I felt that I was just “following instructions” (or best practices), without really knowing what I was doing. For example, I found variance components, conditional modes or best linear unbiased predictors (BLUPs), nested/crossed effects, etc. to be much more difficult than working with GLMs.
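For anyone curious what simulation-based power estimation can look like for this kind of data, here is a minimal Python sketch. It is not the analysis described above: the data-generating model (a zero probability plus gamma-distributed positive values, with a log-normal subject effect) and every parameter value are assumptions chosen only for illustration. In place of a GLMM it uses a much simpler stand-in, a sign-flip permutation test on per-subject mean differences. The function names are hypothetical.

```python
import numpy as np

def simulate_subject_means(n_subjects, n_reps, effect, rng,
                           p_zero=0.4, shape=1.5, scale=2.0, subj_sd=0.5):
    """Return per-subject mean outcomes under control and treatment.

    Assumed data-generating model (illustrative only): each observation is
    zero with probability p_zero; otherwise it is a gamma draw multiplied by
    a log-normal subject effect and, under treatment, by exp(effect).
    """
    # One multiplicative random effect per subject, shared across conditions.
    subj = np.exp(rng.normal(0.0, subj_sd, size=(n_subjects, 1)))
    means = []
    for mult in (1.0, np.exp(effect)):
        positive = rng.gamma(shape, scale, size=(n_subjects, n_reps)) * subj * mult
        nonzero = rng.random((n_subjects, n_reps)) >= p_zero
        # Average the repeated measures within each subject and condition.
        means.append((positive * nonzero).mean(axis=1))
    return means[0], means[1]

def sign_flip_pvalue(diffs, rng, n_perm=500):
    """Two-sided sign-flip permutation p-value for a mean paired difference."""
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    return (1 + np.sum(permuted >= observed)) / (n_perm + 1)

def estimate_power(n_subjects, n_reps, effect, n_sims=200, alpha=0.05, seed=0):
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        ctrl, trt = simulate_subject_means(n_subjects, n_reps, effect, rng)
        if sign_flip_pvalue(trt - ctrl, rng) < alpha:
            rejections += 1
    return rejections / n_sims

if __name__ == "__main__":
    # Compare a few candidate sample sizes for an assumed effect of 0.5 (log scale).
    for n in (10, 20, 40):
        print(n, estimate_power(n, n_reps=5, effect=0.5))
```

The general recipe is to simulate from a model you believe is plausible (ideally informed by pilot data), run the planned analysis on each simulated data set, and count rejections. Swapping in a mixed model for the permutation test would change the analysis step but not the overall structure.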
I asked @f2harrell if he thought understanding a text such as Casella & Berger (2001): Statistical Inference is a necessary or beneficial foundation. I used this text as an example because it is usually used in an “intermediate level” first-year graduate course in Statistics programs and is more challenging than texts such as Wackerly, Mendenhall, & Scheaffer (2008): Mathematical Statistics with Applications, or Hogg, McKean, & Craig (2018): Introduction to Mathematical Statistics, or Larsen & Marx (2017): Introduction to Mathematical Statistics and Its Applications (which I have read in its entirety). None of the above require knowledge of measure theory.
To @f2harrell, @Drew_Levy, and any other working statisticians: is my opinion/view too restrictive? For example, while I think a strong foundation in linear algebra is a great help when you get to (generalized) linear models, we are of course not writing our own code to do QR decompositions and so forth. Additionally, are there any textbooks you would recommend to develop statistical understanding beyond a “cookbook level” (i.e., just following steps/best practices)? Bayes-focused texts are welcome.
A newer introductory/intermediate text with more focus on Bayes and computer methods is Efron & Hastie (2016): Computer Age Statistical Inference. I have heard good things about Jaynes (2003): Probability Theory: The Logic of Science (PDF warning) for Bayesian understanding, and Cosma Shalizi’s Advanced Data Analysis From an Elementary Point of View for anyone doing data analysis.