Textbook / curriculum recommendations for introductory and early statistics courses

Janet · May 19, 2023, 4:47pm

While taking the RMS course and discussing so many things that are commonly done wrong in health science research statistics (i.e. most of what I learned when studying epidemiology!), I would love to hear thoughts on the best textbooks/curricula for introductory statistics courses.

f2harrell · May 19, 2023, 8:05pm

My favorite: The Analysis of Biological Data

And if you want something free: BBR

sscogges · May 19, 2023, 9:08pm

Richard McElreath’s “Statistical Rethinking”. There is a print book (Statistical Rethinking | Richard McElreath) as well as a set of free lectures (Statistical Rethinking 2023 - YouTube).

R_cubed · May 20, 2023, 10:11am

There was a similar thread a while back. Aside from the materials already suggested, I think Philip I. Good’s books from a resampling perspective would be a good place to start for those without the strongest mathematical background.

The material should be explored further from a decision theory perspective that links Bayesian with Frequentist ideas. The closest book that does this is (but is somewhat math heavy):

If there isn’t all that much time, the following papers are worth study, in that you can use your frequentist toolkit as procedures to generate inputs for a broader Bayesian perspective.

https://www.sciencedirect.com/science/article/pii/S002224961600002X
https://projecteuclid.org/journals/statistical-science/volume-24/issue-2/Relaxation-Penalties-and-Priors-for-Plausible-Modeling-of-Nonidentified-Bias/10.1214/09-STS291.full

Greenland, S. (2005). Multiple-bias modelling for analysis of observational data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 168: 267-306. link

Greenland, S. (2000), When Should Epidemiologic Regressions Use Random Coefficients?. Biometrics, 56: 915-921. https://doi.org/10.1111/j.0006-341X.2000.00915.x

f2harrell · May 20, 2023, 11:13am

Those are excellent recommendations but perhaps not for an introductory course.

R_cubed · May 20, 2023, 11:36am

I agree; I think the frequentist resampling perspective gives useful tools to get work done without complete understanding, but my ideal is closer to that of @Sander: both Bayesian and Frequentist perspectives are needed as our problems get more complex and realistic.

Jose Bernardo proposed teaching Bayesian concepts and Frequentist concepts together based on the notion of the prior as missing information. But I know of no recent texts taking that approach.

Bernardo, J. M. (2006). A Bayesian mathematical statistics primer. In Proceedings of the Seventh International Conference on Teaching Statistics. Salvador (Bahia): CD ROM. International Association for Statistical Education. link

From the abstract:

Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist statistics. It is argued that it may be appropriate to reverse this procedure. Indeed, the emergence of powerful objective Bayesian methods (where the result, as in frequentist statistics, only depends on the assumed model and the observed data), provides a new unifying perspective on most established methods, and may be used in situations (e.g. hierarchical structures) where frequentist methods cannot. On the other hand, frequentist procedures provide mechanisms to evaluate and calibrate any procedure. Hence, it may be the right time to consider an integrated approach to mathematical statistics, where objective Bayesian methods are first used to provide the building elements, and frequentist methods are then used to provide the necessary evaluation.
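To make the "Bayes builds, frequentist evaluates" idea concrete, here is a minimal sketch of my own (not taken from Bernardo's paper). It assumes a binomial proportion with the Jeffreys prior Beta(1/2, 1/2), which is the reference prior for this one-parameter model, forms an equal-tailed posterior interval, and then computes how often that interval covers the true proportion over repeated sampling. The function names, the choices n = 20 and p = 0.3, and the Monte Carlo approximation of the posterior quantiles are all illustrative assumptions.

```python
import math
import random


def jeffreys_interval(x, n, level=0.95, draws=4000, rng=None):
    """Approximate equal-tailed credible interval for a binomial proportion.

    Assumption for illustration: Jeffreys prior Beta(1/2, 1/2), so the
    posterior after x successes in n trials is Beta(x + 1/2, n - x + 1/2).
    Quantiles are estimated from Monte Carlo draws so that only the
    standard library is needed; they are therefore slightly noisy.
    """
    if rng is None:
        rng = random.Random(0)
    sample = sorted(rng.betavariate(x + 0.5, n - x + 0.5) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = sample[int(tail * draws)]
    upper = sample[int((1.0 - tail) * draws) - 1]
    return lower, upper


def frequentist_coverage(p, n, level=0.95, seed=0):
    """Probability, under Binomial(n, p), that the interval contains p.

    This is the frequentist evaluation step: sum the binomial probability
    of every outcome x whose Bayesian interval covers the true p.
    """
    rng = random.Random(seed)
    coverage = 0.0
    for x in range(n + 1):
        lower, upper = jeffreys_interval(x, n, level, rng=rng)
        if lower <= p <= upper:
            coverage += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return coverage


if __name__ == "__main__":
    # Hypothetical settings chosen only for the demonstration.
    print(f"coverage at p=0.3, n=20: {frequentist_coverage(0.3, 20):.3f}")
```

Because the outcome is discrete, the coverage will not equal the nominal 95% exactly and it moves around as p and n change; seeing that variation is the kind of calibration check the abstract has in mind.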

Berger, J., Bernardo, J., & Sun, D. (2009). The formal definition of reference priors. The Annals of Statistics, 37. DOI: 10.1214/07-AOS587. link

Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain information-theoretic sense. Reference priors have been rigorously defined in specific contexts and heuristically defined in general, but a rigorous general definition has been lacking. We produce a rigorous general definition here and then show how an explicit expression for the reference prior can be obtained under very weak regularity conditions. The explicit expression can be used to derive new reference priors both analytically and numerically.

I consider this newer work by Bernardo and Berger a modern answer to the questions raised by Efron in this 1986 article.

Efron, B. (1986) Why Isn’t Everyone a Bayesian?, The American Statistician, 40:1, 1-5, DOI: https://doi.org/10.1080/00031305.1986.10475342

There is a version with commentaries which are also valuable. The comment by Herman Chernoff about using Fisherian tools in a broader Bayesian context makes sense to me.

Pavlos_Msaouel · May 20, 2023, 5:40pm

Currently getting hyped on social media, for very good reason: Telling Stories with Data. This is the type of intro data science book I would recommend to all members of my lab and clinical data analysis teams.