Step-by-step topics to "start" in statistics

rleitepacheco · January 4, 2022, 8:29pm

I’m a medical doctor and researcher working in clinical settings/universities.

Medical/biomedical students often ask me for recommendations to “really” start studying statistics. Some of them are interested in pursuing a post-graduation career in statistics and applied data analysis.
These students have little experience in linear algebra and almost no knowledge of calculus and programming.

A few years ago I would recommend an “introductory” statistics book. I feel now that most non-mathematical books are heavily focused on hypothesis testing and not a great place to start. For instance, most of these books “end” in linear regression models after chapters and chapters of parametric and non-parametric tests.

Assuming that these students have access to any resource (e.g. they could take mathematics courses/semesters at the university), what would be the recommended courses/topics/resources to begin?

Thank you in advance.

R_cubed · January 4, 2022, 8:49pm

To emphasize the essentials of applied stats, and not get misled by the literature (what is in the literature is often not optimal), there are 4 books I recommend, because I’ve learned a lot from them.

Biostatistics for Biomedical Research (link) by Frank Harrell – A good complement to any intro level textbook.
Resampling Methods by Philip I. Good. A great applied text on modern frequentist statistics that substitutes computation for mathematical theory. There is a trend in intro statistics education to emphasize simulation methods, and link them to the classical parametric models to improve intuition.
Permutation, Parametric, and Bootstrap Tests of Hypotheses by Philip I. Good. This is a more theory intensive text that complements Resampling Methods.
Regression Modeling Strategies by Frank Harrell – hard to go wrong here, as he emphasizes semi-parametric models, which are very flexible as a general rule. The big advantage is that he has filtered out the large number of techniques (which have value in particular contexts) in favor of an approach that is reasonable in almost any context. No one who understands statistics could fault you for properly using his approach. This might be tough to tackle as a first text, but having studied one or two of the books above first will leave the student well prepared.
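
As a rough illustration of what "substituting computation for mathematical theory" can look like, here is a minimal Python sketch of a two-sample permutation test and a percentile bootstrap interval for a difference in means. It is not taken from any of the books above. The data are made up, and the function names (`permutation_p_value`, `bootstrap_ci`) are hypothetical; only the standard library is assumed.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffling them gives another equally likely data set.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(a, b, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        res_a = rng.choices(a, k=len(a))
        res_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(res_a) - statistics.fmean(res_b))
    diffs.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Made-up illustrative measurements, not real data.
treated = [5.1, 6.3, 5.8, 7.0, 6.1, 6.7, 5.9, 6.4]
control = [4.8, 5.2, 5.0, 5.9, 4.6, 5.4, 5.1, 5.3]

p = permutation_p_value(treated, control)
low, high = bootstrap_ci(treated, control)
print(f"permutation p-value: {p:.4f}")
print(f"95% bootstrap interval for the difference: ({low:.2f}, {high:.2f})")
```

A student can compare these numbers with the output of a classical t-test on the same data, which is one way of linking simulation to the parametric models.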

Bayesian texts require a bit more math. I’ll post a few recommendations later.

s_doi · January 4, 2022, 8:53pm

I think we have to make a distinction between medical students and other biomedical students.

For medical students, the first step is to make them users of the literature, not creators of it, so what they need is a robust course in EBM that builds an intuitive understanding of statistical concepts. They need to understand the key concepts in both epidemiology and biostatistics and what they mean, but they do not necessarily need an advanced mathematics and calculus course. For example, they should clearly understand what a P value from a test is, what it means, and how best to interpret it, but not necessarily its mathematical background. That is a useful skill for a physician researcher, who can then do a quantitative Masters or PhD, but not for a medical student.

rleitepacheco · January 4, 2022, 9:45pm

Thank you for the response.
I’m sorry if I wasn’t clear.
These students are usually very engaged in clinical epidemiology/EBM/research methodology topics. It is through courses, disciplines and other initiatives that I meet them.
I think they are the ones who want to take a step further so that they can perform analyses themselves in the future. This might be the first step before they start a masters in the field.

EpiLearneR · October 3, 2022, 4:36pm

Hi
Please post your Bayesian recommendations. It would be helpful.

R_cubed · October 9, 2022, 2:56pm

I think it is more useful to understand the relation between frequentist and Bayesian perspectives, so that when we are inevitably given a frequentist estimate or test result, we will interpret it appropriately.

The math to justify this isn’t terribly complicated; much can be done with algebra and/or calculus.

To that end, I’ve found the following helpful.

NISS Webinar on p-values with James Berger, Sander Greenland, and Robert Matthews. Berger describes the relation between Bayesian updating and frequentist error probabilities. Matthews reverses the typical Bayesian method to derive a credible or skeptical prior to compare and contrast observed results with prior information. This makes it clear that a Bayesian prior can be re-interpreted as a frequentist shrinkage device.

A published version of the procedure described by Berger in the video is here:

Robert Matthews (more recently with Leonhard Held) has elaborated on the Bayesian Analysis of Credibility in a number of papers. I mentioned 2 in this thread.
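
To make the "reverse Bayes" idea concrete, here is a minimal Python sketch under simplifying assumptions: a normal likelihood, a normal sceptical prior centered at zero, and an effect measured on an additive scale (for example a log odds ratio). For a 95% interval (L, U) that excludes zero, the conjugate normal update implies that the prior 95% interval (-SL, SL) with SL = (U - L)^2 / (4 * sqrt(U * L)) is the one that just pulls the posterior interval back to zero. This is the skepticism limit as I understand it from the Analysis of Credibility papers. The function names and the example interval are hypothetical.

```python
import math

Z = 1.959964  # two-sided 95% standard normal quantile


def skepticism_limit(lower, upper):
    """Half-width SL of the sceptical prior interval (-SL, SL).

    Assumes a normal likelihood summarized by a 95% interval (lower, upper)
    that excludes zero, and a zero-centered normal prior.
    """
    if lower * upper <= 0:
        raise ValueError("the interval must exclude zero")
    return (upper - lower) ** 2 / (4 * math.sqrt(lower * upper))


def posterior_interval(estimate, se, prior_sd, z=Z):
    """Normal-normal conjugate update with a prior centered at zero."""
    # The prior acts as a shrinkage factor on the frequentist estimate.
    shrink = prior_sd**2 / (prior_sd**2 + se**2)
    post_mean = shrink * estimate
    post_sd = math.sqrt(shrink) * se
    return post_mean - z * post_sd, post_mean + z * post_sd


# Hypothetical 95% interval for a log odds ratio.
L, U = 0.10, 0.90
estimate = (L + U) / 2
se = (U - L) / (2 * Z)

sl = skepticism_limit(L, U)
post_lo, post_hi = posterior_interval(estimate, se, prior_sd=sl / Z)
print(f"skepticism limit: {sl:.4f}")
print(f"posterior 95% interval: ({post_lo:.4f}, {post_hi:.4f})")
```

The `shrink` factor is where the prior shows up as a shrinkage device: the posterior mean is simply the frequentist estimate multiplied by a number between 0 and 1. If prior evidence supports effects larger than SL, the result survives this sceptical prior; otherwise it does not.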

After study of the papers above, I think there is value in considering the arguments of Michael Evans, who makes explicit a Bayesian derived approach known as Relative Belief. Compare his formulas with those in Berger’s Bayesian Rejection Ratio paper and presentation.

As a matter of philosophy of statistics, I think there is a lot to agree with in Evans’s paper. But in practice, a sound philosophy won’t help you compute a plausible solution to a concrete problem. A good frequentist estimate can often be used in place of a Bayesian posterior when the posterior is too difficult to compute.

The math needed to justify that is complex, but at the frontiers of statistical theory, a lot of interesting results are coming from the perspective that examines the relationships among Bayesian, Frequentist, and Fiducial procedures. [See the old paper by Hjort, added below] This research program has a cute acronym: Bayes, Frequentist, Fiducial (BFF) Best Friends Forever (?). The following is an interesting paper (but very technical).

Efron has an interesting paper on the relationship between frequentist bootstrap procedures and Bayesian posteriors that is a bit easier to comprehend.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3703677/
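
For a feel of why bootstrap distributions and posteriors can look alike, here is a small Python sketch. It does not reproduce the method in Efron's paper. Instead it compares the ordinary nonparametric bootstrap with Rubin's Bayesian bootstrap, a swapped-in and simpler technique in which the multinomial resampling counts are replaced by Dirichlet(1, ..., 1) weights. The data are made up and the function names are hypothetical.

```python
import random
import statistics


def bootstrap_means(data, n_draws, rng):
    """Ordinary bootstrap: resample with replacement, then average."""
    return [statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_draws)]


def bayesian_bootstrap_means(data, n_draws, rng):
    """Bayesian bootstrap: weighted means with Dirichlet(1, ..., 1) weights."""
    draws = []
    for _ in range(n_draws):
        # Normalized independent Exponential(1) draws are Dirichlet(1, ..., 1).
        gammas = [rng.expovariate(1.0) for _ in data]
        total = sum(gammas)
        draws.append(sum(g * x for g, x in zip(gammas, data)) / total)
    return draws


def percentile_interval(draws, level=0.95):
    """Equal-tailed percentile interval from a list of simulated values."""
    ordered = sorted(draws)
    n = len(ordered)
    lo_idx = int((1 - level) / 2 * n)
    hi_idx = int((1 + level) / 2 * n) - 1
    return ordered[lo_idx], ordered[hi_idx]


# Made-up illustrative measurements, not real data.
data = [4.8, 5.2, 5.0, 5.9, 4.6, 5.4, 5.1, 5.3, 6.1, 5.7, 4.9, 5.5]
rng = random.Random(1)

freq_lo, freq_hi = percentile_interval(bootstrap_means(data, 5000, rng))
bayes_lo, bayes_hi = percentile_interval(bayesian_bootstrap_means(data, 5000, rng))
print(f"bootstrap percentile interval: ({freq_lo:.3f}, {freq_hi:.3f})")
print(f"Bayesian bootstrap interval:   ({bayes_lo:.3f}, {bayes_hi:.3f})")
```

The two intervals should be close, which is the informal point: the frequentist resampling distribution of the mean and this particular nonparametric posterior differ mainly in how the weights on the observations are generated.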

Other articles on the relation between bootstrap and posterior distributions:

Newton, M.A. and Raftery, A.E. (1994), Approximate Bayesian Inference with the Weighted Likelihood Bootstrap. Journal of the Royal Statistical Society: Series B (Methodological), 56: 3-26. https://doi.org/10.1111/j.2517-6161.1994.tb01956.x

Newton, M.A., Polson, N.G. and Xu, J. (2021), Weighted Bayesian bootstrap for scalable posterior distributions. Can J Statistics, 49: 421-437. https://doi.org/10.1002/cjs.11570

Hjort, N. L. (1991). Bayesian and empirical Bayesian bootstrapping. Preprint series. Statistical Research Report https://www.duo.uio.no/bitstream/handle/10852/47760/1/1991-9.pdf