Along the lines of @ChristopherTong, I think that the word validation is not an honest choice for the types of studies we are discussing. A true validation would validate that the losers are still losers, i.e., would validate the overall accuracy of the chosen list. And by the way, the original study should derive the upper limit of predictive discrimination from all candidate features after using a good ridge-regression-type shrinkage on them. This will help to justify or dismiss any parsimonious representation of the features. For example, if the all-features R^2 is 0.25 but the R^2 achieved from one-feature-at-a-time selection is 0.06, feature selection is fruitless.
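To make that contrast concrete, here is a minimal simulation sketch (the sample size, penalty, and effect sizes are invented for illustration, not taken from any real study): many weak features yield a respectable all-features ridge R^2 while the best single feature explains almost nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 100

# Hypothetical setup: 100 weak features whose combined signal is
# nontrivial (population R^2 around 0.25) but individually tiny
X = rng.standard_normal((n, p))
beta = rng.normal(0, 0.06, p)
y = X @ beta + rng.standard_normal(n)

# Ridge regression in closed form; the penalty is picked ad hoc here
lam = 10.0
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
r2_all = 1 - np.sum((y - X @ b_ridge) ** 2) / np.sum((y - y.mean()) ** 2)

# Best R^2 obtainable from one-feature-at-a-time screening
r2_single = max(np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(p))

print(f"all-features ridge R^2: {r2_all:.2f}")
print(f"best single-feature R^2: {r2_single:.2f}")
```

The in-sample ridge R^2 is somewhat optimistic (in practice one would cross-validate), but the gap between the collective and single-feature predictiveness is the point.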
I understand your rationale, and I see the merit in it. But what would you say to a principal investigator who counters: “Well, our priors based upon theoretical considerations make your fear of false negatives unpersuasive. We are much more concerned about false discovery, and validating the losers is a waste of resources.”
I suspect you would point out that the final ranking still has a large amount of uncertainty to it, and that resampling could easily change the list of “losers” vs. “winners”.
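A quick simulated sketch of that instability (the setup below is hypothetical, not any actual GWAS): when fifty features have similar weak effects, bootstrap resamples rarely reproduce the original top-10 list.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 50

# Hypothetical data: fifty features with similar weak effects, so the
# observed ranking among them is driven largely by noise
X = rng.standard_normal((n, p))
y = X @ np.full(p, 0.08) + rng.standard_normal(n)

def top10(Xb, yb):
    scores = [abs(np.corrcoef(Xb[:, j], yb)[0, 1]) for j in range(p)]
    return set(np.argsort(scores)[-10:])

winners = top10(X, y)

# How many of the original top-10 "winners" survive in each bootstrap?
overlaps = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    overlaps.append(len(winners & top10(X[idx], y[idx])))

print("mean overlap with the original top-10 list:", np.mean(overlaps))
```

The mean overlap falls well short of 10, i.e., the winner/loser split is largely an artifact of the particular sample.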
Correct, Stage 2 uses different data than Stage 1.
(In light of Frank’s comment, I will change my terminology to Stages 1 and 2 instead of “discovery” and “validation” stages, which are the terms used within the field.)
This is utterly bogus. Were that true, the authors could always stop at N = 2.
To expand on that, and not trying to be as short-tempered as I was in the last response (sorry about that): to make judgments about features you need to be able to quantify their importance (e.g., log-likelihood explained, R^2, etc.). For the methods used in almost all GWAS, the confidence intervals on variant importance are embarrassingly wide. When you can’t judge the importance of features, you can’t select features with any reliability.
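A back-of-the-envelope Wald calculation shows how wide those intervals are even at biobank scale (the counts and odds ratio below are my assumptions for illustration):

```python
import math

# Wald interval for a log odds ratio from a 2x2 allele-count table:
# SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d).
# Assumed scenario: 10,000 cases, 10,000 controls, minor allele
# frequency 5% in controls, true OR 1.10 (a typical GWAS-hit size).
n_case = n_ctrl = 10_000
maf_ctrl = 0.05
or_true = 1.10

odds_case = or_true * maf_ctrl / (1 - maf_ctrl)
maf_case = odds_case / (1 + odds_case)
a = 2 * n_case * maf_case         # minor alleles in cases
b = 2 * n_case * (1 - maf_case)   # major alleles in cases
c = 2 * n_ctrl * maf_ctrl         # minor alleles in controls
d = 2 * n_ctrl * (1 - maf_ctrl)   # major alleles in controls

se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(or_true) - 1.96 * se)
hi = math.exp(math.log(or_true) + 1.96 * se)
print(f"95% CI for OR: ({lo:.2f}, {hi:.2f})")
```

Even with 20,000 subjects the interval runs from roughly 1.01 to 1.20: it barely excludes 1, and it cannot separate a trivial association from a meaningful one.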
No need to apologize for anything. What I’ve taken from your posts on the GWAS studies is that the combination of low power and the need to correct for multiple testing makes these investigations much less informative, and much more challenging to do well, than they might appear; even I underestimated this.
That brings one question to my mind: what do you think has been learned from the GWAS studies as they are currently conducted? It sounds like you aren’t terribly convinced that they haven’t missed something, which is fair. But what about the findings that the procedure “validates”: do you see any reason for skepticism of them?
The largest effect of GWAS is the padding of CVs,
by way of authors getting to associate their names with a finding of a “significant” effect of a variant on an outcome/phenotype. This can be translated as: “We oversimplified complex genetic pathways and found some individual variants that are significantly associated with an outcome. Never mind that R^2 = 0.04, and that a true biologically driven analysis that did not attempt to be parsimonious might have discovered a large gene set yielding R^2 = 0.3.”
Simultaneous inflation of both \alpha and \beta in GWAS is a problem, but at the heart of this kind of research is a ranking and selection problem. The amount of information needed to correctly select a feature set is astounding. GWAS has a very, very low probability of selecting the “right” variants.
A simple Bayesian procedure, in addition to examining widths of confidence intervals for importance rankings, fully exposes the problem. Let’s suppose you are interested in finding features that have a non-trivial association with an outcome, and you define non-trivial as an odds ratio > 5/4 or < 4/5. Pick a reasonable skeptical prior, select variants that have P(OR > 5/4 or OR < 4/5) > 0.95, and rule out variants for which this probability is < 0.05. You’ll find that a huge number of variants have P(4/5 < OR < 5/4) between 0.3 and 0.7, i.e., we just don’t know anything about their associations with the outcome. This failure to admit that for the majority of variants we don’t know enough to write the paper is a major failing of the false discovery rate-based GWAS machine.
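Here is a sketch of that procedure using a normal approximation on the log odds ratio scale (the prior scale and the two example variants are my assumptions, chosen only to illustrate the two regimes):

```python
import math
from statistics import NormalDist

# Skeptical prior on the log odds ratio: centered at 0, with only a 5%
# prior chance that OR > 2 (prior_sd chosen so P(log OR > log 2) = 0.05)
prior_sd = math.log(2) / 1.645

def post_prob_nontrivial(log_or_hat, se):
    """P(OR > 5/4 or OR < 4/5) under a normal-normal conjugate update."""
    post_prec = 1 / se**2 + 1 / prior_sd**2
    post_mean = (log_or_hat / se**2) / post_prec   # prior mean is 0
    post = NormalDist(post_mean, math.sqrt(1 / post_prec))
    return (1 - post.cdf(math.log(1.25))) + post.cdf(math.log(0.8))

# Two hypothetical variants: one precisely estimated, one noisy
p1 = post_prob_nontrivial(math.log(1.10), 0.045)  # OR-hat 1.10, SE 0.045
p2 = post_prob_nontrivial(math.log(1.30), 0.25)   # OR-hat 1.30, SE 0.25
print(f"precise variant: P(nontrivial) = {p1:.3f}")
print(f"noisy variant:   P(nontrivial) = {p2:.3f}")
```

The precisely estimated variant can be ruled out as trivial (probability well below 0.05), while the noisier one lands squarely in the “we just don’t know” zone.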
I would like to see statistics taught early on in grade school, starting with probability as a conceptual way to describe intuitive problems (e.g., Jaynes starts with a police officer noticing that a jewelry store has been robbed). I would like to see decision problems at the forefront. This would be in contrast to how probability is often taught, focusing on the mathematical definitions needed to describe coins, card games, and such. I would definitely like to see less focus on tests, which carry so much large-sample baggage that they are often impenetrable to someone without a mathematical background (although, admittedly, it was thinking about a visual interpretation of a t-test that got me really interested in statistics, and I did not understand large-sample theory at that time).
After learning about probability, one would then start discussing statistics as a way to estimate these probabilities. I would not start with the mean; I would start with the distribution and then teach the mean as a function of it (I like the idea that E[X] = \mu(P), the mean as a functional of the distribution P).
Probability and statistics would then serve as a common language, perhaps helping people understand the world we live in and the technologies we have created, and in particular the connections between different techniques.
Those are excellent ideas, and I especially like the idea of first presenting realistic examples. Another concept that I think should be taught very early, as odd as it sounds, is loess or some other nonparametric smoother for examining trends between two variables.
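For example, a bare-bones locally weighted (loess-style) smoother fits in a few lines; the simulated data and the span value here are arbitrary illustrations, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated noisy nonlinear trend that a straight line would miss
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.4, 200)

def loess_like(x, y, x0, span=0.3):
    """Locally weighted linear fit at x0: a bare-bones loess sketch."""
    h = span * (x.max() - x.min())
    w = np.clip(1 - (np.abs(x - x0) / h) ** 3, 0, 1) ** 3  # tricube weights
    A = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
    return beta[0] + beta[1] * x0

grid = np.linspace(0.5, 9.5, 19)
smooth = np.array([loess_like(x, y, g) for g in grid])
print("correlation with the true trend:",
      np.corrcoef(smooth, np.sin(grid))[0, 1])
```

Plotting the raw points against the smoothed curve makes the trend-versus-noise distinction visible without any distributional machinery, which is exactly why it works so early in a curriculum.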