Is Any Method So Obsolete It Shouldn't Be Used?

Without a doubt, fixed-level testing at the magical levels of 0.1, 0.05, 0.025, or 0.01 should be relegated to the historical dustbin.

That doesn’t imply any “ban” on testing, however.

Fixed-level testing is responsible for the litany of p-value fallacies we see in the scientific journals. It is also responsible for a vast literature on the so-called “Jeffreys-Lindley” paradox. There is no paradox if \alpha declines at a suitable rate as 1-\beta increases.
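A rough numerical sketch of the Jeffreys-Lindley point, under assumptions that are not from this thread: a normal mean with known SD `sigma`, a point null at 0, and a N(0, `tau`^2) prior under the alternative. Holding the z-statistic at 1.96 (so p stays near 0.05) while n grows, the Bayes factor in favor of the null keeps rising, which is why a fixed \alpha and the Bayesian answer drift apart.

```python
import math

def bf01(z, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: theta = 0 versus H1: theta ~ N(0, tau^2),
    given a z-statistic from n observations with known SD sigma.
    The prior and its scale tau are illustrative assumptions."""
    r = n * tau**2 / sigma**2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z**2 * r / (1.0 + r))

z_fixed = 1.96  # two-sided p of about 0.05 at every sample size
for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,}  BF01 = {bf01(z_fixed, n):8.2f}")
```

Values above 1 favor the null. Under this setup, the same "significant" z supports the alternative at small n and the null at very large n, so the threshold has to tighten as information accumulates.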

I’m exploring the relationship of this adaptive \alpha to the Bayesian rejection ratio described in this paper I’ve posted on a number of threads.

Making explicit the important distinction between Fisher’s p value and Neyman’s \alpha would go a long way toward improving both the reporting of results and their interpretation.

Related to regression: I’d hope that clarifying the important distinction between \alpha and p would lead to understanding that leaving out terms in a regression model based on a decision rule interpretation of p-values is irrational from an information theoretic and Bayesian POV.


Other obsolete methods:

  • repeated measures ANOVA
  • Mood median test
  • sign test
  • any method that converts a continuous variable to a categorical variable

I am guilty of the last one.

Another two obsolete methods are:

  • log-binomial regression
  • modified Poisson regression with binary outcomes

Considering the Mood median test.

So why do various software packages (e.g., R, SAS, Stata, SPSS, and even Excel, if that counts as a statistical software package) make this test an option? In my opinion, the option “invites” users to use this obsolete test. If this test is “obsolete”, why would R, SAS, Stata, SPSS (and Excel) make it easy/possible/acceptable to conduct an analysis with it?

This post was motivated by a post on Cross Validated in which what seemed to be a student tried to analyze data using stepwise regression and ended up with a mess. No surprise.

I am happy to know that the Mood median test is something I never need to understand or use. I had never even heard of it. But even in Excel there is an option to conduct this test as a comparison of medians.

Deeming the sign test obsolete would likely have many advocates and few detractors.

Deeming repeated measures ANOVA obsolete should be a special priority for some fields, including but not limited to psychology, sociology, medicine, and economics.

But how do we “stomp out” things that are obsolete: regression approaches, tests, and others?

Stepwise regression is being TAUGHT in introductory statistics courses.
The sign test is being TAUGHT in introductory statistics courses.
These methods have software that “invites” people to use the methods.

Stepwise regression, the Sign Test, the Mood Median Test (and others) could be taught as “these are methods that were developed in the past and advances in knowledge and technology have made them obsolete.”


I join you in ‘screaming into the void’ here. A crucial function that excellent statistical software serves is to provide expert guidance to the user. (If only Stata refused to compute post hoc power, at least one calamitous episode in the history of statistics could have been avoided!) At the very least, warnings should be issued when bad things are attempted. I once engaged with Stata Journal over Stata’s cheerfully computing cluster-robust variance estimators with as few as 2 clusters, in the context of a case where this happened, resulting in an erratum.

Even if ‘canceling’ bad methods from software is not feasible, I wonder whether one might overlay an advisory layer. It should be possible to write an R package that, when attached, effectively looks over your shoulder while you work, and offers the missing warnings. Presumably, this could be implemented simply by masking every problematic function, and imposing additional quality checks before passing through to the original function or refusing to proceed.
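To make the masking idea concrete, here is a minimal sketch in Python rather than R. Everything in it is hypothetical: `advise`, `MethodAdvisory`, the stand-in `cluster_robust_se` function, and the cluster-count thresholds are invented for illustration and do not correspond to any existing package.

```python
import functools
import warnings

class MethodAdvisory(UserWarning):
    """Warning category for discouraged statistical methods (hypothetical)."""

def advise(message, warn_if=lambda *a, **k: True, refuse_if=None):
    """Wrap a function so that calling it runs extra checks first.

    warn_if:   predicate on the call arguments; True triggers a warning.
    refuse_if: optional predicate; True makes the wrapper refuse to proceed.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if refuse_if is not None and refuse_if(*args, **kwargs):
                raise ValueError(f"{func.__name__}: refused. {message}")
            if warn_if(*args, **kwargs):
                warnings.warn(f"{func.__name__}: {message}",
                              MethodAdvisory, stacklevel=2)
            return func(*args, **kwargs)  # pass through to the original
        return wrapper
    return decorator

# Hypothetical stand-in for a routine supplied by a statistics library.
def cluster_robust_se(residuals_by_cluster):
    return "standard errors (placeholder)"

# Mask the original name with the advisory version.
# The thresholds 30 and 3 are arbitrary, chosen only for the example.
cluster_robust_se = advise(
    "cluster-robust variance estimates are unreliable with few clusters",
    warn_if=lambda clusters: len(clusters) < 30,
    refuse_if=lambda clusters: len(clusters) < 3,
)(cluster_robust_se)
```

Calling `cluster_robust_se` with four clusters would warn and then return the original result, while calling it with two clusters would raise instead of computing anything. A real advisory layer would need vetted rules for each masked function, which is the hard part.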


Great comments from both of you. I wish that software systems would issue a warning every time an obsolete or highly problematic method is used. Or we need to teach everyone to do simulations or to dissect their results, which would expose the problems. Here was one of my attempts to expose the futility of feature/variable selection: https://discourse.datamethods.org/t/challenges-to-variable-selection


Yes, advisories and warnings and quality checks in the software seem first steps in the right direction.

But I can see from reading the many posts and writings of @f2harrell (and others) about (e.g.) stepwise regression that there is a feeling of “screaming in the wind.” Can’t hurt to talk about it.

But a clear statement that “[method] can be done but shouldn’t be done (or doesn’t need to be done) because [method] is obsolete” wouldn’t hurt. I have a washboard but I don’t need to use it because I have a washing machine, which does a better job. Washboards are obsolete.


I hope someone will take up a response to your post.

I think a great class could be developed by having students do simulations on the large number of flawed methods and proposals that litter the journals and textbooks. This could be done in an entirely inquiry-based fashion, after the essentials of programming the simulations were taught. Has anyone tried it?
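As one possible first exercise for such a class (a sketch of my own, with made-up settings, not taken from any course): screen 20 pure-noise predictors against a pure-noise outcome and keep those with p < 0.05. With 20 independent tests at the 5% level, at least one noise variable gets "selected" with probability 1 - 0.95^20, about 64%.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def n_selected(n_obs, n_pred, t_crit, rng):
    """Count pure-noise predictors that pass a p < 0.05 screen against y."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    count = 0
    for _ in range(n_pred):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(x, y)
        t = r * math.sqrt((n_obs - 2) / (1 - r * r))  # t-test for zero correlation
        count += abs(t) > t_crit
    return count

rng = random.Random(2024)
N_OBS, N_PRED, N_SIM = 100, 20, 300
T_CRIT = 1.984  # approximate two-sided 5% critical value of t with 98 df
results = [n_selected(N_OBS, N_PRED, T_CRIT, rng) for _ in range(N_SIM)]
frac_any = sum(c > 0 for c in results) / N_SIM
mean_selected = sum(results) / N_SIM
print(f"simulations selecting at least one noise predictor: {frac_any:.2f}")
print(f"average number of noise predictors selected: {mean_selected:.2f}")
```

Students could then vary the number of candidate predictors, add a few real effects, or replace the screen with stepwise selection and watch what happens to the selected set.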


I’d surely like to know. I think that simulations are great teaching tools, and having a compendium of R Markdown notebooks to help jump-start them would be a good idea. This could include not only obsolete methods but also non-robust methods, e.g., a simple simulation showing that the central limit theorem cannot always be relied on.
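A small sketch of what such a central limit theorem notebook might contain, written in Python with only the standard library. The choices here (lognormal data with log-scale SD 1.5, n = 50, 2000 replications) are my own illustrative assumptions. It checks how often the usual t-based 95% interval for the mean actually covers the true mean.

```python
import math
import random
import statistics

rng = random.Random(1)
N, N_SIM = 50, 2000
SIGMA = 1.5                         # log-scale SD; larger means a heavier right tail
TRUE_MEAN = math.exp(SIGMA**2 / 2)  # mean of a lognormal(0, SIGMA) variable
T_CRIT = 2.0096                     # two-sided 5% critical value of t with 49 df

covered = 0
for _ in range(N_SIM):
    x = [rng.lognormvariate(0, SIGMA) for _ in range(N)]
    m = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(N)
    covered += (m - T_CRIT * se) <= TRUE_MEAN <= (m + T_CRIT * se)

coverage = covered / N_SIM
print(f"nominal coverage 0.95, simulated coverage {coverage:.3f}")
```

With data this skewed, the simulated coverage should come out noticeably below the nominal 0.95, even though n = 50 is often treated as "large enough". Changing `SIGMA` and `N` shows how slowly the approximation improves.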

  • modified Poisson regression with binary outcomes

I assume you are referring to the methods described here. Is this really obsolete? Why? What is to be used instead?


The idea of basing models on relative risks is not sensible to me as such models would require more interactions to be included to make up for model restrictions (restrictions that log odds don’t have).


Yes, that method. As mentioned by Frank above, the problem is that this generates a relative risk whose numerical value changes with baseline risk, independent of outcome discrimination. Thus the coefficients are just a rolling number. In addition, unnecessary product terms will be required to keep probabilities bounded between 0 and 1.
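A bit of arithmetic makes the boundedness issue visible. This is only an illustration with an arbitrary effect size of 2, applied once as a constant risk ratio and once as a constant odds ratio; the function names are mine.

```python
def risk_under_constant_rr(baseline_risk, rr):
    """Exposed-group 'risk' if the risk ratio is held fixed."""
    return baseline_risk * rr

def risk_under_constant_or(baseline_risk, odds_ratio):
    """Exposed-group risk if the odds ratio is held fixed."""
    odds = baseline_risk / (1 - baseline_risk) * odds_ratio
    return odds / (1 + odds)

EFFECT = 2.0  # illustrative effect size, used as RR and as OR in turn
for p0 in (0.1, 0.3, 0.6):
    p_rr = risk_under_constant_rr(p0, EFFECT)
    p_or = risk_under_constant_or(p0, EFFECT)
    print(f"baseline {p0:.1f}: constant RR -> {p_rr:.2f}, "
          f"constant OR -> {p_or:.2f} (implied RR {p_or / p0:.2f})")
```

At a baseline risk of 0.6 the constant risk ratio implies a "probability" above 1, so a relative risk model has to bend the effect (via interactions) to stay legal. The constant odds ratio always stays inside (0, 1), and the risk ratio it implies shrinks as baseline risk rises.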

Best to use logistic regression and interpret the odds as risks as needed. There is a recent thread discussing this if you scroll down the list. Also see this paper.


Don’t say “interpret the odds as risks” but rather use the logistic regression model to get risks and use those covariate-specific risks to get covariate-specific risk ratios.
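In code, that workflow might look like the sketch below. The coefficients are invented for illustration (they are not from any fitted model or from the paper mentioned above), and `risk` and `expit` are just helper names.

```python
import math

def expit(z):
    """Inverse logit: convert log odds to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted log-odds coefficients, made up for illustration only.
COEF = {"intercept": -3.0, "treated": 0.7, "age_per_decade": 0.5}

def risk(treated, age_decades):
    """Model-based risk for one covariate setting."""
    lp = (COEF["intercept"]
          + COEF["treated"] * treated
          + COEF["age_per_decade"] * age_decades)
    return expit(lp)

for age in (3, 5, 7):
    p1, p0 = risk(1, age), risk(0, age)
    print(f"age {age * 10}: risk treated {p1:.3f}, untreated {p0:.3f}, "
          f"risk ratio {p1 / p0:.2f}")
```

The treatment odds ratio is the same at every age (exp(0.7) in this made-up model), but the risks and the risk ratio differ by age, which is why they should be reported for specific covariate settings.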


Agree - I was just thinking in terms of posterior odds from the logistic regression model.

Frank, my turn to ask. Why repeated measures ANOVA? I could understand OLS being obsolete for analyzing repeated measures, but safety testing in preclinical studies or target animal studies are two fields where the MLE/REML method is currently the standard due to the split-plot in time experimental design of the studies, and Bayesian methods with different covariance structures are being explored. Thanks!

SteveDenham

The traditional repeated measures ANOVA, which assumes multivariate normality with a very restrictive correlation pattern, is too limiting.
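To picture the restriction: the textbook pattern that satisfies the classical assumption is compound symmetry, where every pair of time points has the same correlation. The sketch below prints it next to an AR(1) pattern, one common alternative in which correlation fades with the time gap. The value 0.6 and the four time points are arbitrary choices for illustration.

```python
def compound_symmetry(n_times, rho):
    """Every pair of time points shares the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]

def ar1(n_times, rho):
    """Correlation decays as rho**lag with the gap between time points."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]

def show(name, matrix):
    """Print a correlation matrix with two decimals per entry."""
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{v:.2f}" for v in row))

show("compound symmetry", compound_symmetry(4, 0.6))
show("AR(1)", ar1(4, 0.6))
```

In longitudinal data, measurements close in time are usually more alike than measurements far apart, which the first pattern cannot represent.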


No kidding. My Linear Models professor in 1977 pointed that out. He also pointed out that no null hypothesis is true, that p values are some sort of random variable with an indeterminate distribution that is dependent on how much faith you can put into the prior supporting it and that someday in the future, we would ALL be using Bayesian and simulation methods. Hats off to Dr. Norm Matloff, at UC Davis then.


Newman-Keuls post-hoc test… doesn’t do what it says on the tin


In addition to the restricted covariance structure assumed, RM-ANOVA can’t handle missing values (need to impute or drop cases), observations need to be at the same time points for all subjects, and results are not that informative (a significant treatment x time interaction, great). This paper has a nice discussion.
