There was an older thread that asked a very similar question. I pointed out that there are broadly three options, with the important caveat that the plan for dealing with outliers needs to be specified before data collection.
I find nothing wrong with using robust methods. One consideration that has limited their use is Thomas O’Gorman’s [1,2] observation that:
> …M-estimators and R-estimators, which tend to downweight observations that are outliers, appear to be rarely used in practice. One problem with these robust estimators is that there are too many robust approaches, functions, and parameters combined with too little guidance on their correct choice [1, p. 12].
He developed adaptive testing and estimation procedures that peek at the data in order to adjust the weights of the observations (much as would be done in L-, M-, or R-estimation), so that the model for the assumed data-generating process more closely matches the actual data, without inflating \alpha.
He accomplishes this with a permutation method on the residuals, which can be proven to maintain the \alpha level. This makes the procedure valid from a frequentist point of view.
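To make the permutation-of-residuals idea concrete, here is a minimal numpy sketch of a permutation test for a regression slope. This is the general Freedman–Lane-style idea (permute residuals from the null model to build the reference distribution), not O’Gorman’s exact adaptive procedure; the function name and data are illustrative choices of mine.

```python
import numpy as np

def perm_test_slope(x, y, n_perm=2000, seed=None):
    """Permutation test of H0: slope = 0, permuting residuals from the
    null (intercept-only) model. A generic sketch of permutation-of-
    residuals testing, not O'Gorman's adaptive procedure."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    y = np.asarray(y, float)

    def slope(yv):
        xc = x - x.mean()
        return (xc @ (yv - yv.mean())) / (xc @ xc)

    # Residuals under the null model (intercept only)
    resid = y - y.mean()
    t_obs = abs(slope(y))

    # Reference distribution: permute the null residuals, refit
    t_perm = np.array([abs(slope(rng.permutation(resid)))
                       for _ in range(n_perm)])
    # Add-one correction guarantees a valid (slightly conservative) p
    return (1 + np.sum(t_perm >= t_obs)) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y_null = rng.normal(size=50)            # no association with x
y_alt = 1.5 * x + rng.normal(size=50)   # strong association with x
print(perm_test_slope(x, y_null, seed=2))  # p under no association
print(perm_test_slope(x, y_alt, seed=2))   # small p: strong association
```

Because the reference distribution is built by permutation under the null model, the test maintains its nominal \alpha regardless of the error distribution's shape.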
In the scenario where the researcher’s statistical model (i.e., the null or reference model) for the data-generating process is a good fit, little adjustment takes place, and the procedure will be very close to the standard textbook methods. In a scenario where the data are more likely to come from some mixture of distributions, the data points that appear unlikely under the investigator’s statistical model for the experiment are given less weight, which increases power.
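One concrete illustration of how points that are unlikely under the working model receive less weight is a classic Huber-type M-estimation step (again, this is a stand-in for the downweighting idea, not O’Gorman’s adaptive weighting scheme; the tuning constant 1.345 is the conventional choice for near-normal data):

```python
import numpy as np

def huber_location(y, k=1.345, n_iter=25):
    """Iteratively reweighted Huber estimate of a location parameter.
    Observations close to the center keep weight 1; gross outliers are
    downweighted in proportion to how extreme they are."""
    mu = np.median(y)
    for _ in range(n_iter):
        s = np.median(np.abs(y - mu)) / 0.6745   # robust scale (MAD)
        r = (y - mu) / s                          # standardized residuals
        w = np.minimum(1.0, k / np.abs(r).clip(min=1e-12))
        mu = np.sum(w * y) / np.sum(w)            # weighted mean update
    return mu, w

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(size=40), [15.0, 20.0]])  # two gross outliers
mu, w = huber_location(y)
print(mu)       # near 0, unlike the ordinary mean
print(w[-2:])   # the two outliers receive weights far below 1
```

When the data really are close to normal, nearly all weights stay at 1 and the estimate is essentially the sample mean, mirroring the behavior described above for a well-fitting model.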
It would be unwise to use adaptive methods uncritically, without considering whether the weighting is hiding something more serious about the conduct of the study and the credibility of the collected data. At the very least, you might discover a factor to control for in future studies that you did not consider in the initial design of the research plan that led to the collection of a particular data set.
Before even collecting any data, I would perform simulation studies comparing how the procedure performs when the model is grossly in error versus under tolerable deviations from the assumptions. You would be able to see the frequency and size of the weighting adjustments under a range of scenarios, in order to quantify your uncertainty in the model, the data, and the analysis.
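As a generic illustration of that kind of pre-data simulation study (not O’Gorman’s procedure; the scenarios, sample sizes, and contamination fraction are arbitrary choices of mine), one can estimate rejection rates of a t-test and a Wilcoxon test under a clean normal model, a grossly contaminated null, and a contaminated alternative:

```python
import numpy as np
from scipy import stats

def rejection_rate(sampler, test, n_sim=1000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate of a two-sample test under a given
    data-generating scenario -- a generic simulation-study sketch."""
    rng = np.random.default_rng(seed)
    return np.mean([test(*sampler(rng)) < alpha for _ in range(n_sim)])

def clean(rng):
    # Model holds: both arms standard normal, no group difference
    return rng.normal(size=30), rng.normal(size=30)

def contaminated(rng):
    # Gross model error: ~10% of observations from a 10x-wider component
    def arm():
        z = rng.normal(size=30)
        mask = rng.random(30) < 0.10
        z[mask] = rng.normal(scale=10.0, size=mask.sum())
        return z
    return arm(), arm()

def contaminated_shift(rng):
    # Same contamination plus a true shift of 1.0 in the second arm
    a, b = contaminated(rng)
    return a, b + 1.0

t_test = lambda a, b: stats.ttest_ind(a, b).pvalue
wilcoxon = lambda a, b: stats.mannwhitneyu(a, b,
                                           alternative="two-sided").pvalue

for name, sampler in [("clean null", clean),
                      ("contaminated null", contaminated),
                      ("contaminated shift", contaminated_shift)]:
    print(f"{name:18s}  t: {rejection_rate(sampler, t_test):.3f}"
          f"  Wilcoxon: {rejection_rate(sampler, wilcoxon):.3f}")
```

Under the null scenarios both tests should reject near the nominal 5%; under the contaminated alternative, the rank-based test typically shows a clear power advantage, quantifying what a gross model error costs the parametric procedure.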
Addendum: The original poster asks below:
In addition to O’Gorman’s observation I mentioned above, the short, less charitable answer is: much of the literature is littered with bad statistical methods permitted by subject matter experts with meager statistical training.
As a statistician, I have been in the situation where I was a co-author and we submitted to a Q2 journal. The (clinical) reviewer asked us to commit statistical atrocities, even though I explained why this was a bad idea (with references). The reviewer insisted (literally, “please show me p-values for the normality tests”) without any rationale or counterargument. I had to comply so as not to waste everybody’s time.
My current statistical knowledge mostly comes from hanging out here (since about 2019!) and trying to reverse engineer the reasoning behind Frank’s recommendations.
There was another older thread on multiple linear regression where I made the following observation:
It is hard to beat the simplicity of Dr. Harrell’s recommendation of semi-parametric modelling as a general rule. I’m pretty surprised at how much of applied stats can be compressed into proportional odds models. This strikes me as something only someone with decades of applied stats experience would figure out.
I merely emphasize O’Gorman’s work on adaptive testing and estimation because it explicitly demonstrates sound statistical thinking, especially in contexts where naive parametric modelling dominates. There have been threads where even the proper use of proportional odds models has come under excessive scrutiny from subject matter experts. For example:
Uddin M, Bashir NZ, Kahan BC. Evaluating whether the proportional odds models to analyse ordinal outcomes in COVID-19 clinical trials is providing clinically interpretable treatment effects: A systematic review. Clinical Trials. 2023;21(3):363-370. doi:10.1177/17407745231211272
You can see an analysis by @f2harrell of this misplaced criticism here:
I particularly appreciate O’Gorman’s proof in [1], where he demonstrates that \alpha is maintained in an adaptive test via a reference to the nonparametric Hogg, Fisher, and Randles procedure.
Ultimately, serious consideration needs to be given to whether a parametric or semi-parametric approach should be taken towards the data. For many clinical outcome scenarios, the proportional odds model is the preferred approach, despite the common use of parametric methods in the literature.
For scenarios where parametric modelling is reasonable and expected, O’Gorman’s adaptive procedures are worth considering, even if the proportional odds model would give a reasonable answer, simply because:
- It will be easier to defend to reviewers, and
- There will be a small power advantage for parametric models over semi-parametric models when the parametric assumptions are justified.
References
1. O’Gorman, T. W. (2004). Applied adaptive statistical methods: Tests of significance and confidence intervals. Society for Industrial and Applied Mathematics.
2. O’Gorman, T. W. (2012). Adaptive tests of significance using permutations of residuals with R and SAS. John Wiley & Sons.