Hi,
(Applied) statisticians and knowledgeable consumers of statistical methods know that most (if not all) methods come at a price - they have assumptions.
Unfortunately, not all consumers of statistical methods, or indeed of their products (publications and “bold” media/press releases to the public; I am thinking of, but not restricted to, the tonnes of Covid-19 (pre-)publications being churned out through “less stringent”, aka “accelerated publication track”, peer-review processes out there):
(1) Know about the assumptions, or, if they do,
(2) Know how to test for & interpret those assumptions, and
(3) Once tested, if the assumptions are not met, know where to go next.
We, largely but not exclusively, have the point-and-click statistical software to thank for that!
Digression:
As you all know, there is an entire universe of legal requirements on product safety and labelling in most parts of the world:
If you buy a bottle of wine (most of them at least), there will be a footnote telling you it contains sulphites etc.
If you buy a pack of cigarettes, you not only get a written warning, but a lovely graphical accompaniment.
If you get a drug, it comes with information on side-effects etc.
If you buy a car, safety features such as seat belts are a default instead of an added option.
However, if you run a statistical function/method, e.g. Cox/logistic regression, you are expected to find your own way (assuming that you can) to testing its assumptions.
In my (albeit naive) view, statistical methods are the gatekeepers (i.e. even more important!) of safety (in terms of “novel findings”, “confirmatory/replication research” and “policy-changing studies”) within research.
Statistical functions/methods 2.0?
So how about inverting how we design and implement statistical products? Instead of the designers and programmers of statistical programs/functions passing the burden of testing model assumptions on to their consumers, they take responsibility and run these tests as the DEFAULT, and make it optional for consumers to turn them off. The consumer then has to work a little harder to switch off this part of the statistical function output, which increases the chances of them having read that output in the first place, as opposed to the other way round.
Yes, we can say that “consumers of statistical methods, or of the products resulting from them, need to be aware of these conditions beforehand, or need to consult a statistician”, but that describes an ideal world, not the one we live in - we have seen published (and, luckily, in a minority of cases retracted) papers that don’t even bother to mention whether they tested for model assumptions, yet lots of works build upon those findings (especially if they are the “first” or the “largest” or, worse, the “first and largest” study to show an effect).
An extreme analogy would be to let industry pass the burden of “consumer beware” on to its customers: not being required to print “Cigarette/alcohol/drug might cause …” on the packaging of their products, but rather expecting consumers to find this out for themselves, already be aware of it, or consult experts to tell them about it.
Of course, there is no silver bullet that can solve bad research, but this might be one way to mitigate it: no consumer should be able to say (1) I did not know about the assumptions OR (2) I did not know how to test for &/or interpret those assumptions.
Condition (3) might be an optional recommendation for statistical software designers to implement, i.e. once tested, if the assumptions are not met, know where to go next.
What do you think? What other aspects could be improved on?
Sincerely,
nelly
DISCLAIMER:
I am just a junior medic, with limited (non-existent actually, compared to most people on this forum) statistical/programming knowledge & experience but with tremendous respect and admiration for the field of statistics and how it impacts our world and medical practice.