Statistics Reform

Ellie Murray once said something on Twitter like “If the goal is to teach, that’s homework, not research.” I very much agree that encouraging or requiring students and trainees to produce a published article (or several) has a lot of negative effects. For one, it creates a lot of bad papers; it also starts them off with bad habits, so even as they advance in their careers they are often still producing pretty bad papers.

The reasoning is sound; the execution is likely to be the challenge. As Sameera said…


I haven’t been following really closely here, but I saw this and it seemed germane…

Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature 2019;567:305. doi:10.1038/d41586-019-00857-9


It didn’t take much, as I work with an ED consultant and am regularly in the ED… I just needed the permission of the head of the ED to shadow the consultant for a day. The motivation is simple: if I’m tasked with trying to help a clinician’s decision making, I must first try to understand where they are “at”, what barriers and difficulties they face, and how the data are being collected.


Very wise. Thank you!

There was a plea on Twitter as follows:

It would help if more MDs and scientists would stand up against fake "peer-reviewed" papers and non-scientific publishers that spread false information. And by commenting on @pubpeer, although I would recommend anonymously :-)

— Elisabeth Bik (@MicrobiomDigest) March 30, 2019

Does anyone here use PubPeer, anonymously or otherwise? I have no experience with it. Is there much value a stato can provide aside from saying, e.g., “use fewer decimal places when reporting p-values”?


She (Bik) had a cracking take on a paper here:


This is an important suggestion. Clinical decision making is often done under pressure of time, and requires weighing up not just the optimal management but also the consequences of getting it wrong. For something as simple as red, irritated eyes, the two likeliest causes, an infection and an allergic reaction, require treatments that will make the condition worse if you choose the wrong one.

Second, clinical decision making is sequential, with each step frequently determining what the next test should be, whereas statistical models are typically single-pass, requiring all information to be available at once.
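The contrast can be sketched in a few lines (all numbers hypothetical): each test result revises the probability of disease via its likelihood ratio, and whether to keep testing depends on where the probability currently stands. A fit-once regression model, by contrast, demands every covariate up front.

```python
# Sketch of sequential diagnostic updating (hypothetical pre-test
# probability, likelihood ratios, and treatment threshold).

def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

def update(p, likelihood_ratio):
    """Bayes' rule on the odds scale: post-odds = pre-odds * LR."""
    return prob(odds(p) * likelihood_ratio)

p = 0.10  # assumed pre-test probability
for name, lr in [("history", 4.0), ("exam", 3.0), ("lab", 8.0)]:
    p = update(p, lr)
    print(f"after {name}: P(disease) = {p:.2f}")
    if p > 0.90:  # assumed treatment threshold: stop testing, treat
        print("stop testing, treat")
        break
```

The point is the `break`: the testing sequence terminates (or branches) based on the running probability, which is exactly the incremental structure a single-pass model cannot represent.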

My advice would be to sit in while a really good psychiatrist or psychotherapist conducts a diagnostic interview, to appreciate the incremental, branching structure of clinical decision making. About the most useful thing I learned in my career was how to conduct a formal psychiatric interview.

A failure to understand how decision making works in real life has left us with a mountain of logistic regression models that simply don’t meet the needs of our clinical colleagues.

I’m not saying that clinical decision making cannot be improved upon – I readily admit that it’s often bad, and spectacularly so in psychiatry (an area close to my heart). But we cannot help if we don’t understand what we are helping with.


How to improve statistical practice and training

Training focusses too much on Fisherian (or Bayesian, or regression, or deep-net) computation and spends far, far too little time on experimental design. Practice follows down this wrong path.

The quality of the questions asked, the survey method, and the data definitions do not even enter into the calculations that are meant to verify whether a scientific work is valid.

How do I get grantors interested?

Better results are obviously worth investing in. It’s we who need to change.

If only the funds could come out of the SAS-table-ridden SurveyMonkey b.s. that should never have been funded in the past.


Here is yet another example of how badly we are doing.

Despite considerable differences between the two trials, the outcomes were remarkably similar. … with a difference in overall survival of 8% in favor of chemotherapy (82% versus 74%, respectively; P = .008). Recurrence-free survival was also better for patients in the adjuvant chemotherapy arm than for patients in the no-adjuvant-chemotherapy arm (76% versus 65%, respectively; P = .001). Only when the ACTION trial was analyzed separately was a survival difference between the trial arms not seen (P = .10).

The combined results of the two trials, which comprised 925 patients, would seem to be definitive proof of the benefit of platinum-based adjuvant chemotherapy for all patients with early-stage ovarian cancer. Unfortunately they are not. The ICON1 trial, with its broad patient entry criteria, included good-prognosis patients (i.e., 32% of patients had well-differentiated histology), who are normally excluded from such trials, and likely included poor-prognosis patients who may have had occult stage III disease; approximately 25% of patients with incompletely staged ovarian cancer harbor more advanced disease (5).

In other words, standard practice entirely misses the point. P-values and differences in means are no better (and therefore worse, given the added complexity) than doing no statistics at all. Moreover, the excess of free parameters would make a patient want simply to scan the data themselves in a pivot table. If the relevant patient details were actually part of our procedures, rather than being computationally ignored in the construction of a difference in means, what we do might have some validity. Unfortunately, rather than aiding decisions or distilling 900 observations down to 3, standard practice here amounts to gobbledegook.
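To make the case-mix problem in the quoted passage concrete, here is a sketch with purely hypothetical counts (not the actual ICON1/ACTION data): two trials enroll different mixes of patients and allocate arms in different proportions, so naive pooling of the arms manufactures a large survival difference that exists in neither trial.

```python
# Hypothetical counts: (survivors, n) per arm. Within each trial the
# chemo-vs-control difference is 2 percentage points; pooling inflates
# it to 12 points, purely because the chemo arm is over-represented in
# the good-prognosis trial.

trials = {
    "good_prognosis": {"chemo": (180, 200), "control": (88, 100)},
    "poor_prognosis": {"chemo": (60, 100), "control": (116, 200)},
}

def rate(survived, n):
    return survived / n

for name, arms in trials.items():
    diff = rate(*arms["chemo"]) - rate(*arms["control"])
    print(f"{name}: within-trial difference = {diff:.2f}")

# Naive pooling: sum survivors and denominators across trials per arm.
pooled = {
    arm: tuple(sum(x) for x in zip(*(t[arm] for t in trials.values())))
    for arm in ("chemo", "control")
}
pooled_diff = rate(*pooled["chemo"]) - rate(*pooled["control"])
print(f"pooled difference = {pooled_diff:.2f}")
```

A difference in pooled means carries no trace of which patients sat in which trial, which is exactly the information the editorial says determines whether the apparent benefit is real.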