I agree with pretty much everything that has been suggested here, but I’m pessimistic about actual adoption. The reason is that unless there is pressure for proper statistical thinking and principled data analysis, no one will bother. If we can keep publishing in vanity journals as we do now, why would PIs, faculties, institutes, etc. spend money on building statistical know-how locally? The publishing industry as a whole does not reward investment in statistical understanding.
I think the way to solve the problem starts with putting pressure on journals to adopt policies that enforce the following:
- pre-registration for all studies. This goes both for hypothesis-driven research (observational or experimental) and for data-driven/exploratory research. It should go some way toward making clear to researchers that data analysis and study design are not separate things, and should help avoid the problems that have been documented for ages (HARKing, etc.).
Even though it’s unrealistic to expect all reviewers to have the knowledge needed to evaluate papers on a statistical level, pre-registration nevertheless makes the data analysis transparent (and post-publication criticism remains, as always, a possibility).
- availability of full raw data and analysis pipeline for all papers. Until every journal out there demands that every manuscript submission be accompanied by all the files needed to replicate the results (e.g. RMarkdown files, Jupyter notebooks, or any kind of usable script, together with all the raw data files needed to run the analysis), there will be little pressure to do data analysis in a principled, transparent way.
Without these measures in place, there is simply no pressure for research groups to change their bad practices. Many are not even aware of the bad practices they indulge in! I think journal requirements like these are the only possible way to impress upon research groups, faculties, institutes, etc. that doing statistics as an afterthought won’t work, and that investing in properly trained statisticians/data analysts is not an option but a requirement going forward. Such requirements would also go a long way toward making authors and institutions aware of the problems research currently faces. If this pressure does not exist, no amount of opinion articles will change current practices. So the bottom line is that what we as a community need to focus on is forcing change in journal policies. Faculty committees, department heads, etc. are not interested otherwise; they just want their impact factors and funding money.
A final note on the statistical training of students/researchers. As has been mentioned already, the stats training that most students/researchers go through is inadequate. They don’t understand that designing a study requires statistics as much as it requires field-specific expertise. They think that if the study makes sense conceptually as far as their own field is concerned, then it’s all good, and the only thing they need is “someone” to analyse the data afterwards. The basic statistical training offered in most colleges/universities also tends to spend too much time on theoretical ruminations on the CLT and t-tests, as opposed to taking a more practical approach, e.g. using programming languages, emphasising learning through simulation, training in study design, etc. And focusing on “tests” doesn’t help either. In my experience, the classical approach of teaching the first statistics courses as mathematical theory makes sense conceptually but has had disastrous consequences in practice. Students learn little, and will promptly forget even that. Getting deeper into theory is important, but it needs to come after you have already captured their interest in a couple of basic courses. Show them correlations. Make predictions. Introduce regression right in the first course. Give them a taste of Bayes. Set them up for success. Get them doing “stuff”, and teach them good practices as you go.
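To make “learning through simulation” concrete, here is a minimal sketch of the kind of exercise I have in mind, not a recommended curriculum. Instead of proving the CLT, students draw repeated samples from a skewed population and watch what the sample means do. The function names (`sample_means`, `summarize`), the exponential population, and the sample sizes are all just illustrative choices on my part.

```python
import random
import statistics


def sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size `n` from a skewed (exponential, mean 1)
    population and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the exercise is reproducible
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


def summarize(n):
    """Return the centre and spread of the simulated sample means."""
    means = sample_means(n)
    return statistics.fmean(means), statistics.stdev(means)


if __name__ == "__main__":
    for n in (2, 10, 50):
        centre, spread = summarize(n)
        # For this population, theory predicts a spread of about 1 / sqrt(n).
        print(
            f"n={n:3d}  mean of means={centre:.3f}  "
            f"sd of means={spread:.3f}  1/sqrt(n)={n ** -0.5:.3f}"
        )
```

Students can then change the population, the sample size, or the statistic and see for themselves when the “textbook” behaviour holds and when it does not, before any of the theory is introduced.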