Statistics Reform

Hello all. I am starting to think about a grant application around how to improve statistical practice. I have outlined some thoughts at the link below and just wanted to share them to generate a bit of discussion.

There are 3 key issues: How do we improve statistical support for researchers? How do we improve statistical training? How do we get funders to invest in these issues?


Include the idea of having biostatisticians round with physicians and observe in clinics so that they may ask after the fact about the decision-making of the physicians.


I think journals need to have higher expectations and that will impart more value on appropriate statistical methods. I also think we need to move away from PI initiated grants and think of team-based research questions that require equal contributions of various trained disciplines.


Awkward! :slight_smile:


I’m really happy to work where I do: Uppsala clinical research center’s biostat section has 16 statisticians and two data managers, working with almost any medical statistics there is, and under the same roof we have professional project managers, clinical data managers, medical writers, QA people, registry experts, and even people who (try to) understand GDPR. I’d never want to be “the” statistician at a department; some of these lonely statisticians pop by for coffee now and then.
Most importantly, I don’t think I’d do nearly as much good as a single statistician. Sharing and developing competence is one aspect. But applied statistics work is also a lot about enforcing methodological professionalism, perhaps just insisting on spending 150 characters on reporting what was and was not pre-defined, and to do that you need colleagues and the possibility to discuss and evolve standards.


Journal editors should have access to statistical support if they lack adequate statistical training themselves. Too much research is published that has not been adequately assessed from a statistical perspective.

I also think that grant proposals in science should include some element of statistical support or training. PIs should have to demonstrate in their application that they have sufficient aptitude to perform the statistical analysis proposed or have access to a statistically trained person (I do not think this needs to be a statistician as such).


Agreed. I didn’t touch on this, but it would be important to connect any “embedded” statisticians, and I don’t actually see a reason they can’t still occupy the same space full time, for the reasons you outline. I think the main thing to get people to turn away from is the idea of a productive, sustainable “walk-in” consultancy.


Do you mean for a particular study? I would be against this outside of the context of a particular study for which the statisticians need a good understanding of end-of-the-bed decision making/data collection. The audience for a ward round is already intimidating enough for a patient without adding to it. That being said, I think there is a strong case for the folks who code EHR to spend a certain amount of time shadowing ward rounds and other clinical aspects of care to see how their software (doesn’t) work in practice :innocent:


Well put, Ollie. And I’m not saying this just because I’m one of the 16… :wink:. I once was “the lonely statistician” and I can assure you that being part of a group is much more stimulating. It is impossible to be expert on all topics so it is great to be part of a group with a broad range of statistical expertise, and practical experience, ranging from RCTs, RRCTs, registry studies, prediction modelling, biomarker research, -omics, big data, small data, and lots more.


Have you read the 2015 Wellcome/BBSRC/MRC/Acad Med Sci report into the replication crisis? They are certainly aware of the issues with statistics as it is practiced.

The recommendation ‘work with a statistician’ is sometimes given as a sort of panacea, but I don’t think we can escape the fact that statisticians (Altman et al excepted of course) have been as much a part of the problem as others. I do agree the solutions to a large extent probably lie with there being more statisticians around, but also with teaching them better about the current issues (my own training didn’t mention p-hacking etc) and empowering them through proper consultancy, advocacy and leadership work.

Final thought: my current (biology) institute’s desire to create ‘standardised automated analysis pipelines’ gives me the shudders, and is probably opening a new front in this battle. There is 1 statistician (me) and about 100 people doing ‘bioinformatics’ largely without any statistical training (although this does vary; some are very good, some profess a phobia of stats that doesn’t seem to stop them turning handles). This is where the challenges for us are going to be as big data becomes more routine and PIs think their bioinformatics team will handle the numbers.


No, I hadn’t seen this. Many thanks.

I think the 1st and 3rd Qs are the same. I hear people comment all the time that they would love a biostat resource, but actually no one wants to pay for it.

The statistician is stretched thin, attention to detail, validation etc are compromised, and clients take the piss (link1, link2) eg repeated analyses on the same data. The statistician becomes exhausted and cynical, quality suffers etc.

Here’s what I’d suggest to any stato: your potential client pool might be 100 researchers in the province (it’s more if you look for them); maybe 20 will provide a steady flow of work such that your standards won’t be adversely affected; thus charge the hourly rate that makes 80% of the researchers turn away from you. That is the rate you are worth and the remaining researchers are the ones who will value your contribution. You might be surprised to learn what you are worth…


I agree with pretty much all that has been suggested here, but I’m pessimistic about the actual adoption. The reason is that unless there is pressure for proper statistical thinking and principled data analysis, no one will bother. If we can publish in vanity journals as it is now, why would PIs, faculties, institutes, etc spend the money on building statistical knowhow locally? The entire publishing industry does not reward investment in statistical understanding.

I think the way to solve the problem starts with putting pressure on journals to adopt policies that enforce the following:

  1. pre-registration for all studies. This goes both for hypothesis-driven research (observational or experimental) and data-driven/exploratory research. This should go some way into making clear to researchers that data analysis and study design are not separate things, as well as help avoid all the problems that have been documented for ages (HARKing, etc).

Even though it’s unrealistic to expect that all reviewers will have the necessary knowledge to evaluate the papers on a statistical level, it nevertheless makes the data analysis transparent (and post-publication criticism remains, as always, a possibility).

  2. availability of full raw data and analysis pipeline for all papers. Until every journal out there demands that every manuscript submission has to be accompanied by all necessary files for replicating the results (e.g. RMarkdown files, Jupyter scripts, or any kind of usable script, together with all the necessary raw data files to run the analysis), there will be little pressure to do data analysis in a principled, transparent way.

Without these measures in place, there is simply no pressure for research groups to change their bad practices. Many are not even aware of the bad practices they indulge in! I think this is the only possible way to impress on research groups, faculties, institutes, etc, that doing statistics as an afterthought won’t work, and that investing in properly trained statisticians/data analysts is not an option but a requirement going forward. It also goes a long way in making authors and institutions aware of the current problems research faces. If this pressure does not exist, no amount of opinion articles will change current practices. So the bottom line is that what we as a community need to focus on is forcing change in journal policies. Faculty committees, department heads, etc are not interested otherwise; they just want their impact factors and funding money.

A final note on the statistical training of students/researchers. As has been mentioned already, the stats training that most students/researchers go through is inadequate. They don’t understand that designing a study requires statistics as much as specific field expertise. They think that if the study makes sense conceptually as far as their specific field is concerned, then it’s all good, and the only thing they need is “someone” to analyse the data afterwards. The basic statistical training offered in most colleges/universities also tends to spend too much time on theoretical ruminations on the CLT and t-tests, as opposed to taking a more practical approach, e.g. using programming languages, emphasising learning through simulation, training in study design, etc. And focusing on “tests” doesn’t help either. In my experience, the classical approach of teaching the first courses on statistics as mathematical theory makes sense conceptually but has had disastrous consequences in practice. Students learn little, and will promptly forget even that. Getting deeper into theory is important, but needs to come after you have already captured their interest in a couple of basic courses. Show them correlations. Make predictions. Introduce regression right in the first course. Give them a taste of Bayes. Set them up for success. Get them doing “stuff”, and teach them good practices as you go.
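
To make the “learning through simulation” idea a bit more concrete, here is a minimal sketch of the kind of exercise a first course could use. It is only an illustration, not a recommended curriculum: the function names, sample sizes and seed are all made up for the example, and it assumes independent outcomes with known unit variance so that a simple z-test applies. It simulates studies where there is no true effect at all, and compares how often “p < 0.05” shows up when one outcome is tested versus when ten outcomes are tested and only the best one is reported.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(sample_a, sample_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes both groups have known unit variance (a simplification
    chosen so the example needs only the standard library).
    """
    standard_error = (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (fmean(sample_a) - fmean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=20,
                        alpha=0.05, seed=1):
    """Share of null studies in which the smallest p-value is below alpha.

    Every outcome is pure noise in both groups, so any "significant"
    result is a false positive by construction.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sided_p(group_a, group_b))
        # Reporting only the best-looking outcome is the bad practice here.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


one_outcome = false_positive_rate(n_outcomes=1)
ten_outcomes = false_positive_rate(n_outcomes=10)
print(f"False positive rate, 1 outcome tested:   {one_outcome:.3f}")
print(f"False positive rate, best of 10 outcomes: {ten_outcomes:.3f}")
```

Under these assumptions the first rate should land near the nominal 0.05 and the second near 1 - 0.95^10, which is about 0.40. Students can change `n_outcomes` themselves and watch the error rate climb, which tends to stick better than a formula on a slide.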


This is why I get puzzled that funders don’t pay for and insist on the need for this. I am convinced they want to pay for high quality research - so I can only conclude that they really don’t understand the deficits in necessary statistical thinking common across academia. Thanks for the thoughtful comments.


Thank you, and apologies if my comment felt a bit jumbled (English is not my native language).

Indeed I believe that funders honestly want to fund high quality research. The problem is that the funders’ understanding of statistics/data analysis/study design is the same as the average researcher’s. Same for journal editors, etc. Plus, funders want to see the research they fund translate into successfully published articles (I’ve experienced this myself, with funders imposing deadlines for presentation of results, article submission, etc), so the perverse incentives of the publishing system are felt even at the funding agencies’ level. This is all interconnected, and I believe the publishing system we have is the main driver of the sorry state of research and statistics education.

Thank you for listening.


Not sure if it is a problem of limited understanding. My bet is that this is because they themselves have to report to the government (or to the relevant maecenas if not publicly funded) how the research they funded resulted in x publications or patents.


I think it’s both. On one hand, they don’t perceive the problems that we have been referring to. On the other hand, they don’t have any incentive to know any better because the way they measure successful funding policy is based on the same criteria that we use to measure academic performance: how many papers were published based on their funding, in what journals, etc.

Would the role of the (applied) statistician be more appreciated if we stopped teaching undergraduates to run tests themselves? Full stop. Maybe it’s misleading science graduates to believe it’s as simple as that/they can do it themselves/just ask a statistician once you have the data? Should we just focus on the philosophy and interpretation??

And then expecting them to carry out an individual research project/dissertation is not really communicating that research is a collaborative process… The whole set up of the typical BSc Hons degree is not in alignment with good research, is it?

Intrigued to find out what you put in the grant application - where do you even start with the whole broken system??


I agree. But I think, also, a large and important part of the statistician’s role is data handling and maintaining data integrity. And it is more likely that the clinician or student has done their t-test without error than it is that they have maintained the accuracy of the data (they love to fill Excel spreadsheets with so many colours and formatting and have several versions of the file floating around etc). You don’t really ‘teach’ that; it is more personality and respect for data and an awareness of the likelihood and consequences of data mismanagement.


I found doing just this sort of thing incredibly valuable (in Emergency Departments and Intensive Care Units). I gained a greater appreciation of the physician’s tasks, the time constraints, the interactions that happen with nurses, radiologists, technicians etc. In the ED in particular, I came to recognise that the physician is considering several possible diagnoses at once and that the end goal is not always to diagnose someone “there and then”, but to appropriately move them on (home, to get a scan, to a particular ward, etc).