Critical Appraisal Checklist for Statistical Methods


In Medical and Clinical Education, we frequently employ “checklists” to help with critical appraisal of studies. Examples of such checklists are provided by numerous organizations, including BMJ, CASP, CEBM, JBI, and many others.

Here is an example of one from BMJ for RCTs:

One of the limitations of these checklists is the absence of any critical appraisal of statistical methods. I have searched in vain to find any “checklist” that would help students or clinicians with appraising statistical methods. In fact, as @ADAlthousePhD has suggested, this may be an impossible task without further education.

Is anyone aware of any tools or checklists that non-statisticians could employ to critically appraise statistical methods in a biomedical study? If a “checklist” is not the best approach, what alternatives would you suggest?


I think these exist and hope that others provide them here. At the Department of Biostatistics at Vanderbilt University we have tackled the easier problem of creating a checklist of statistical mistakes to avoid. It is here. Additions to that list are welcomed.


I’m surprised to see “Confidence intervals for key effects should always be included”.

A CI gives you a range of hypothetical population values that the data do not differ significantly from, right? For what kind of analysis is that useful information?

Or should I read the advice as: “Please include confidence intervals. They are numerically close enough to Bayesian credible intervals and may be interpreted as such”?



Nowadays I would word it this way. Include the entire Bayesian posterior distribution, or at least Bayesian credible intervals. If these are not feasible, include confidence intervals (what Sander Greenland calls “compatibility intervals” - a term I really like). Even though compatibility intervals are indirect measures of uncertainty, they are still very useful and are definitely better than p-values. Many like standard errors, without reference to repeated sampling, as measures of precision, and confidence intervals are related to SEs in symmetric cases.

These issues are discussed in detail here.
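To make the comparison above concrete, here is a minimal sketch (not from the thread; the counts are made up for illustration) contrasting a frequentist 95% compatibility interval for a proportion with a 95% Bayesian credible interval under a flat Beta(1, 1) prior. With a large sample and a flat prior, the two intervals come out numerically close, which speaks to the question raised earlier.

```python
import random
from statistics import NormalDist

# Hypothetical data: 30 "successes" out of 100 trials
x, n = 30, 100
p_hat = x / n

# 95% Wald confidence interval ("compatibility interval")
z = NormalDist().inv_cdf(0.975)
se = (p_hat * (1 - p_hat) / n) ** 0.5
ci = (p_hat - z * se, p_hat + z * se)

# 95% Bayesian credible interval: under a uniform Beta(1, 1) prior the
# posterior is Beta(x + 1, n - x + 1); approximated here by Monte Carlo.
random.seed(0)
draws = sorted(random.betavariate(x + 1, n - x + 1) for _ in range(100_000))
cred = (draws[2_500], draws[97_500 - 1])

print(f"compatibility interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"credible interval:      ({cred[0]:.3f}, {cred[1]:.3f})")
```

The closeness holds here because the flat prior is uninformative and n is reasonably large; with small samples, boundary proportions, or informative priors the two intervals can diverge, and only the credible interval carries a direct probability interpretation.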


Thank you Frank. Also for all the efforts to make this place thrive :bouquet:


Thanks Frank! This is definitely a great starting point.

I may try to use the references to create a simple checklist and post it here for feedback.


I’m sorry, Raj, I’m just a little confused by your question, because it seems that many such checklists do exist (EQUATOR network has them for basically every type of study), but perhaps the checklists do not cover quite what you’re hoping.

As I re-read your post, I think you are chasing something that will be very hard to produce - if you want something that goes much deeper than high-level principles (items like those on the list you provided, or those on Frank’s list, which is terrific and now bookmarked), it will necessarily be much longer than a checklist.


Yup. Looking for something deeper, and agree it’s something hard to do - that’s why I’m hoping others can help!


There are three possible approaches:

  • Provide a list of errors to avoid (see above)
  • Attempt to provide a comprehensive list of best statistical practices (difficult if it contains any level of detail, and it will need perpetual updating, which web resources can assist with)
  • Provide a set of guiding principles that should be adhered to and can be fulfilled by a variety of statistical methods. Start with this.

Reporting guidelines have not accomplished any of this to date.


Is this what you are looking for?


Please note that the EQUATOR Network guidelines (including SAMPL) are meant as reporting guidelines, not as methods for critical appraisal. That is, they (1) should be used by authors preparing manuscripts for submission so they can effectively communicate what they did, and (2) should be demanded by journal editors so they can better understand and review the manuscripts that are submitted. Various EQUATOR Network guidelines exist for different study designs. They are highly recommended for the above two purposes. But they are not designed to guide critical appraisal of the literature. For that, the JAMA Users’ Guides and similar checklists from BMJ or CEBM should be referenced.

– Alexander.


As a complete outsider to statistics, I have made the appeal [on Twitter] that further education is needed to appraise statistical methods. Even statisticians [and physicians] can misinterpret statistical results. So I enthusiastically follow this thread. I don’t plan to become a statistician, but I do plan to become a very informed consumer of statistics and data science. Such education can be applied to many fields if pursued systematically and rigorously.