Is it sensible to consider meta-analysis the strongest level of evidence for or against an intervention?

i think the “hierarchy of evidence” is essential for pushing back against this new cynicism re RCTs:

Otherwise we have people perversely demanding lower-quality evidence for the sake of “ethics”.

but i completely agree re the value of prospective meta-analysis where studies are designed with the intention of later combining them


As alluded, this has become a bit of a cottage industry for people looking to score publications.

Some journals also like to publish meta-analyses because they know they’ll be cited, goosing their impact factor.

I’m not aware of any publications on “when to do a meta-analysis” but I have seen some published meta-analyses that were clearly ridiculous. It’s not just about the number of trials, IMO. Once I saw a meta-analysis of clinical outcomes in Phase 2 clinical trials of PCSK9 inhibitors that included, I think, 24 trials. Something like 19 of the 24 trials had zero events - of course they did, the trials were all short-term dose exploration studies, mostly following patients for 6-12 weeks! None of them were designed to follow patients long enough to accumulate clinical events!


Great points, Andrew!

Although it is often labelled as being the best form of evidence, and in principle it should be, I have come to doubt it. There are just too many ways to select what goes into the meta-analysis, and what’s excluded. And in many fields there are a lot of small trials that show marginal improvements for some intervention: after meta-analysis the result can look quite certain, when all we are looking at is false positives and publication bias.
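To make the false-positive point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the true effect is exactly zero, every trial has 30 patients per arm with a unit-variance outcome, and only trials with z > 1.96 get "published". `pool_fixed` and `simulate` are hypothetical helper names, not anyone's published method.

```python
import math
import random

def pool_fixed(estimates, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

def simulate(n_trials=2000, n_per_arm=30, seed=1):
    """Simulate many small two-arm trials of an intervention that does nothing."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n_per_arm)  # SE of a difference in means, unit variance
    all_estimates = [rng.gauss(0.0, se) for _ in range(n_trials)]
    # Assumed publication filter: only significantly "positive" trials appear.
    published = [e for e in all_estimates if e / se > 1.96]
    return all_estimates, published, se

all_estimates, published, se = simulate()
pub_est, pub_se = pool_fixed(published, [se] * len(published))
all_est, all_se = pool_fixed(all_estimates, [se] * len(all_estimates))

print(f"published trials: {len(published)} of {len(all_estimates)}")
for label, est, s in [("published only", pub_est, pub_se), ("all trials", all_est, all_se)]:
    print(f"{label}: {est:+.3f} (95% CI {est - 1.96 * s:+.3f} to {est + 1.96 * s:+.3f})")
```

Under these assumptions the pooled interval for the published subset must exclude zero, because every included trial was individually significant, while pooling all trials should land close to the true null.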


The evolution of my thinking has been similar. Thank you for these comments.


FWIW, I teach meta-analysis to our interns, so you may see my influence on the wards. To me, meta-analysis is a type of review: it summarizes what evidence exists. How many studies? How large? How good are the methods? But I definitely disagree with the standard hierarchy here. Poor trials have bias, and averaging biased with unbiased studies puts tight confidence intervals around partially biased results. It has been shown repeatedly that simple, common flaws like unclear randomization procedures and uncertain blinding quality are associated with flawed results. Add in publication bias, etc., and I find the summary measure pretty iffy. If your point is the absolute result, I like what the IPDMA Collaboratives do, by just summarizing the bigger, better trials. Ioannidis did some work on this a long time ago, but I believe the data-based verification was uncertain.
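A rough sketch of the "tight intervals around partially biased results" problem, using made-up numbers: the true effect is assumed to be zero, five large sound trials have a standard error of 0.10, and fifteen small flawed trials have a standard error of 0.20 and overestimate by 0.3. The helper `pool_fixed` is a hypothetical name for plain inverse-variance pooling.

```python
import math
import random

def pool_fixed(estimates, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

rng = random.Random(7)
TRUE_EFFECT = 0.0  # assumption: the intervention does nothing
BIAS = 0.3         # assumption: flawed trials overestimate by this much

# (standard error, bias) per trial: 5 large sound trials, then 15 small flawed ones.
designs = [(0.10, 0.0)] * 5 + [(0.20, BIAS)] * 15
ses = [se for se, _ in designs]
estimates = [rng.gauss(TRUE_EFFECT + bias, se) for se, bias in designs]

all_est, all_se = pool_fixed(estimates, ses)
good_est, good_se = pool_fixed(estimates[:5], ses[:5])

for label, est, se in [("all 20 trials", all_est, all_se), ("5 sound trials", good_est, good_se)]:
    print(f"{label}: {est:+.3f} (95% CI {est - 1.96 * se:+.3f} to {est + 1.96 * se:+.3f})")
```

With these assumed inputs the pooled estimate is expected to sit near 0.3 × 375 / 875 ≈ 0.13, and its standard error is smaller than that of the sound trials alone, so the extra trials add apparent precision while pulling the answer away from the truth.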


I believe that the strength of evidence depends on the specific evidence available for a specific issue. I believe that a single axis/trait is insufficient, and that hierarchies in evidence-based medicine are therefore insufficient. I think Blunt’s thesis here: raises a number of good points. I rather liked Whewell’s “Consilience of Inductions”, for example.


However, I would have more faith in recommending treatment based on one large, well-conducted RCT than on 5 small studies.

There is an important argument for considering five small trials rather than one huge one.

Any trial is a trial of a protocol that has been operationalised around specific resources, in a specific situation. It is a test of the research question but it cannot be the test. We include many participants in clinical trials and epidemiological studies to see if the association we are interested in holds over what the BCP calls “all sorts and conditions of men”. For the same reason, we want to see if an effect is robust to methodology and setting – that it holds in different places and under different circumstances.

So the problem with a single large trial is that we don’t know how much the result is dependent on

  • the resources that were available to conduct it (large trials have dedicated staff, while you in outpatients have a registrar that has taken a study day)
  • the intensity of follow up – adverse events will be fewer and medication titration will be better in a large well-resourced trial with frequent follow-up
  • how the entry and exclusion criteria apply to the potentially eligible patient population – applying these criteria in a different patient population can result in quite a different casemix

I should terminate this list here, but you can see a whole host of other individual features of a single large trial that make it misleading as a sole source of evidence.

Small studies have misfeatures that we are coming to recognise as more than just publication bias. They tend to be the early studies, so they are often run by zealots whose PhD hangs on the outcome, and they also tend to use protocols that are inferior to those of later studies, simply because they are a methodological learning exercise as well as a test of the research question.

If an effect holds up despite the diversity of methods and settings used to test it, I trust it more than a single highly-resourced study.

Perhaps the most important factor, though, is sponsorship. There is no doubt that if industry pays the piper, they get their tune. If a meta-analysis doesn’t stratify by source of funding I’m deeply distrustful!


I’m inclined to agree with the skeptics regarding the idea of an “evidence hierarchy” placing “RCT/Meta-Analysis” at the top. Real decision making is much more complex.

Having said that, @PaulBrownPhD makes a good point about the potential for failing to do rigorous outcome research when it is possible and should be done.

A thorough examination of meta-analysis from a logical point of view can be found here:

Is meta-analysis the platinum standard of evidence?

I agree there can be too many subjective choices in meta-analysis, but I am not as negative as he is regarding the technique. Having said that, I think instead of looking at meta-analysis as “the answer”, it is better to look at it as “what is the range of scientifically justifiable difference of opinion?”

I haven’t seen this yet, but is anyone familiar with placing meta-analysis in a decision theoretic context, in terms of using Meta-analysis to produce informative priors for a later Bayesian experiment?
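One simple way this could look, as a sketch only: pool the earlier studies with a DerSimonian-Laird random-effects model, take the predictive distribution for the effect in a new study (pooled variance plus between-study variance) as the prior, and update it with the new trial under a normal approximation. The log odds ratios, standard errors and helper names below are all invented for illustration, and a full hierarchical model would treat the uncertainty in the between-study variance more carefully.

```python
import math

def dersimonian_laird(estimates, ses):
    """Random-effects pooling (DerSimonian-Laird).
    Returns the pooled mean, its variance, and the between-study variance tau^2."""
    variances = [s ** 2 for s in ses]
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return mean, 1.0 / sum(w_re), tau2

def normal_update(prior_mean, prior_var, estimate, se):
    """Conjugate normal update: precision-weighted average of prior and data."""
    precision = 1.0 / prior_var + 1.0 / se ** 2
    mean = (prior_mean / prior_var + estimate / se ** 2) / precision
    return mean, 1.0 / precision

# Hypothetical log odds ratios and standard errors from four earlier studies.
past_estimates = [-0.40, -0.10, -0.35, 0.05]
past_ses = [0.25, 0.20, 0.30, 0.22]

pooled, pooled_var, tau2 = dersimonian_laird(past_estimates, past_ses)
# Prior for the effect in a NEW study: widen the pooled variance by tau^2.
prior_mean, prior_var = pooled, pooled_var + tau2

new_estimate, new_se = -0.30, 0.15  # hypothetical result of the later trial
post_mean, post_var = normal_update(prior_mean, prior_var, new_estimate, new_se)

print(f"prior:     {prior_mean:+.3f} (sd {math.sqrt(prior_var):.3f})")
print(f"posterior: {post_mean:+.3f} (sd {math.sqrt(post_var):.3f})")
```

Adding tau^2 to the prior variance is a deliberate choice here: it keeps the prior from being more confident about a new setting than the heterogeneity among past studies justifies.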

I agree with the views expressed previously, but I would like to add one more idea. Meta-analyses make a synthesis of studies with similar but not identical inclusion criteria. The populations of the studies included in the meta-analysis are therefore slightly heterogeneous. This is important for the applicability of the data to specific patients. With meta-analysis you cannot be so sure that the data are directly applicable to your particular patient, but rather express a ‘general’ result. On the other hand, with a well-designed RCT, with an adequate sample size, and with clear eligibility criteria, the results are directly applicable to a patient, on the sole condition that he or she is well represented in those eligibility criteria (e.g., mean age, no unusual risk factors, etc.).


i think people make too much of subjective choice re selection of studies; having a lot of studies to choose from does not seem like a terrible thing. Consider that you can completely eliminate the selection issue and the problems of meta-analysis become more apparent: i had a client who wanted to do a meta-analysis of two studies (no issue with selection because 2 studies was comprehensive!). It was a pointless exercise and i thought i’d talked them out of it; i refused to be involved (because there are a lot of these meta-analyses being produced). I later learned that they were trying to publish it after using some online calculator to do the analysis. That’s another problem with meta-analysis as the gold standard: everyone wants to do it, and people make these simple tools available to them. Senn used to point out the problems with the Cochrane software, whatever it was at the time, RevMan?

i was reading a Bayes re-analysis yesterday that did this, it might not be the best example tho: Bayes re-analysis, ECMO RCT. They say:

“data-derived prior distributions were developed based on relevant studies [18-20] from a meta-analysis of ECMO for ARDS [21]. The treatment effects in these previous studies were combined with the observed data from this trial in a Bayesian hierarchical random-effects model (that itself used minimally informative priors)”

they make too much of the convention of having artificial priors that are on the spectrum: enthusiastic - sceptical (shown in table 1). But i think this gets at the point you mention ie


I thought the following BMJ Perspectives article that removes meta-analysis from the so-called “evidence hierarchy” would be interesting to readers of this topic:
The New Evidence Pyramid.

Therefore, the second modification to the pyramid is to remove systematic reviews from the top of the pyramid and use them as a lens through which other types of studies should be seen (ie, appraised and applied). The systematic review (the process of selecting the studies) and meta-analysis (the statistical aggregation that produces a single effect size) are tools to consume and apply the evidence by stakeholders.

That is a much more philosophically defensible position than treating meta-analysis as comparable (or even superior) to primary data reports.

I hope we can quickly move from focus on a single effect size, towards a more useful idea of a range of distributions for the effect, so the formal Bayesian apparatus can be used to design experiments that will settle disputes to the satisfaction of the relevant parties.


completely agree. i saw Ben Goldacre speak recently and he kept saying meta-analysis is the highest evidence standard


There are a couple of papers on the topic. I’ll link just a few, and they are very interesting ones:

Shojania KG, Sampson M, Ansari MT, et al (2007) How Quickly Do Systematic Reviews Go Out of Date? A Survival Analysis. Annals of Internal Medicine 147:224.

Chung M, Newberry SJ, Ansari MT, et al (2012) Two methods provide similar signals for the need to update systematic reviews. J Clin Epidemiol 65:660–668.

Garner P, Hopewell S, Chandler J, et al (2016) When and how to update systematic reviews: consensus and checklist. BMJ 354:i3507.

Ioannidis JPA (2016) The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-analyses. Milbank Q 94:485–514.

Storz-Pfennig P (2017) Potentially unnecessary and wasteful clinical trial research detected in cumulative meta-epidemiological and trial sequential analysis. J Clin Epidemiol 82:61–70.


I’ve come across the Ioannidis paper (which I generally agree with), as well as the technique of Trial Sequential Analysis.

This method is not without its critics.

Should Cochrane apply error-adjustment methods when conducting repeated meta-analyses?

I think the arguments in the Cochrane paper have substantial merit. Trial Sequential Analysis adapts Frequentist techniques from sequential clinical trials (where past data affects future data collection) and tries to apply them to meta-analysis.

At least for a retrospective meta-analysis, this seems beside the point, if not entirely wrong. It also drags us back to the idea of hypothesis testing, which every informed researcher wants to get away from.

Here are 2 papers from one of the members of the Cochrane committee on some of the issues with it.

Sadly behind a paywall:

The more I read about trial sequential analysis, the more I had to agree with Dr. Harrell’s post on why he became a Bayesian.

Instead of trial sequential analysis, I’d much rather approach this from a Bayesian Decision Theoretic POV – synthesize whatever evidence is available, generate a range of plausible distributions (from skeptical to optimistic) that are constrained by the data, then decide on whether it is necessary to do a new experiment, or accept the evidence as it stands.
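As a very rough sketch of that workflow, under a normal approximation: combine a pooled estimate with a skeptical, a vague and an optimistic prior, then check whether the conclusion survives across the whole range. The pooled log hazard ratio, the three priors and the 0.975 threshold are all assumptions chosen for illustration, and a real decision-theoretic analysis would also put utilities and trial costs into the rule rather than a bare probability cut-off.

```python
from statistics import NormalDist

def posterior(prior, estimate, se):
    """Conjugate normal update; returns the posterior as a NormalDist."""
    prior_prec = 1.0 / prior.stdev ** 2
    data_prec = 1.0 / se ** 2
    mean = (prior.mean * prior_prec + estimate * data_prec) / (prior_prec + data_prec)
    return NormalDist(mean, (prior_prec + data_prec) ** -0.5)

# Hypothetical synthesized evidence: log hazard ratio (negative = benefit).
pooled_estimate, pooled_se = -0.20, 0.10

# Assumed range of priors, from skeptical to optimistic.
priors = {
    "skeptical": NormalDist(0.0, 0.15),
    "vague": NormalDist(0.0, 1.0),
    "optimistic": NormalDist(-0.30, 0.15),
}

THRESHOLD = 0.975  # assumed probability of benefit needed to stop experimenting

prob_benefit = {
    name: posterior(prior, pooled_estimate, pooled_se).cdf(0.0)
    for name, prior in priors.items()
}

if min(prob_benefit.values()) >= THRESHOLD:
    decision = "accept the evidence as it stands"
else:
    decision = "run a new experiment"

for name, p in prob_benefit.items():
    print(f"{name:>10}: P(benefit) = {p:.3f}")
print("decision:", decision)
```

The point of looping over priors is that the decision is only called settled when even the skeptical prior is overwhelmed by the data; with these made-up inputs it is not, so the sketch recommends a new experiment.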

There is more than enough room for both Frequentist and Bayesian philosophies in medical science. But I think the Bayesian decision theoretic POV needs to be explained much more than it has been. There would be no dispute about “evidence hierarchies” if the mathematical results from decision theory were better known.


I really liked that BMJ Perspective. Thank you!


Nicely articulated points on issues with any single trial. Do you agree that a single study run in multiple centers mitigates the concerns - a little, somewhat, or substantially?

Would you be able to point to some papers with these results?

There are some good links in this discussion from a philosophical POV:

I collected some references here:

The paper on Theory of Experimenters is likely your best starting point.

For a more pragmatic discussion, the following dissertation is worth close study if your field involves small sample research. With small samples, we need to think hard about how to weigh the efficiency that algorithmic balancing provides against the robustness that randomization provides. The author was advised by Dr. Harrell and Dr. Blume.

Chipman, Jonathan Joseph (2019) Sequential Rematched Randomization and Adaptive Monitoring with the Second-Generation p-Value to increase the efficiency and efficacy of Randomized Clinical Trials (link)
