Is it sensible to consider meta-analysis the strongest level of evidence for or against an intervention?

Some experts place meta-analysis of RCTs atop a theoretical ‘hierarchy of evidence,’ and I believe others instead place the best-designed RCT at the top position.

When might each of these ideas be reasonable, if this is indeed a meaningful way to think about things at all?


I don’t think so. The meta-analysis will be based on smaller (possibly poorer quality), older (possibly no longer entirely relevant) trials. Additional biases can also creep in (such as selection bias). Combining the trials won’t eliminate bias; if the same systematic biases exist in each trial, combining them will only reinforce the bias. Shapiro said: “the quality of the information yielded by [meta-analysis] cannot transcend the quality of the individual studies.”[ref] But there is some bad logic out there, e.g. some have claimed that meta-analysis can be used to indicate whether further trials are warranted, and some have argued for larger trials to improve meta-analysis! I don’t know if these opinions still exist, though.
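To make the "combining won't eliminate bias" point concrete, here is a minimal sketch using ordinary inverse-variance (fixed-effect) pooling. All numbers are hypothetical and chosen only for illustration: the true effect is assumed to be 0, and every trial is assumed to carry the same systematic bias of +0.3.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical setup: true effect 0, identical bias of +0.3 in each of 5 trials.
true_effect = 0.0
shared_bias = 0.3
estimates = [true_effect + shared_bias] * 5
std_errors = [0.2] * 5

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se

# A single trial's interval still covers 0; the pooled interval does not.
print(f"single trial: {estimates[0]:.2f} +/- {1.96 * std_errors[0]:.2f}")
print(f"pooled:       {pooled:.2f} +/- {1.96 * pooled_se:.2f}")
```

Pooling shrinks the interval but leaves the centre exactly where the bias put it, so the combined result looks more certain without being any more correct.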

Edit: I wonder if anyone has written about “when to perform a meta-analysis”, because meta-analyses of two small studies are appearing (to generate publications, perhaps) and statisticians are developing methodology for this.


Thank you for this thoughtful reply. I have also seen many meta-analyses lately, and your point about defining when they are useful is an excellent one. Would love to see that explored, and that question really underlies my question above. Thanks!

Remember, meta-analysis, when done correctly, is just a synthesis of current evidence, as PaulBrown describes. It should state the current environment of the issue or intervention, changes in the field, and gaps in knowledge. The Campbell and Cochrane Collaborations have “ongoing” meta-analyses where, for important evidence that clinical societies regularly refer to, they essentially recreate the initial meta-analysis and state any changes in the literature, whether gaps have been addressed, and how well.

To get at PaulBrown’s question: if it’s important, about every 3-5 years, even for well-established evidence (aspirin and cardiac risk, for example).

Also, if the meta-analysis and review don’t follow PRISMA guidelines, don’t waste your time. Meta-analysis has become something of a cottage industry of late, and fodder for publication bloat.


I would argue against the concept of a hierarchy of evidence. All types of evidence exist on a continuum with notable overlap between different study types. For instance, a meta-analysis of 3 nearly identical studies is likely more reliable than any one of the studies. However, I would have more faith in recommending treatment based on one large, well-conducted RCT than on 5 small studies. Perhaps the concept of a set hierarchy is actually leading us astray?


I think the “hierarchy of evidence” is essential for pushing back against this new cynicism re RCTs:

Otherwise we have people perversely demanding lower quality evidence for the sake of “ethics”

but i completely agree re the value of prospective meta-analysis where studies are designed with the intention of later combining them


As alluded, this has become a bit of a cottage industry for people looking to score publications.

Some journals also like to publish meta-analyses because they know they’ll be cited, goosing their impact factor.

I’m not aware of any publications on “when to do a meta-analysis” but I have seen some published meta-analyses that were clearly ridiculous. It’s not just about the number of trials, IMO. Once I saw a meta-analysis of clinical outcomes in Phase 2 clinical trials of PCSK9 inhibitors that included, I think, 24 trials. Something like 19 of the 24 trials had zero events - of course they did, the trials were all short-term dose exploration studies, mostly following patients for 6-12 weeks! None of them were designed to follow patients long enough to accumulate clinical events!
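A rough sketch of why those zero-event trials add nothing, assuming the Peto one-step odds-ratio method (my choice for illustration; I don't know which method that paper used, and other methods need continuity corrections instead). The trial counts below are invented to mimic the 19-of-24 pattern and are not the real PCSK9 data.

```python
def peto_contribution(events_t, n_t, events_c, n_c):
    """Return (O - E, V) for one trial under the Peto one-step method."""
    n = n_t + n_c
    d = events_t + events_c
    expected_t = n_t * d / n                       # expected events in treatment arm
    variance = n_t * n_c * d * (n - d) / (n**2 * (n - 1))
    return events_t - expected_t, variance

# Hypothetical trials: (events_treatment, n_treatment, events_control, n_control)
zero_event_trials = [(0, 100, 0, 50)] * 19
event_trials = [(1, 120, 2, 60), (0, 90, 1, 45), (2, 150, 1, 75),
                (1, 80, 1, 40), (0, 110, 2, 55)]

def peto_log_odds_ratio(trials):
    """Pooled Peto log odds ratio: sum(O - E) / sum(V)."""
    parts = [peto_contribution(*t) for t in trials]
    return sum(p[0] for p in parts) / sum(p[1] for p in parts)

zero_parts = [peto_contribution(*t) for t in zero_event_trials]
log_or_all = peto_log_odds_ratio(zero_event_trials + event_trials)
log_or_events_only = peto_log_odds_ratio(event_trials)

print("information from zero-event trials:", sum(p[1] for p in zero_parts))
print("log OR, all 24 trials:      ", log_or_all)
print("log OR, 5 trials with events:", log_or_events_only)
```

Under this method the pooled estimate from "24 trials" is really the estimate from the handful that observed any events, which is the problem with counting trials rather than information.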


Great points, Andrew!

Although it is often labelled as being the best form of evidence, and in principle it should be, I have come to doubt it. There are just too many ways to select what goes into the meta-analysis and what’s excluded. And in many fields there are a lot of small trials that show marginal improvements for some intervention: after meta-analysis the result can look quite certain, when all we are looking at is false positives and publication bias.
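A small simulation sketch of that last point. The assumptions are mine and deliberately extreme: the true effect is exactly 0, every trial has the same standard error (0.25), and only trials with a significantly positive result get "published".

```python
import math
import random

def pool_equal_se(estimates, std_error):
    """Inverse-variance pooling when every trial has the same standard error
    (this reduces to the simple mean, with SE shrinking as 1/sqrt(k))."""
    k = len(estimates)
    return sum(estimates) / k, std_error / math.sqrt(k)

random.seed(0)            # fixed seed so the run is repeatable
TRUE_EFFECT = 0.0         # assumption: the intervention does nothing
STD_ERROR = 0.25          # assumption: all trials are equally small
N_TRIALS = 1000

all_trials = [random.gauss(TRUE_EFFECT, STD_ERROR) for _ in range(N_TRIALS)]
# Crude publication filter: keep only significantly positive results.
published = [y for y in all_trials if y / STD_ERROR > 1.96]

pooled_all, se_all = pool_equal_se(all_trials, STD_ERROR)
pooled_pub, se_pub = pool_equal_se(published, STD_ERROR)

print(f"trials run: {N_TRIALS}, published: {len(published)}")
print(f"all trials: {pooled_all:.3f} +/- {1.96 * se_all:.3f}")
print(f"published:  {pooled_pub:.3f} +/- {1.96 * se_pub:.3f}")
```

Every published trial here is a false positive, yet the pooled interval sits well away from zero. Real publication bias is less clean than this filter, so treat it as a caricature of the mechanism, not an estimate of its size.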


The evolution of my thinking has been similar. Thank you for these comments.


FWIW, I teach meta-analysis to our interns, so you may see my influence on the wards. To me, meta-analysis is a type of review – it summarizes what evidence exists. How many studies? How large? How good are the methods? But I definitely disagree with the standard hierarchy here. Poor trials have bias, and averaging biased with unbiased studies puts tight confidence intervals around partially biased results. It has been shown repeatedly that simple, common flaws like unclear randomization procedures and uncertain blinding quality are associated with flawed results. Add in publication bias, etc., and I find the summary measure pretty iffy. If your point is the absolute result, I like what the IPDMA Collaboratives do, by just summarizing the bigger, better trials. Ioannidis did some work on this a long time ago, but I believe the data-based verification was uncertain.


I believe that the strength of evidence depends on the specific evidence available for a specific issue. I believe that a single axis/trait is insufficient, and that hierarchies in evidence-based medicine are therefore insufficient. I think Blunt’s thesis raises a number of good points. I rather liked Whewell’s “Consilience of Inductions”, for example.


However, I would have more faith in recommending treatment based on one large, well-conducted RCT than on 5 small studies.

There is an important argument for considering five small trials rather than one huge one.

Any trial is a trial of a protocol that has been operationalised around specific resources, in a specific situation. It is a test of the research question but it cannot be the test. We include many participants in clinical trials and epidemiological studies to see if the association we are interested in holds over what the BCP calls “all sorts and conditions of men”. For the same reason, we want to see if an effect is robust to methodology and setting – that it holds in different places and under different circumstances.

So the problem with a single large trial is that we don’t know how much the result is dependent on

  • the resources that were available to conduct it (large trials have dedicated staff, while you in outpatients have a registrar that has taken a study day)
  • the intensity of follow up – adverse events will be fewer and medication titration will be better in a large well-resourced trial with frequent follow-up
  • how the entry and exclusion criteria apply to the potentially eligible patient population – applying these criteria in a different patient population can result in quite a different casemix

I should terminate this list here, but you can see a whole host of other individual features of a single large trial that make it misleading as a sole source of evidence.

Small studies have misfeatures that we are coming to recognise as more than just publication bias. They tend to be the early studies, so they are often run by zealots whose PhD hangs on the outcome, and they also tend to use protocols that are inferior to those of later studies, simply because they are a methodological learning exercise as well as a test of the research question.

If an effect holds up despite the diversity of methods and settings used to test it, I trust it more than a single highly-resourced study.

Perhaps the most important factor, though, is sponsorship. There is no doubt that if industry pays the piper, they get their tune. If a meta-analysis doesn’t stratify by source of funding, I’m deeply distrustful!


I’m inclined to agree with the skeptics regarding the idea of an “evidence hierarchy” placing “RCT/Meta-Analysis” at the top. Real decision making is much more complex.

Having said that, @PaulBrownPhD makes a good point about the potential for failing to do rigorous outcome research when it is possible and should be done.

A thorough examination of meta-analysis from a logical point of view can be found here:

Is meta-analysis the platinum standard of evidence?

I agree there can be too many subjective choices in meta-analysis, but I am not as negative as he is regarding the technique. Having said that, I think instead of looking at meta-analysis as “the answer”, it is better to look at it as “what is the range of scientifically justifiable difference of opinion?”

I haven’t seen this yet, but is anyone familiar with placing meta-analysis in a decision-theoretic context, in terms of using meta-analysis to produce informative priors for a later Bayesian experiment?
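For what the simplest version of this might look like, here is a sketch of a conjugate normal-normal update, where a meta-analysis summary is treated as the prior for a new trial's effect estimate. The numbers are hypothetical (a log odds ratio scale is assumed), and this is the textbook conjugate case, not a full hierarchical random-effects model.

```python
import math

def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Posterior mean and SD for a normal prior combined with a normal likelihood."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / std_error**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)

# Hypothetical prior from an earlier meta-analysis (log odds ratio scale).
prior_mean, prior_sd = -0.20, 0.15
# Hypothetical result from the new trial.
trial_estimate, trial_se = -0.05, 0.10

post_mean, post_sd = normal_posterior(prior_mean, prior_sd, trial_estimate, trial_se)
print(f"prior:     {prior_mean:.3f} (SD {prior_sd:.3f})")
print(f"new trial: {trial_estimate:.3f} (SE {trial_se:.3f})")
print(f"posterior: {post_mean:.3f} (SD {post_sd:.3f})")
```

The posterior mean is a precision-weighted average of the two, so how much the meta-analysis matters depends entirely on how wide you make the prior. Inflating `prior_sd` is one crude way to discount older or lower-quality trials.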

I agree with the views expressed previously, but I would like to add one more idea. Meta-analyses make a synthesis of studies with similar but not identical inclusion criteria. The populations of the studies included in the meta-analysis are therefore slightly heterogeneous. This is important for the applicability of the data to specific patients. With meta-analysis you cannot be so sure that the data are directly applicable to your particular patient, but rather express a ‘general’ result. On the other hand, with a well-designed RCT, with an adequate sample size, and with clear eligibility criteria, the results are directly applicable to a patient, on the sole condition that he or she is well represented in those eligibility criteria (e.g., mean age, no unusual risk factors, etc.).


I think people make too much of subjective choice re selection of studies; having a lot of studies to choose from does not seem like a terrible thing. Consider that you can completely eliminate the selection issue and the problems of meta-analysis become more apparent: I had a client who wanted to do a meta-analysis of two studies (no issue with selection because 2 studies was comprehensive!). It was a pointless exercise and I thought I’d talked them out of it; I refused to be involved (because there are a lot of these meta-analyses being produced). I later learned that they were trying to publish it after using some online calculator to do the analysis. That’s another problem with meta-analysis as the gold standard: everyone wants to do it and people make these simple tools available to them. Senn used to point out the problems with the Cochrane software, whatever it was at the time, RevMan?

I was reading a Bayesian re-analysis yesterday that did this; it might not be the best example, though: bayes re-analysis, ECMO RCT. They say:

“data-derived prior distributions were developed based on relevant studies [18-20] from a meta-analysis of ECMO for ARDS [21]. The treatment effects in these previous studies were combined with the observed data from this trial in a Bayesian hierarchical random-effects model (that itself used minimally informative priors)”

They make too much of the convention of having artificial priors that are on the spectrum: enthusiastic - sceptical (shown in table 1). But I think this gets at the point you mention, ie


I thought the following BMJ Perspectives article that removes meta-analysis from the so-called “evidence hierarchy” would be interesting to readers of this topic:
The New Evidence Pyramid.

Therefore, the second modification to the pyramid is to remove systematic reviews from the top of the pyramid and use them as a lens through which other types of studies should be seen (ie, appraised and applied). The systematic review (the process of selecting the studies) and meta-analysis (the statistical aggregation that produces a single effect size) are tools to consume and apply the evidence by stakeholders.

That is a much more philosophically defensible position than treating meta-analysis as comparable (or even superior) to primary data reports.

I hope we can quickly move from focus on a single effect size, towards a more useful idea of a range of distributions for the effect, so the formal Bayesian apparatus can be used to design experiments that will settle disputes to the satisfaction of the relevant parties.


Completely agree. I saw Ben Goldacre speak recently and he kept saying meta-analysis is the highest evidence standard.


There are a couple of papers on the topic; I’ll link just a few very interesting ones:

Shojania KG, Sampson M, Ansari MT, et al (2007) How Quickly Do Systematic Reviews Go Out of Date? A Survival Analysis. Annals of Internal Medicine 147:224.

Chung M, Newberry SJ, Ansari MT, et al (2012) Two methods provide similar signals for the need to update systematic reviews. J Clin Epidemiol 65:660–668.

Garner P, Hopewell S, Chandler J, et al (2016) When and how to update systematic reviews: consensus and checklist. BMJ 354:i3507.

Ioannidis JPA (2016) The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta‐analyses. Milbank Q 94:485–514.

Storz-Pfennig P (2017) Potentially unnecessary and wasteful clinical trial research detected in cumulative meta-epidemiological and trial sequential analysis. J Clin Epidemiol 82:61–70.