Random vs. fixed effects meta-analysis

I’m seeking clarity on when to use random effects vs. fixed effects for studies in a meta-analysis. Here’s what I’ve come to believe:

  • Fixed effects account for and absorb variability just like random effects
  • Fixed effects are needed if the number of clusters (studies) is small
  • Random effects models may misbehave if the number of studies is small
  • The distinction between fixed and random effects is more pronounced if you are allowing study x treatment interaction

Are these opinions accurate? What is best practice when you have 2 studies? 3? 4? 5 or more? What is the minimum number of studies before you would use random effects? Is a fixed-effects analysis reliable in any case?


I assume you are referring to a retrospective analysis of results published in journals. Please correct me if I am mistaken.

Your question reminded me of this article from 1999, published in the American Journal of Epidemiology by @Sander_Greenland and Charles Poole. It examines some of the considerations in conducting a meta-analysis that you mentioned above.

Their primary recommendations:

  1. If you are going to calculate any summary statistic, compute both a fixed effect and a random effects estimate (unless you have a really good reason to prefer one or the other).
  2. If there is a meaningful difference between the two estimates, there is heterogeneity among the studies, and any summary statistic is likely to be misleading.
  3. If possible, use meta-regression methods to explore how study design affects the results.
  4. Random effects methods can be susceptible to plausible types of publication bias, making them less conservative than is ordinarily believed.
  5. Very small samples of studies make any quantitative method very limited. They would prefer narrative description in this scenario.
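Recommendation 1 above can be sketched numerically. The following is a minimal illustration (invented data, using the standard inverse-variance fixed-effect and DerSimonian-Laird random-effects formulas, not code from the cited paper) of computing both summaries and the between-study variance:

```python
import math

def fixed_and_random_summaries(effects, ses):
    """Inverse-variance fixed-effect and DerSimonian-Laird
    random-effects summaries of per-study effect estimates."""
    w = [1.0 / se**2 for se in ses]                 # inverse-variance weights
    sw = sum(w)
    fe = sum(wi * y for wi, y in zip(w, effects)) / sw   # fixed-effect mean
    q = sum(wi * (y - fe)**2 for wi, y in zip(w, effects))  # Cochran's Q
    k = len(effects)
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)              # DL between-study variance
    wr = [1.0 / (se**2 + tau2) for se in ses]       # random-effects weights
    re = sum(wi * y for wi, y in zip(wr, effects)) / sum(wr)
    return fe, math.sqrt(1.0 / sw), re, math.sqrt(1.0 / sum(wr)), tau2

# Hypothetical log odds ratios and standard errors from 4 studies
effects = [0.10, 0.35, -0.05, 0.60]
ses = [0.15, 0.20, 0.18, 0.25]
fe, fe_se, re, re_se, tau2 = fixed_and_random_summaries(effects, ses)
```

Per recommendation 2, a meaningful gap between `fe` and `re` (or a clearly positive `tau2`) is a signal of heterogeneity, in which case any single summary is suspect.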

While I have no doubt you would be aware of the limitations of this type of meta-analysis, for completeness, I also recommend that readers interested in this topic study the @Sander_Greenland 2005 paper on multiple bias modelling for observational data (which would include meta-analysis by any reasonable definition). It is not only relevant to this topic, but it provides an excellent example of how to think like a good, skeptical statistician.


These and your summary are exceedingly helpful. Thanks!

Some of my favorite references on the topic by @Stephen include:

  1. Trying to be precise about vagueness
  2. Hans van Houwelingen and the art of summing up
  3. The Many Modes of Meta

In brief, fixed effect models assume that the treatment effects in the studies differ only by chance. They can thus be used to determine whether patients benefit from a treatment versus control, making them a natural extension of what is typically done with single trials.

Random effects models assume that the treatment effects differ across studies for reasons other than chance. They typically model this by assuming that the true treatment effects are drawn from a normal distribution. They can thus be used to determine what the treatment effect is in general, e.g., in the broader population from which the trial cohorts are sampled. This is a more ambitious question.

Based on the above, fixed effects do become less plausible the more studies we include in the meta-analysis.
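In standard notation (not taken from the posts above), with y_i the estimated effect in study i and v_i its known within-study variance, the two models differ only in whether a between-study variance term is present:

```latex
% Fixed effect: a single common true effect \mu
y_i = \mu + \varepsilon_i, \qquad \varepsilon_i \sim N(0, v_i)

% Random effects: study-specific true effects scattered around \mu
y_i = \mu + u_i + \varepsilon_i, \qquad u_i \sim N(0, \tau^2)
```

The fixed effect model is the special case \tau^2 = 0, which is why a large estimated \tau^2 undermines the single-summary interpretation.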


Sorry to have to dispute those commonly taught beliefs about fixed vs random effects, but I regard them as quite misleading because they only pertain to analyses without meta-covariate adjustment. Such analyses can be as misleading as single-study analyses that examine only average effects. Some of the meta-analysis literature has been seriously distorted by merging heterogeneous estimates using random effects instead of predictors, thus ignoring study-level covariates and clear signals of the causes of heterogeneity.

I think of random effects as at best a last resort when one fails to explain heterogeneity across studies using measured study covariates. One first needs to seek predictors of differences using fixed-effect meta-regression, preferably using a shared “natural parameter” that is rescaled to the same units across studies, e.g., change in log hazard ratio by average age of the treated groups in decades (if adults) or years (if children). Random effects enter as a residual only if needed after explanatory covariates are examined. It should be noted, however, that meta-regression is a form of ecologic (aggregate) regression when a covariate is a summary across individuals (such as average age) rather than a study property such as design type (e.g., a randomized vs. observational cohort indicator). Furthermore, quality scores can be highly misleading as predictors because they merge disparate quality components.
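A minimal sketch of such a fixed-effect meta-regression (invented numbers, not from any cited paper): weighted least squares of study log hazard ratios on a study-level covariate, here a hypothetical mean age of the treated group in decades, using inverse-variance weights as a first look for predictors of heterogeneity.

```python
# Hypothetical per-study log hazard ratios, their standard errors,
# and mean treated age in decades (all invented for illustration).
log_hr = [-0.45, -0.30, -0.10, 0.05]
se     = [0.12, 0.10, 0.15, 0.11]
age    = [4.5, 5.5, 6.5, 7.5]

w = [1.0 / s**2 for s in se]                          # inverse-variance weights
sw = sum(w)
xbar = sum(wi * x for wi, x in zip(w, age)) / sw      # weighted mean age
ybar = sum(wi * y for wi, y in zip(w, log_hr)) / sw   # weighted mean log HR
sxx = sum(wi * (x - xbar)**2 for wi, x in zip(w, age))
sxy = sum(wi * (x - xbar) * (y - ybar)
          for wi, x, y in zip(w, age, log_hr))
slope = sxy / sxx               # change in log HR per decade of age
intercept = ybar - slope * xbar
```

A nonzero slope here would be an explained component of across-study variation that a covariate-free random-effects summary would simply bury in the residual variance.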

As R-cubed noted, we discussed these issues long ago in
Greenland, S. (1994). A critical look at some popular meta-analytic methods. American Journal of Epidemiology, 140, 290-296,
Greenland, S. (1994). Quality scores are useless and potentially misleading. American Journal of Epidemiology, 140, 300-301,
Greenland, S. (1994). Can meta-analysis be salvaged? American Journal of Epidemiology, 140, 783-787,
Poole, C., Greenland, S. (1999). Random-effects meta-analyses are not always conservative. American Journal of Epidemiology, 150, 469-475,
Greenland, S., O’Rourke, K. (2001). On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics, 2, 463-471,
and the meta-analysis chapter of Modern Epidemiology 3rd ed. 2008 (Ch. 33, also with the late Keith O’Rourke).
We gave a method of fixed-effects meta-regression for dose-response analyses in
Greenland, S., Longnecker, M. P. (1992). Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American Journal of Epidemiology, 135, 1301-1309,
which according to Google has been used quite a bit, thanks to its Stata implementation in
Orsini, N., Bellocco, R., Greenland, S. (2006). Generalized least squares for trend estimation of summarized dose-response data. The Stata Journal, 6, 40-57.


Well said. This is why individual patient data meta-analyses, where meta-covariate adjustment can be properly performed, are far preferable to pooled summary data.


Indeed. Those pooled analyses do however need to include study main effects to keep the studies distinct, and should examine products of treatment indicators with study-level covariates (such as indicators of design properties) to allow for across-study variations (“treatment-by-study interactions”).


Responding to Frank’s OP:

*Fixed effects account for and absorb variability just like random effects
The meaning of that is unclear to me. With no covariates, a random study effect is just a study-specific intercept drawn from an unknown “prior” distribution of study effects, and the usual fixed effect model is what you get assuming a zero-variance, unknown-location prior. A major problem is not only that too many RE meta-analyses ignore covariates, they also focus on the (empirical-Bayes) estimated posterior mean instead of the estimated study-specific effects.
*Fixed effects are needed if the number of clusters (studies) is small, and
*Random effects models may misbehave if the number of studies is small
It’s complicated. Exact Bayes RE solutions don’t care about K = no. of clusters, but some approximate methods (Bayes, semi-Bayes, and frequentist) are asymptotic in K, and all need at least K=3 if both the prior mean and variance are estimated (as in Stein estimation). Of course the simplest and most common approximation (DerSimonian-Laird, 1986) behaves the worst, but there are many more accurate ones now.
*The distinction between fixed and random effects is more pronounced if you are allowing study x treatment interaction
Again it’s unclear to me what you mean by that. Some clarification might come from recognizing that a fixed-effects model (with product terms or not) is an upper limit of a simpler RE model and the lower limit of a more complex RE model. IJ Good elegantly laid out the relations in hierarchical/empirical Bayes terms in a letter:
Good I.J. (1987). Hierarchical Bayesian and empirical Bayesian methods. The American Statistician, 41, 92.
*What is the minimum number of studies before you would use random effects?
If K≤k don’t use a method that relies on K>k. And if the individual study estimates being merged are from studies with sparse count outcomes, don’t use a method that hinges on those estimates having Gaussian error (like simple inverse-variance weighted methods such as the usual fixed-effects and D-L random-effects methods).
*Is a fixed-effects analysis reliable in any case?
Did you mean “every case”? Regardless, it all comes down to the particular assumptions in the model and fitting method you’re using, just like in single-study regression. The main complication with meta-regression is that you may now have covariates that are group aggregates, making the analysis vulnerable to ecologic biases of the sort explained in many places, e.g.,
Greenland, S. (2002). A review of multilevel theory for ecologic analyses. Statistics in Medicine, 21, 389-395.
Greenland, S. (2001). Ecologic versus individual-level sources of confounding in ecologic estimates of contextual health effects. International Journal of Epidemiology, 30, 1343-1350.
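The sparse-count caveat above can be illustrated: for sparse 2×2 tables, one standard pooling method that does not hinge on Gaussian per-study estimates is the Mantel-Haenszel pooled odds ratio. A minimal sketch with invented counts (an editor's illustration, not from the posts):

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio from 2x2 tables, where
    (a, b, c, d) = (exposed cases, exposed noncases,
                    unexposed cases, unexposed noncases)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Invented sparse tables: only a handful of events per arm, including
# a zero cell that simple inverse-variance weighting handles badly.
tables = [(2, 48, 5, 45), (1, 29, 3, 27), (0, 20, 2, 18)]
or_mh = mantel_haenszel_or(tables)
```

Note that the zero-cell study still contributes to the pooled estimate without ad hoc continuity corrections, unlike a log-odds-ratio with an infinite variance.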


Three questions:

  1. What do you think the biggest errors are in how meta-analysis is presented to researchers in textbooks and journals, and how would you present the topic if you were to teach it? While I understand the reason behind the common guideline of synthesizing only similar designs (i.e., meta-analyses of RCTs only), I always thought that was misleading because it ignores potentially relevant information from studies of other types. Studies of the same type can also vary greatly in how well various explanatory factors were controlled. The exploration of sources of heterogeneity would seem critical for anyone who wants to take value of information and design of experiments seriously in light of the contextual background.

  2. Do you have any guidelines on how to compute the needed sample size for meta-regression?

  3. Do you have any recent examples of a well done meta-analysis, aside from your own work?

  1. I’m not up on recent meta-analysis methods literature, but most meta-analyses I see (including most from the Cochrane Collaboration) use questionable selection criteria that risk selection bias, followed by primitive RE summarizations on what’s included using methods that are inefficient, inaccurate, and tend to hide patterns as discussed in the articles I cited above. For how I taught meta-analysis back in the day, see Ch. 33 of Modern Epidemiology 3rd ed. 2008; there was a lot on analysis of heterogeneity, not much on random effects.
  2. No guidelines on sample size, in part because it is so dependent on the methods used. As with single-study methods, the simplest, most common methods break down fastest; there are always alternatives that operate well enough but may not be in your software. From simulations I’ve seen, those simple RE methods are unacceptable at or below K=5. Yet even with K=2 one can still statistically compare studies directly using the fixed-effects methods in the aforementioned Ch. 33.
  3. Here’s a recent one that seemed reasonable given the limited heterogeneity analysis (one binary covariate, treatment timing), likely due to the limited number of included trials (K=7 and K=4); it also cites and, within groups, uses an RE method superior to D-L:
    García-Albéniz X. et al. (2022). Systematic review and meta-analysis of randomized trials of hydroxychloroquine for the prevention of COVID-19. European Journal of Epidemiology, 37, 789-796.

Wow Sander thank you so very, very much for taking that amount of time to share your thoughts and for providing literature citations! This is incredibly helpful. Thanks to others also for entering into this discussion.

For the case where individual patient-level data are not available (i.e., a traditional meta-analysis) it would be nice to have a guidance document that I could show to a clinical investigator. (Of course that is ignoring various issues that you hinted may be more important than any statistical modeling issues, such as the study selection process.)


Thanks Frank for the appreciation!

My recommendation for a document is the aforementioned meta-analysis chapter (Ch. 33 of Modern Epidemiology, 3rd edn, 2008). Copyright restrictions may apply, but obtaining copies may no longer be a practical problem given that it is now 16 years old and a 4th edition without that chapter has been issued. R-cubed gave a nice quick bullet-point summary of recommendations, to which I would add
0. Meta-analyses are profoundly sensitive to study-selection bias, whether from publication bias or from biased post hoc inclusion criteria. Therefore all available studies should be cited and plotted even if they are excluded from the analysis, and study-specific reasons for exclusions should be provided (e.g., evidence of data fabrication; evidence of protocol violations; etc.).

Were I updating the chapter I would include more discussion of the study-selection issue. The use of ivermectin in covid-19 would be a great example in which there are many meta-analyses with inferences ranging from dramatically effective to completely ineffective. The conflict appears largely driven by what was chosen for inclusion out of the dozens of trials that have been documented.

Some of the trials (perhaps over 20%) turned out to be fraudulent or so poorly conducted or documented that any sensible analysis would exclude them; but the remainder is large and heterogeneous, especially across treatment timing. This means that one can find selection criteria with plausible rationales to produce just about any inference one wants. And indeed, there are many meta-analyses with inferences ranging from strongly beneficial to completely null, with the average around weak protection when given early enough (as with the HCQ example I cited above).

The lesson I see is that selection criteria can overwhelm all the statistical concerns we’ve discussed. Nonetheless, analysis methods do matter: For example, as usual some analyses were reported as negative because the wide RE 95% CI for the mean included the null; yet for at least one analysis my preferred fixed-effect CI would have excluded the null.


Extremely helpful again. My informal take on the ivermectin literature, having been one of the lead statisticians on two ivermectin RCTs, is that ivermectin has a beneficial effect that happens to be very small, almost inconsequential. This matches a meta-analysis of RCTs we were not involved in. We need to do more inference on a clinical scale. In our case we computed a Bayesian posterior probability that patients recovered from COVID-19 more than 1/2 day quicker with ivermectin than with placebo.
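As an illustration of inference on a clinical scale (with made-up numbers, not the ACTIV-6 results): if the posterior for the recovery-time difference is approximately normal, the probability of a benefit exceeding half a day is a one-line calculation.

```python
from math import erf, sqrt

def prob_greater(mean, sd, threshold):
    """P(Delta > threshold) for a normal posterior Delta ~ N(mean, sd^2)."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Hypothetical posterior for days recovered faster on treatment:
# mean 0.3 days, sd 0.25 (invented for illustration)
p_any = prob_greater(0.3, 0.25, 0.0)    # P(any benefit)
p_half = prob_greater(0.3, 0.25, 0.5)   # P(benefit > 1/2 day)
```

The contrast between the two probabilities is the point: evidence for "any effect" can be strong while evidence for a clinically meaningful effect remains weak.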


Meta-analysis is a weighted average of study effect sizes reported from a number of studies that represent a conceptual replication of a specific hypothesis. The “models” of meta-analysis are all about generating an appropriate set of weights for this weighted averaging. The “model” needs to generate an optimal set of weights that:
a) Result in less error (i.e. the model with the least MSE)
b) Result in good error estimation (nominal coverage of the CI)

(note that an unbiased estimator is not the goal in meta-analysis models but rather an estimator with the least MSE, which can only be achieved with some increase in bias; there is a paper by Shuster in Stat Med where this point is completely missed)

Over the years many models have emerged that have tried to optimally address a) and b) above. The fixed effect model kicked off this effort in 1977 (officially, though it may have predated this). It certainly fulfilled a) in comparison to the arithmetic (or naturally weighted) mean, but failed miserably at b) because it included no mechanism to address overdispersion related to systematic error.

For this reason, DerSimonian and Laird proposed the random effects model in 1986, an attempt to address b). They too failed: the CI was widened using the random-effect variance component they created (by the method of moments), but they also used that component in the weights, so the widened CI was associated with a much larger MSE than that of the fixed effect estimator, and overdispersion persisted. Thus the CIs under the random effects model, though wider, are too small relative to the MSE of this model and therefore remain (paradoxically) too narrow.

This state of affairs was blamed (over the years) on “inaccuracy” in computing the random-effect variance component, and several alternatives to the DerSimonian-Laird method of moments emerged (e.g. REML, ML). They all share the same problem, however, because this variance component is simply a nuisance parameter, as aptly reported by Stephen Senn (Trying to be precise about vagueness). It turns out that the assumptions underpinning this nuisance parameter (normally distributed random effects) are not valid in practice, so no model resting on such an assumption can be improved. Debates continued and many other models were proposed, but they all fell under one of three assumption frameworks:

  1. Common parameters assumption
  2. Exchangeable parameters assumption
  3. Independent parameters assumption

In my view, the only assumption under which a meta-analysis should be considered valid is the first one (common parameters), where the meta-analysis is attempting to estimate a common underlying parameter. Under the exchangeable parameters assumption, the underlying ‘true effects’ across studies are not necessarily identical but are somewhat similar, the similarity being governed by the parameters of a hypothetical distribution of these ‘true effects’ (most commonly normal); this makes no sense for various reasons I will not go into here. It suffices to say that if the underlying ‘true effects’ across studies are not necessarily identical, then “error” weights should not be used, in which case this ceases to be a meta-analysis.
If we accept the argument that meta-analysis should only be undertaken under the common parameters assumption, then the fixed effect model needs to be “fixed” to correct overdispersion. This has been done: the resulting IVhet model provides the ideal mix of a) and b) above under the common parameters assumption. So to answer Frank’s question - neither fixed nor random - the IVhet model is what should be considered suitable for use in meta-analysis.
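For readers unfamiliar with IVhet, here is a sketch of my reading of the estimator (Doi et al., 2015): the point estimate is the ordinary inverse-variance fixed-effect mean, but its variance inflates each study's contribution by a DerSimonian-Laird between-study variance. Treat the details as an assumption to check against the original paper; the data are invented.

```python
import math

def ivhet(effects, variances):
    """Sketch of the IVhet estimator: inverse-variance point estimate
    with an overdispersion-corrected variance (check against Doi et al.)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    est = sum(wi * y for wi, y in zip(w, effects)) / sw   # same as fixed effect
    # DerSimonian-Laird tau^2, used here only in the variance correction
    q = sum(wi * (y - est)**2 for wi, y in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    wn = [wi / sw for wi in w]                            # normalized weights
    var = sum(wni**2 * (v + tau2) for wni, v in zip(wn, variances))
    return est, math.sqrt(var)

# Hypothetical study effects and within-study variances
effects = [0.10, 0.35, -0.05, 0.60]
variances = [0.15**2, 0.20**2, 0.18**2, 0.25**2]
est, se = ivhet(effects, variances)
```

The key design choice: the weights stay at their inverse-variance (minimum-MSE) values, while the standard error, unlike the plain fixed-effect one, widens when heterogeneity is present.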


The problem with any blanket statement is that dose and timing are crucial, especially for treatments of infections. Take rabies vaccination: Effectiveness at preventing mortality is very high if the series is completed immediately before a bite from a rabid animal occurs, but is about 0% by the time CNS symptoms appear.

Similarly, for HCQ, ivermectin, molnupiravir, and Paxlovid, effectiveness against adverse covid-19 outcomes appears highest (even when not high) if begun just before infection and indistinguishable from zero by the time infection is symptomatic; plus there is a dose dependency that also applies to side effects. Note that, according to the HCQ meta-analysis I cited, pre-infection administration had visible benefits, but not so for post-infection administration.

So the lead question about any trial such as the ones you were in is: what was the precise dose and timing? And then, what were the outcomes for as-assigned (intent-to-treat) and as-taken analyses, and how did they differ? Of course any attempted answers should emphasize statistical and other uncertainties. In that regard there is a need for formal statistical methodology and software to examine simultaneously and efficiently the multiple outcomes in these studies (duration of symptoms, hospitalization, etc.); everything I see analyzes all outcomes separately, which is far from optimal, especially when the outcomes are highly dependent as here.


Devil’s advocate question (from someone unqualified to discuss the pros/cons of fixed versus random effects models): To what extent do meta-analysis (MA) summary statistics (i.e., the diamond at the bottom of the forest plot and assessments of heterogeneity) influence how a meta-analysis is actually used? Specifically, are there examples of MAs that have been performed in the assessment of either drug efficacy or safety, whose “bottom line” results were deemed credible enough to influence clinical practice or regulatory decision-making to a greater extent than a large high quality component trial would have done on its own?

My impressions of the role of MA in clinical practice and drug regulation (for whatever they’re worth):

  • There is general agreement that pre-planning a MA (i.e., stating a plan to perform a MA and then designing and conducting its component trials) is the most credible approach. Unfortunately, this type of MA is rare;
  • The vast majority of MAs are not pre-planned, but rather retrospective (i.e., the decision to perform the MA was made after the component trials had already been completed);
  • There are many big (?insurmountable) inferential problems with retrospective MAs, including (but not limited to): 1) the temptation to analyze apples with oranges; and 2) the temptation to select apples and oranges that will support a pre-formed opinion;
  • MA to support pivotal efficacy claims for the purpose of drug approval doesn’t seem to be a “thing” - at least not in any widespread way (?) An old EMA guidance document: https://www.ema.europa.eu/en/documents/scientific-guideline/points-consider-application-1meta-analyses-2one-pivotal-study_en.pdf
  • MA to investigate drug safety has, historically, proven to be hugely controversial, as FDA can attest (most MAs in this space have been retrospective). Some draft guidance: https://www.fda.gov/media/117976/download

Arguably, the main value of COVID therapy MAs was to highlight the pitfalls of retrospective MA, rather than to provide actionable support for clinical decision-making. It’s widely acknowledged that a large portion of studies conducted during the early part of the pandemic were poorly done (and some were outright fraudulent). Fortunately, there are now well done RCTs that have failed to identify benefit from ivermectin or hydroxychloroquine. In my mind, these later studies render all other small, earlier studies moot. Failure of adequately-sized, well done RCTs to show intrinsic efficacy of “repurposed” drugs is not at all surprising- indeed, this is the expected outcome from such trials. In contrast, Paxlovid is efficacious when administered during the first few days of symptoms, as shown primarily in an RCT involving unvaccinated patients. It’s also being used for higher-risk vaccinated patients (e.g., those who might not mount a great response to the vaccine). For non high-risk patients who are vaccinated against currently-circulating viral strains, who nonetheless request Paxlovid, we caution that the benefit might not be meaningful.

Summaries of trials done in a particular disease area can sometimes suggest that we’ve been barking up the wrong tree in our efforts to improve outcomes. The forest plots in a 2016 review of the evidence around glycemic control in type 2 diabetes were striking.


Given all the caveats around the evidential value of retrospective MA and the fact that most published MAs are retrospective, I wonder how important MA summary statistics really are for decision-making… I assume that it’s the summary statistics that reflect choices between a “fixed” versus “random” effects model (?) My own bias is to believe that a large, well done RCT will usually be superior to any retrospective MA, in terms of its potential to influence practice. To this end, I wonder whether the forest plots in most MAs serve primarily a qualitative rather than quantitative function i.e., to help us to take stock of the volume of prior research in an area, eyeball the uncertainty in each study’s estimate, see whether studies are generally consistent or inconsistent in their findings, and then use these results to decide whether further research is needed (?)


Quite right Erin - the place of meta-analysis is pretty much established in evidence based medicine practice even though most are retrospective exercises. No one today can deny the fact that properly conducted meta-analyses are the highest level of evidence in evidence-based medicine and are therefore instrumental for guidelines and standards of care and are the basis for statements issued by scientific societies, national and international medicine agencies and the WHO. As I have stated (Angry scientist, angry analyst, angry novelist) in response to similar questions, we should now view primary research as a contribution towards the accumulation of evidence on, rather than a means towards the conclusive answer to, a scientific problem. A systematic review and meta-analysis helps to provide researchers with sufficient information to assess what contribution any new results can make to the totality of information, and thus permit reliable interpretation of the significance of new research.


I don’t want to sidetrack the conversation away from meta-analysis but wanted to give a partial answer to Sander’s questions:

In our two large ACTIV-6 ivermectin studies we addressed most of your questions. The second RCT used a high dose of ivermectin, and we did a lot of analyses related to timing and vax status. The key point is that the previous RCTs were wrong to conclude that there is zero effect. They found a small positive effect similar in magnitude to what we found in our large RCTs. Statistical significance doesn’t address this. Were investigators to rely on significance, all they would need to do is randomize 2000 patients and they would be able to show “significance”.

Results of the ACTIV-6 studies are presented as not supporting ivermectin efficacy, yet, as you note, some of the posterior distributions, as presented in the main article, could be interpreted as supportive of a small effect.

I sense that there might have been some disagreement re how the results of the trial should be presented…I’d really like to see the Supplementary material (as I was looking for a discussion of the Bayesian priors), but for some reason the links to the Supplements only lead to a single page, without any details of the Bayesian analysis plan (?) Am I wrong to be wondering about the prior(s) and how they were elicited (?)

The SAP has extensive details. I need to find out the official way to acquire it. There was no disagreement on how the results were to be presented, but instead what you see is a desire to present the pre-specified analysis that provides evidence for any effect plus a supplemental analysis that puts that into a clinical context.
