Meta-analysis of proportions versus simple pooling of data

Dear Forum users,

I am planning to perform a systematic literature search to assess the percentage of pharyngeal cancers that are positive for a certain virus (human papillomavirus, HPV) among all pharyngeal cancers. The aim is to be able to present a result such as “32% (95% CI …) of all pharyngeal cancers are associated with HPV”.

I presented data from an initial search to my supervisor. To obtain an estimate, I had simply pooled the data from 9 studies, adding up all reported numbers of cancers as the denominator and all reported numbers of HPV+ cancers as the numerator.

My supervisor said this approach gives a biased result and that I need to do a meta-analysis. However, he could not really explain why my approach was flawed. He just said that a meta-analysis would take study size into account and give more weight to larger studies. But my approach basically does the same, because larger studies contribute more cancers to my estimate.

I did not find a clear answer to my question in the literature. I found that meta-analysis weights each study by the inverse of its squared standard error, which in turn depends on sample size and variance. So I understand why a meta-analysis gives more weight to larger studies (smaller SE, higher weight). But the variance of an observed proportion, p(1-p)/n, is smaller when p is close to 0 (or to 1, though this does not apply to my data) than when p is close to 0.5. So, if my understanding is correct, studies with a very low proportion of HPV+ tumors get more weight than equally large studies with a higher proportion. I am concerned that this would bias the results towards a lower estimated proportion, rather than prevent bias.
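To see the effect described above, here is a minimal Python sketch with hypothetical counts (not data from this thread). It compares simply adding up the counts with a fixed-effect inverse-variance average of the raw proportions, where each study's variance is estimated as p(1-p)/n. Weighting raw proportions this way is one simple choice assumed for illustration; it is not necessarily what any particular Stata command does.

```python
import math

# Hypothetical studies: (HPV+ cancers, total cancers). Illustrative numbers only.
STUDIES = [(5, 200), (100, 200), (30, 100)]


def naive_pooled(studies):
    """Add up all numerators and all denominators, as in the original post."""
    x_total = sum(x for x, _ in studies)
    n_total = sum(n for _, n in studies)
    return x_total / n_total


def inverse_variance_raw(studies):
    """Fixed-effect inverse-variance average of raw proportions.

    Each study's variance is estimated as p(1 - p)/n, so its weight is
    n / (p(1 - p)). This breaks down when p is exactly 0 or 1.
    Returns (pooled estimate, standard error, relative weights).
    """
    props = [x / n for x, n in studies]
    weights = [n / (p * (1 - p)) for p, (_, n) in zip(props, studies)]
    total_w = sum(weights)
    estimate = sum(w * p for w, p in zip(weights, props)) / total_w
    se = math.sqrt(1 / total_w)
    return estimate, se, [w / total_w for w in weights]


if __name__ == "__main__":
    print(f"naive pooled proportion: {naive_pooled(STUDIES):.3f}")
    est, se, rel_w = inverse_variance_raw(STUDIES)
    print(f"inverse-variance estimate: {est:.3f} (SE {se:.4f})")
    for (x, n), w in zip(STUDIES, rel_w):
        print(f"  study {x}/{n}: relative weight {w:.2f}")
```

With these made-up numbers, the two studies of equal size (n = 200) receive very different weights, and the inverse-variance estimate lands well below the naive pooled value. This dependence of the variance on p itself is one commonly cited reason why proportions are usually transformed (for example, logit or Freeman-Tukey double arcsine) before weighting.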

I would greatly appreciate your opinion on whether (and why) my approach is flawed, and whether a meta-analysis is required.

Thank you and best regards,

I’ve heard that when events are rare, the Mantel-Haenszel method may be better than the inverse-variance method:

Incidentally, one reason you might not just add up the counts is that you want to assess the heterogeneity of the estimates.
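As a minimal sketch of how that heterogeneity is usually quantified, the snippet below computes Cochran's Q and the I² statistic. It assumes the study proportions are analysed on the logit scale with the approximate variance 1/x + 1/(n - x), which requires 0 < x < n; the counts are hypothetical.

```python
import math

# Hypothetical studies: (events, total). Illustrative numbers only.
STUDIES = [(5, 200), (100, 200), (30, 100)]


def logit_and_variance(x, n):
    """Logit of x/n and its approximate variance 1/x + 1/(n - x); needs 0 < x < n."""
    return math.log(x / (n - x)), 1 / x + 1 / (n - x)


def heterogeneity(studies):
    """Return Cochran's Q, its degrees of freedom, and I^2 (as a fraction)."""
    ys, ws = [], []
    for x, n in studies:
        y, v = logit_and_variance(x, n)
        ys.append(y)
        ws.append(1 / v)
    # Fixed-effect (inverse-variance) mean of the logits.
    y_fixed = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    # Q: weighted squared deviations of each study from that mean.
    q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))
    df = len(studies) - 1
    # I^2: share of the variation in Q beyond what chance (df) would explain.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    q, df, i2 = heterogeneity(STUDIES)
    print(f"Q = {q:.1f} on {df} df, I^2 = {100 * i2:.0f}%")
```

Simply adding up the counts yields a single proportion and no such measure, which is the point made above.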

Thank you! It seems that my software (Stata) cannot do the Mantel-Haenszel method for meta-analysis of proportions (i.e., when there is no comparison of two treatment groups with an effect size such as an OR or RR, and you just want to meta-analyse a single proportion).

I agree that an advantage of the meta-analysis approach is the quantification of heterogeneity. But I still do not understand why my approach would not work, and I am still concerned that giving more weight to studies with lower proportions would bias my results.

Any thoughts on that?


Is there much spread in the estimates, i.e., some estimates near 0 and others near 0.5, and is this what leads you to ask the question? The n’s sound small in that case, and the studies varied. If there are studies with low proportions, then they are genuine and don’t ‘bias’ the results, especially if they come from a large n. A forest plot would certainly be useful and might make you more content with the differential contribution of individual estimates.

Thanks again! Yes, there is quite some spread and heterogeneity. I agree that studies with low proportions are genuine, but I think they get disproportionate weight (I mean more weight than they should have based on their sample size, because the lower proportion also leads to a lower standard error). Simply adding up the numbers from all studies would give each study exactly the weight it deserves based on the sample size it contributes, so I cannot readily see why my initial approach of pooling the data is wrong.

I agree that a “real” meta-analysis has advantages (forest plot, measures of heterogeneity, etc.), but I am simply interested in the proportion of tumors that are HPV+. My simple pooling approach and a meta-analysis give similar, yet different, results (and the results of the meta-analysis depend on the many choices you can make in the software), so it is unclear which is actually the “best” estimate.
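To show how the estimate depends on modelling choices, here is a minimal sketch that compares the naive pooled proportion with a fixed-effect estimate and a DerSimonian-Laird random-effects estimate. Both of the latter are computed on the logit scale and back-transformed. The logit scale, the DerSimonian-Laird estimator of the between-study variance τ², and the hypothetical counts are assumptions for illustration. Software typically offers other options as well, such as other transformations or other τ² estimators.

```python
import math

# Hypothetical studies: (events, total). Illustrative numbers only; needs 0 < x < n.
STUDIES = [(5, 200), (100, 200), (30, 100)]


def expit(y):
    """Inverse of the logit: maps a log-odds back to a proportion."""
    return 1 / (1 + math.exp(-y))


def pooled_estimates(studies):
    """Return naive, fixed-effect and DerSimonian-Laird random-effects proportions."""
    naive = sum(x for x, _ in studies) / sum(n for _, n in studies)

    # Logit-transformed proportions and their approximate variances.
    ys = [math.log(x / (n - x)) for x, n in studies]
    vs = [1 / x + 1 / (n - x) for x, n in studies]
    ws = [1 / v for v in vs]

    # Fixed effect: inverse-variance weighted mean of the logits.
    y_fe = sum(w * y for w, y in zip(ws, ys)) / sum(ws)

    # DerSimonian-Laird moment estimate of the between-study variance tau^2.
    q = sum(w * (y - y_fe) ** 2 for w, y in zip(ws, ys))
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)

    # Random effects: weights 1 / (v_i + tau^2) become more equal as tau^2 grows.
    ws_re = [1 / (v + tau2) for v in vs]
    y_re = sum(w * y for w, y in zip(ws_re, ys)) / sum(ws_re)

    return {"naive": naive, "fixed": expit(y_fe), "random": expit(y_re), "tau2": tau2}


if __name__ == "__main__":
    for name, value in pooled_estimates(STUDIES).items():
        print(f"{name:>6}: {value:.3f}")
```

The naive value is a sample-size-weighted average of the study proportions that ignores between-study variation. The random-effects value estimates the centre of a distribution of study-level proportions. With heterogeneous data like these, the estimates can differ noticeably. With homogeneous data, τ² drops to 0 and the fixed- and random-effects estimates coincide.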

So, why would it be wrong to simply add up all the numbers and calculate the proportion of HPV+ tumors?

Any further input is highly appreciated. I do not want to do a meta-analysis simply because it is “better”; I would also like to understand why it is better.

Best regards,

You might want to take a look at the link I posted in this thread, which compares and contrasts “pooling”, i.e., treating all of the reports as a single large trial, and “combining”, i.e., averaging in proportion to sample size.

Pooling studies by treating the data of the individual articles as if they came from one large trial is not valid in the general case because of Simpson’s paradox. The paper in the link is open access and goes into the mathematical details.
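Simpson’s paradox is easiest to see in a two-group comparison. The minimal sketch below uses illustrative counts for two hypothetical studies. Within each study the treated group has the higher success rate, yet adding up the counts as if they came from one large trial reverses the direction.

```python
# Illustrative counts for two hypothetical studies: (successes, total) per group.
STUDIES = {
    "study A": {"treated": (81, 87), "control": (234, 270)},
    "study B": {"treated": (192, 263), "control": (55, 80)},
}


def rate(successes, total):
    return successes / total


def within_study_rates(studies):
    """Success rates per group, computed separately inside each study."""
    return {
        name: {group: rate(*counts) for group, counts in groups.items()}
        for name, groups in studies.items()
    }


def naively_pooled_rates(studies):
    """Add up counts across studies as if they came from one large trial."""
    pooled = {}
    for groups in studies.values():
        for group, (s, n) in groups.items():
            ps, pn = pooled.get(group, (0, 0))
            pooled[group] = (ps + s, pn + n)
    return {group: rate(*counts) for group, counts in pooled.items()}


if __name__ == "__main__":
    for name, rates in within_study_rates(STUDIES).items():
        print(name, {g: round(r, 3) for g, r in rates.items()})
    print("pooled", {g: round(r, 3) for g, r in naively_pooled_rates(STUDIES).items()})
```

The reversal arises because study A has higher success rates in both groups and contributes mostly control patients, while study B contributes mostly treated patients. Naive pooling mixes those study differences into the comparison.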

You might want to check out my thread on meta-analysis, because there are a lot of techniques recommended in the secondary sources that do not stand up to mathematical scrutiny.

I really should organize these better, but there are a lot of useful papers here.


Thank you very much, this is very helpful!


As a Stata user, you may find the “metaprop” package useful for performing a meta-analysis of proportions (see …). Its Freeman-Tukey double arcsine transformation option, which stabilizes the variance, is useful when there are studies with proportions close to 0 or 1.
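Here is a minimal sketch of the Freeman-Tukey approach as it is commonly described. First, each study is transformed with the double arcsine t = arcsin√(x/(n+1)) + arcsin√((x+1)/(n+1)), whose variance is approximately 1/(n + 0.5). The t values are then pooled by inverse-variance weighting, and the result is back-transformed with Miller’s (1978) inversion formula. Plugging the harmonic mean of the study sizes into the back-transformation is an assumption here: it is one common choice, and it is the step at issue in the exchange further down this thread. The counts are hypothetical, and this is not the metaprop source code.

```python
import math

# Hypothetical studies: (events, total). Illustrative numbers only.
STUDIES = [(0, 40), (5, 200), (100, 200), (30, 100)]


def ft_transform(x, n):
    """Freeman-Tukey double arcsine of x events in n, with approximate variance 1/(n + 0.5)."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1 / (n + 0.5)


def ft_back_transform(t, n):
    """Miller's inversion of the double arcsine for sample size n (may be non-integer)."""
    t_min = ft_transform(0, n)[0]  # value of t when x = 0
    t_max = ft_transform(n, n)[0]  # value of t when x = n
    if t <= t_min:
        return 0.0
    if t >= t_max:
        return 1.0
    s = math.sin(t)
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    sign = 1.0 if math.cos(t) >= 0 else -1.0
    return 0.5 * (1 - sign * math.sqrt(max(inner, 0.0)))  # max() guards rounding error


def ft_fixed_effect(studies):
    """Inverse-variance pooling on the double-arcsine scale, back-transformed
    using the harmonic mean of the study sizes (an assumed choice)."""
    ts, ws = [], []
    for x, n in studies:
        t, v = ft_transform(x, n)
        ts.append(t)
        ws.append(1 / v)
    t_pooled = sum(w * t for w, t in zip(ws, ts)) / sum(ws)
    n_harmonic = len(studies) / sum(1 / n for _, n in studies)
    return ft_back_transform(t_pooled, n_harmonic)


if __name__ == "__main__":
    print(f"pooled proportion (fixed effect, FT): {ft_fixed_effect(STUDIES):.3f}")
```

Unlike the logit, the double arcsine remains defined for a study with zero events, such as the first hypothetical study here.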


Thanks, ess! This is very useful.

Do you think the Freeman-Tukey double arcsine transformation is appropriate? I came across the following paper which suggests otherwise:

Schwarzer G, Chemaitelly H, Abu-Raddad LJ, Rücker G. Seriously misleading results using inverse of Freeman-Tukey double arcsine transformation in meta-analysis of single proportions. Res Synth Methods. 2019 Sep;10(3):476-483.


Our rebuttal to the authors of the criticism is here: The Freeman–Tukey double arcsine transformation for the meta-analysis of proportions: Recent criticisms were seriously misleading