Subgroup analysis and Meta-regression in Meta-analysis

Okay, that is clear. I have a philosophical position on meta-analysis of prevalence (when it should be done and when not) but will bring that up last. Assuming that the meta-analysis is appropriate, it should always be conducted using transformed proportions, because the transformation stabilizes the variance and keeps the estimates within the 0,1 limits of a proportion, as noted from your output using raw proportions.
There are two transforms commonly used, the Freeman-Tukey double arcsine transform (FTT) and the logit, and FTT is preferred. FTT has received a lot of attention after we first elaborated on it:
Barendregt JJ, Doi SA, Lee YY, Norman RE, Vos T. Meta-analysis of prevalence. J Epidemiol Community Health. 2013 Nov 1;67(11):974-8.
There was then a critique by Schwarzer et al
Schwarzer G, Chemaitelly H, Abu-Raddad LJ, Rücker G. Seriously misleading results using inverse of Freeman-Tukey double arcsine transformation in meta-analysis of single proportions. Res Synth Methods. 2019 Sep;10(3):476-483.
and a response by us
Doi SA, Xu C. The Freeman-Tukey double arcsine transformation for the meta-analysis of proportions: Recent criticisms were seriously misleading. J Evid Based Med. 2021 Dec;14(4):259-261.
There is also a new paper out that has a criticism from a different angle
Röver C, Friede T. Double arcsine transform not appropriate for meta-analysis. Res Synth Methods. 2022 Jul 15.
although I don’t believe the flaw they have detected meaningfully affects the use of the transform.
With the historical perspective out of the way, I suggest we take the data from the Schwarzer paper as an example of using the FTT in metan.
First update metan by typing
ssc install metan, replace
then run the following:
input str3 studyname long n int cases byte qi
"S1" 217154 422 1
"S10" 16557 32 1
"S13" 676 1 1
"S18" 44 1 1
"S26" 29 1 1
end
this will add the Schwarzer data to Stata
then run the meta-analysis as follows:
metan cases n , pr model(ivhet \ re ) transform(ftukey, iv) study( study ) forestplot(astext(85) textsize(120) boxscale(55) spacing(1.2) leftjustify range(0.1 7) dp(3)) denom(1000) extraline(yes) hetinfo(isq h)

This will put everything in perspective and you can ask questions after you review this

These authors recommend a logit transform over arcsine. I think they make a good case.

They refer to regression models and in this situation I agree. For meta-analysis the variance stabilization is better with the FTT as compared with the logit transform.

Also, FTT needs a well considered back transform and that is where Schwarzer et al slipped up - they used the harmonic mean which we had flagged in 2013 as not recommended. Metan in Stata has our back transform as an option and is in the code I posted above

Why don’t those considerations also apply to meta-regression? The simulations they present suggest otherwise.

From the discussion

In this paper, we demonstrated using theory, examples, and simulations that logistic regression and its random-effects counterpart have advantages over analysis of arcsine-transformed data in power and interpretability. For binomial data, power tended to be higher when using a logistic regression approach than arcsine-transformed linear models. In addition, the logit function has a much simpler interpretation, while avoiding the possibility of nonsensical predicted values. For non-binomial proportions, there was never any theoretical reason to use the arcsine transform in the first place, and we instead suggest using the logit transform. It is important to recognize that these ideas apply equally well to proportions collected in ANOVA designs as to those collected in a regression context.

They are models of individual participant data - my guess is performance metrics of interest differ in this case.
Coming back to meta-analysis - we are interested in error (lower MSE) and error estimation (coverage that is at the nominal level) and given that variance stabilization is better with the FTT transform this has to perform better. The problem with this transform is with the back-transform and that has led to many issues being reported for the FTT

There are several known issues with the logit transform:
a) the variance of the logit-transformed proportion depends on the event count as well as on the sample size
b) within-study variances are treated as fixed and known values in MA, but the event counts that determine them are not, which violates one of the assumptions of MA
c) since both the logit proportion and its variance depend on the event counts, they are correlated, which ends up as gross bias in MA, especially with small studies

Thus variance stabilization is clearly expected to be better with FTT, without any need to cite studies
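
A minimal sketch of points (a) to (c), using the five studies entered earlier. It assumes the sum form of the double arcsine, asin(sqrt(x/(n+1))) + asin(sqrt((x+1)/(n+1))), with approximate variance 1/(n+0.5), and the usual large-sample logit variance 1/x + 1/(n-x). The variable names are illustrative only, and this is not meant to reproduce metan's internal calculations.

```stata
* Illustrative only: compare the approximate variances of the two transforms
clear
input str3 studyname long n int cases
"S1"  217154 422
"S10" 16557  32
"S13" 676    1
"S18" 44     1
"S26" 29     1
end

* Freeman-Tukey double arcsine (sum form); its variance depends on n alone
generate double ftt   = asin(sqrt(cases/(n + 1))) + asin(sqrt((cases + 1)/(n + 1)))
generate double v_ftt = 1/(n + 0.5)

* Logit; its variance depends on the event count as well as on n
generate double lgt   = ln((cases/n)/(1 - cases/n))
generate double v_lgt = 1/cases + 1/(n - cases)

list studyname n cases ftt v_ftt lgt v_lgt, noobs
```

The three studies with a single case get almost the same logit variance (just over 1) although n ranges from 29 to 676, whereas the FTT variance tracks the sample size.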

Suffice it to say that while I have no dog in this fight, there is a growing literature that disagrees with you.

The idea that the logistic can be sufficient at the individual study level, but not at the meta-analytic level, strikes me as self-refuting. If an individual is attempting to compress his/her knowledge via a regression on studies, and accepts the logit as valid at the individual level, combining the information in sufficient stats via logistic modelling directly follows, since logits can be combined by addition. How to model the heterogeneity is another matter.

After a bit more reading, I’ll come back and attempt a more formal proof of this.

Okay, let’s see what you find. Perhaps limit your proofs to a standard pairwise aggregate data MA, as most meta-regression is simply a weighted linear regression with an ES as the dependent variable and I do not see much point in debating that.

What has been debated a lot is the best transform in the standard MA and so far no one has raised a convincing argument against FTT though many have tried and continue to try as noted in the citations I posted

Coverage probability for n = 50 from:
Interval Estimation for a Binomial Proportion. Lawrence D. Brown, T. Tony Cai and Anirban DasGupta. Statistical Science, Vol. 16, No. 2 (May, 2001), pp. 101-117

The oscillation in coverage for very small proportions can be fixed (see our paper), but the logit interval, for the same coverage, is much wider.

Dear Prof. @s_doi and @R_cubed,
Thank you both for sharing your thoughts and relevant literature. @s_doi I read your articles earlier, and that was the basis for changing from Logit to FTT. It was truly helpful.
I am really enjoying the debate. I believe such healthy debates could empower early career researchers to make informed decisions on what transformation should be used and why. Great stuff!

I think it is me. I executed the commands you gave, but unfortunately Stata said “'cases' cannot be read as a number.” Whatever command I execute, Stata displays the same message, “'xyz' cannot be read as a number.” What could be the problem? However, I was able to run a transformed analysis using the following and produce a forest plot.

. metan cases Samplesize, pr model(ivhet \ re ) transform(ftukey, iv)

I can see all my CI are within 0 and 1. How can I do a subgroup analysis maintaining these CI between 0 and 1? And would the same principle be applicable when performing meta-regression? For meta-regression, I believe we need to use the ‘Regress’ command. Can we do a subgroup analysis in Metan?
On another note, does ‘ivhet’ stand for heteroscedasticity of the instrumental variable?
Plenty of things are new for me. Therefore, I have lots of questions for experts.
I am thankful for all your valuable comments @R_cubed, @s_doi.

Kind regards,


The command you used is fine, as the remaining options only affect the display of the forest plot. Note that study(study) was meant for you to put the variable that holds the study name in your dataset in the parentheses instead of “study”. range(0.1 7) is the range of the proportions shown across the forest plot and, given denom(1000), is expressed as cases per 1000 population; not adjusting these to your own data could lead to an error. Adding by(subgroup) will give you the subgroup analysis, where “subgroup” is the variable in your dataset that holds the subgroup indicator.
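
As a rough sketch of the by() option, the following reuses the five studies from the earlier example. The subgroup variable large (1 if n is at least 1000) is invented here purely for illustration, and the command assumes metan has been installed from SSC.

```stata
* Sketch only: re-enter the example data, then add a subgroup indicator
clear
input str3 studyname long n int cases
"S1"  217154 422
"S10" 16557  32
"S13" 676    1
"S18" 44     1
"S26" 29     1
end

* Hypothetical subgroup (an assumption): studies with at least 1000 participants
generate byte large = (n >= 1000)
label define sizegrp 0 "n < 1000" 1 "n >= 1000"
label values large sizegrp

* Same FTT analysis as before, now also pooled within each subgroup
metan cases n, pr model(ivhet \ re) transform(ftukey, iv) study(studyname) by(large)
```

In your own data, replace large with whatever variable holds your subgroup indicator.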

transform(ftukey, iv) implements the Miller back-transform with the Barendregt-Doi modification. If you use transform(ftukey), i.e. just ftukey alone, then you get the results Schwarzer was complaining about, and this also gives you the same result as metaprop. In 2013 we had warned against using this, but the warning remained largely ignored: when Schwarzer et al wrote their paper they did cite our paper but ignored the change we had suggested. This change was implemented in Stata after our rebuttal was published.
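
For readers who want to see what the back-transform involves, here is a hedged sketch of Miller's (1978) inversion formula applied study by study, where the sample size to plug in is unambiguous. It is not metan's code and the variable names are illustrative. For a pooled estimate, which value of n to plug in is exactly the point in dispute: the harmonic mean of the sample sizes in the version criticised above, or (as I understand the Barendregt-Doi modification, which is an assumption here) a value derived from the inverse of the pooled variance.

```stata
* Sketch only: round-trip each study through the FTT and Miller's inversion
clear
input str3 studyname long n int cases
"S1"  217154 422
"S10" 16557  32
"S13" 676    1
"S18" 44     1
"S26" 29     1
end

* Forward transform (sum form of the double arcsine)
generate double t = asin(sqrt(cases/(n + 1))) + asin(sqrt((cases + 1)/(n + 1)))

* Miller's inversion:
*   p = 0.5*(1 - sgn(cos t)*sqrt(1 - (sin t + (sin t - 1/sin t)/n)^2))
generate double inner  = sin(t) + (sin(t) - 1/sin(t))/n
generate double p_back = 0.5*(1 - sign(cos(t))*sqrt(max(0, 1 - inner^2)))

* Compare with the raw proportion
generate double p_raw = cases/n
list studyname p_raw p_back, noobs
```

With each study's own n the inversion returns the raw proportion to within rounding; the disagreements in the literature arise only when a single n has to stand in for studies of very different sizes.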

IVhet is a fixed effect model replacement for the RE model that removes the overdispersion seen with the RE model. Although it is a fixed effect model, it can be used for heterogeneous data. It is just an alternate model that I recommend everyone use in lieu of the RE model whose assumptions seem to me to be questionable at best when used in MA.

There is a metareg command in Stata but I will not recommend it as it only allows RE weights. To use IVhet weights you simply run
regress FTT x1 x2 [aw=1/v], vce(robust)
where FTT is the variable holding the FTT-transformed proportion from each study, v is the variance of that transformed proportion, and x1 and x2 are moderator variables

Without overwhelming the OP, I’ll try to briefly sketch out the disagreement I have
with Doi’s representation of the literature on this issue.

Starting from first principles – I take Herman Chernoff’s philosophy as a worthwhile

With the help of theory, I have developed insights and intuitions that prevent me from
giving weight to data dredging and other forms of statistical heresy. This feeling of freedom
and ease does not exist until I have a decision theoretic, Bayesian view of the problem
I am a Bayesian decision theorist in spite of my use of Fisherian tools.

For Bayesians, use of decision theory as a formal tool even applies to the design of experiments.
I view meta-analysis as a tool to derive the most informative experiment, given goals and
resource constraints.

With this outlook, I find classical meta-analytic methods excessively reliant on the
metaphor of a “population” of studies, and the assumption of normality. [1][2]

Gene Glass, a pioneer in meta-analysis wrote: [1]

Third, the conception of our work that held that “studies” are the basic, fundamental unit of a research program may be the single most counterproductive influence of all. This idea that we design a “study,” and that a study culminates in the test of a hypothesis and that a hypothesis comes from a theory - this idea has done more to retard progress in educational research than any other single notion.

Summary of Criticisms

  1. In his dismissal of the logit method, he failed to distinguish between the classic 2 step proportion combination procedures, and the more recently proposed 1 step GLMMs (Generalized Linear Mixed Models) [3-5]. This is directly relevant as the OP mentioned meta-regression, with [4] providing a good example on how to proceed.

  2. Limiting the discussion to classic 2 step methods, his advocacy of the Freeman-Tukey double arcsine variance stabilization transformation is inadequate. In order for the synthesis to be useful, the estimate must be converted back from the combination scale to the [0-1] interval. This is trivial for the closest competitor – the arcsine transform – which is never mentioned in his papers, but is discussed in [3-5] and recommended by the authors in [6].

The Freeman-Tukey transformation converges to the arcsine in large samples, but is not defined in a meta-analytic context with multiple proportions, as the authors mentioned in [7] (this paper was also noted above). Considering Glass’s quote above, the fact this transformation is so reliant on how sample sizes are averaged leads me to skepticism of its value in this context.

I’d agree variance stabilization can be valuable, but the double arcsine is too complicated without clear
benefit over the arcsine.


  1. Glass, Gene. Meta-analysis at 25. Self-published Jan 2000. Archived at:

  2. Jackson D, White IR. When should meta-analysis avoid making hidden normality assumptions? Biometrical Journal. 2018; 60: 1040-1058.

  3. Lin L, Xu C. Arcsine-based transformations for meta-analysis of proportions: Pros, cons, and alternatives. Health Sci Rep. 2020; 3:e178.

  4. Shi PJ, Sandhu HS, Xiao HJ. Logistic Regression is a better Method of Analysis Than Linear Regression of Arcsine Square Root Transformed Proportional Diapause Data of Pieris melete (Lepidoptera: Pieridae). Florida Entomologist. 2013 Sep 1; 96(3): 1183-1185.

  5. Lin L, Chu H. Meta-analysis of Proportions Using Generalized Linear Mixed Models. Epidemiology. 2020 Sep;31(5):713-717. doi: 10.1097/EDE.0000000000001232. PMID: 32657954; PMCID: PMC7398826.

  6. Kulinskaya E, Morgenthaler S, Staudte R. Meta Analysis: A Guide to Calibrating and Combining Statistical Evidence. Wiley 2008.

  7. Röver C, Friede T. Double arcsine transform not appropriate for meta-analysis. Res Synth Methods. 2022 Jul 15. arXiv:2203.04773.


The only comment I will make is regarding my use of the FTT as the rest is your opinion which of course you are entitled to hold.

You are right, I have never really given much weight to the usual arcsine-square-root transformation as Freeman and Tukey created a much better variance stabilizing version by summing over the two arcsine values. Also Lin and Xu that you quote above are coauthors of mine and Xu co-authored the rebuttal ( Doi SA, Xu C. The Freeman-Tukey double arcsine transformation for the meta-analysis of proportions: Recent criticisms were seriously misleading. J Evid Based Med. 2021 Dec;14(4):259-261.)

There is no objective evidence to date (would be happy to see it if you have some) that the GLMM outperforms the standard RE aggregate data approach in MA if the right simulation approach is used. What has been done by most (including Chu, whom you quote above and who is also a co-author of mine) is to simulate the way that the data will be analysed, i.e. assume random effects in data generation and then analyse using a random effects assumption. This is nothing more than a self-fulfilling prophecy that I have criticized previously (Doi SAR. Examining how meta-analytic methods perform. Res Synth Methods. 2022 May;13(3):292-293.)

The complaint of Röver and Friede is that the conversion of the double arcsine combined estimate back to the proportion scale can lead to an estimate that is outside the actual range of the data when the sample sizes are drastically different. That is a serious criticism, and one that does not apply to the single arcsine (which is the limiting value of the double).

The point of transforming proportions to a different scale, and then back to a proportion, is to have a common scale for combination. Variance stabilization is an important feature, but with multiple sample sizes the Freeman-Tukey transform gives no common scale, only overlapping ones.

As for GLMM, this is from the abstract of Lin and Chu’s Meta Analysis of Proportion Using GLMMs

In general, GLMMs led to smaller biases and mean squared errors, and higher coverage probabilities than two-step methods. Many software programs are readily available to implement these methods.

Röver and Friede are correct, but the error is at the extremes and is so small that it lacks practical significance. There is always a trade-off in methods, and a superficial read of the paper does sound alarming if the context is ignored. It’s the classic cycle of EBM that we teach: there are studies that suggest coffee causes cancer and also studies that suggest coffee protects against cancer. An in-depth understanding is required to make a recommendation, and simply quoting from these authors does not resolve the issue, as it is usually much more complicated than a need to take sides.

The variance reduction vs the single arcsine is going to be very small in the vast majority of cases, compared to other error sources. This is a case of taking normal theory (which is only an approximation) too literally. The chance of a combined estimate that is not even in the range of the data requires the analyst to check if the implied sample size used by the Freeman–Tukey method is sensible.

On simple mini-max grounds, the FTT fails to dominate the arcsine as there exist cases where the latter is better. If errors are weighted, a small chance of a large, embarrassing error with the FTT doesn’t seem worth the effort to use it, considering the notion of a single “sample size” for multiple proportions has no theoretical justification.

Röver and Friede produce an example where the FTT reverses the order of two data points on the combination scale. This reminds me of the problem I noted on the use of parametric models on ordinal data:

The early critics of parametric models on ordinal data noted that arbitrary scale transformations could change the observed sign of the effect…
The implication is that no information is communicated by parametric models on ordinal data.

I don’t think this method is as bad as that, but I see no reason to use something that has a small chance of changing the ordering of the data points, which destroys information.

@Junaid I had said I would address the philosophical question last. Basically, meta-analysis is a weighted average, and there are many types of weights one could use for weighted averaging. Only error (variance) weights make a weighted average a meta-analysis. However, for variance weights to be appropriate there must be an underpinning unknown common population parameter. In most burden of disease meta-analyses this underpinning common population parameter is absent, and thus meta-analysis is not appropriate for such studies. See An Updated Method for Risk Adjustment in Outcomes Research
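
To illustrate what a weighted average with variance weights amounts to, here is a minimal sketch on the FTT scale using the five example studies. It assumes the approximate FTT variance 1/(n+0.5), so each weight is n + 0.5; the variable and scalar names are illustrative. It gives only the inverse-variance weighted mean and its naive standard error, not the full IVhet or RE computation in metan.

```stata
* Sketch only: inverse-variance weighted mean on the FTT scale
clear
input str3 studyname long n int cases
"S1"  217154 422
"S10" 16557  32
"S13" 676    1
"S18" 44     1
"S26" 29     1
end

generate double ftt = asin(sqrt(cases/(n + 1))) + asin(sqrt((cases + 1)/(n + 1)))
generate double w   = n + 0.5          // weight = 1/variance
generate double wft = w*ftt

* Pooled estimate = sum(w*ftt)/sum(w); naive SE = sqrt(1/sum(w))
summarize wft, meanonly
scalar sum_wft = r(sum)
summarize w, meanonly
scalar sum_w = r(sum)
scalar pooled    = sum_wft/sum_w
scalar se_pooled = sqrt(1/sum_w)

display "Pooled FTT = " pooled "   SE = " se_pooled
```

The largest study carries almost all of the weight here, which is the behaviour variance weighting is designed to produce when a common underlying parameter is assumed.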

Dear @s_doi and @R_cubed, I hope you both are well. I apologise, as I travelled abroad and couldn’t respond to your comments. I am working on subgroup analysis and metaregression and will get back to you for your feedback. Thank you for bearing with me.

Dear Prof. Doi,

I used the “regress FTT x1 x2 [aw=1/v], vce(robust)” command and the Stata said, “option ftt not allowed.” Any suggestions, please?

The command

regress year samplesize, ftt vce(robust)

is not correct and should be

regress _ES year samplesize [aw=1/(_seES^2)], vce(robust)

Both _ES and _seES are created after you run metan and are on the FTT scale if you selected the FTT transform in metan
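
Putting the two steps together, a rough end-to-end sketch with the five example studies might look like this. The moderator logn (log of sample size) is a made-up covariate for illustration only, and the sketch assumes, as described above, that metan leaves _ES and _seES in the dataset on the FTT scale.

```stata
* Sketch only: run metan first, then a weighted regression on its output
clear
input str3 studyname long n int cases
"S1"  217154 422
"S10" 16557  32
"S13" 676    1
"S18" 44     1
"S26" 29     1
end

* Hypothetical moderator (an assumption, for illustration only)
generate double logn = ln(n)

* Step 1: metan creates _ES and _seES (FTT scale when ftukey is selected)
metan cases n, pr model(ivhet \ re) transform(ftukey, iv) study(studyname)

* Step 2: inverse-variance weighted regression with robust standard errors
regress _ES logn [aweight = 1/(_seES^2)], vce(robust)
```

The coefficients are on the FTT scale, so they describe direction and strength of association rather than a change in the proportion itself.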
