Journal Article: Which is more generalizable, powerful and interpretable in meta-analyses, mean difference or standardized mean difference?

I thought this would be rather interesting, considering some of the things I’ve learned over the past year or so doing research on meta-analysis.

It took a while, but much of what I believed (i.e., that meta-analysis is a relatively straightforward way to synthesize results) is simply not so. Here are a number of methodology articles I’ve found helpful, particularly the papers by Stephen Senn later in the thread.

With that background, I post the following link on the use of mean difference vs. standardized mean difference (SMD) in meta-analysis.

Here is the conclusion:



The SMD was more generalizable than the MD. The MD had a greater statistical power than the SMD but did not result in material differences.

Suffice it to say, based upon what I’ve learned here, this is rather debatable. Critics of the method (which would include @Sander and @f2harrell) have solid mathematical reasons that should receive much broader discussion.

The criticism (as I understand it) is that it confounds an experimental design feature (controllable variation) with a population feature (population variance). The inherent heterogeneity (in small, finite samples) leads to many complications in interpretation.

This paper takes an approach that is common in the methodology literature, but seems wrong. It attempts to justify a procedure (that can be subject to mathematical analysis) by appeal to results in a particular data set. “It worked here. Therefore it is not objectionable.” This seems like misplaced empiricism.

Attempting to prove a universally quantified statement by examination of individual cases is not a generally valid proof technique.

I guess my question is:

  1. Is it ever reasonable to use the SMD for anything?

  2. Is it at best an asymptotic result that is used without considering finite sample properties?

  3. The frequentist effect size combination methods generally require larger samples – around 20 IIRC. What good is meta-analysis as a technique if I need 20 or more well powered studies to conclude anything? (This applies more towards frequentist meta-analysis than Bayesian).

  4. The recommendation (do meta-analysis on the natural units if possible) leaves open the question of what to do when the outcome is a score on a test that evaluates a hypothetical construct (QoL, pain, functional independence, etc.). It would be nice if Frank’s recommendations on ordinal outcomes had wider acceptance; I’m beginning to suspect that MAs of N studies using some PRO measure have no more information than if we had simply studied N individuals directly (if we are using frequentist meta-analysis techniques).

Thoughts? Am I being too pessimistic?

Would you clarify which standardized mean difference is being used? Is this just a z-score from one study, derived from the study’s main treatment comparison?

It is essentially a Z statistic – mean difference divided by pooled standard deviation.

From their paper:

We defined SMD as Hedges’ adjusted g to remove any upward bias that might have arisen because of small sample sizes.

For those who want to see the formula:
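In its usual textbook form (my notation, not necessarily the paper's), for two independent groups with means $\bar{x}_1, \bar{x}_2$, standard deviations $s_1, s_2$, and sample sizes $n_1, n_2$, Hedges' adjusted g is approximately:

$$
g = J \cdot \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}, \qquad
J \approx 1 - \frac{3}{4(n_1 + n_2 - 2) - 1}
$$

Here $s_p$ is the pooled standard deviation and $J$ is the small-sample correction factor, which approaches 1 as the sample sizes grow.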

Such pooling based on standard deviations (and not standard errors of the mean) means that we are going to interpret treatment effects differently when the patient inclusion/exclusion criteria differ in such a way as to alter the standard deviation of the raw measurements. I can’t see why that is sensible. (And pooling that uses standard errors has a different and probably worse problem.)
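To make the objection concrete, here is a minimal sketch with made-up numbers (not from the paper). It assumes the common approximation for Hedges' correction factor, and the helper name `hedges_g` is purely illustrative. Two hypothetical trials show the same 5-point raw benefit, but one enrolls a more homogeneous population, so its standard deviation is halved.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' adjusted g for two independent groups (illustrative helper)."""
    df = n_t + n_c - 2
    # Pooled standard deviation of the raw measurements.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    # Common approximation to the small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    return j * (mean_t - mean_c) / pooled_sd

# Hypothetical trials: identical 5-point mean difference, 50 patients per arm.
g_broad = hedges_g(55, 10, 50, 50, 10, 50)   # broad inclusion criteria, SD = 10
g_narrow = hedges_g(55, 5, 50, 50, 5, 50)    # narrow inclusion criteria, SD = 5

print(f"raw mean difference: 5 in both trials")
print(f"g (broad criteria):  {g_broad:.3f}")
print(f"g (narrow criteria): {g_narrow:.3f}")
```

The raw treatment effect is identical, yet the standardized effect doubles in the trial with the narrower population, purely because of who was allowed in.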


While I ultimately agree with you, the strongest advocate of the standardized mean effect makes the following argument:

It has been shown here that a meta-analysis of odds ratios is equivalent to a meta-analysis of
effect size when there is an underlying continuous distribution, albeit with some loss of power.
This also lends some justification to the combination of odds ratios from studies with different
outcome variables, or from studies using different cut-off points of a continuous distribution.
If combining effect size is justified, then meta-analysis of odds ratios is also warranted. From
the viewpoint of Greenland’s criticism of effect size this can be reversed; if a meta-analysis
of effect size is rejected then so should one of odds ratios even if the exposure variables are
all on the same scale.

Chinn, S. (2000), A simple method for converting an odds ratio to effect size for use in meta‐analysis. Statist. Med., 19: 3127-3131.

Earlier in the paper, she argues the log odds ratio is simply a scale transformation of the standardized effect. Given an SMD, we can obtain a log odds ratio (on the logistic scale) by multiplying by 1.81, which is the approximate value of $\pi/\sqrt{3}$, the standard deviation of the standard logistic distribution.
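As a rough sketch of that conversion (the function names are mine, and it rests on Chinn's assumption of underlying logistic distributions with equal variance in the two groups):

```python
import math

# Standard deviation of the standard logistic distribution, about 1.81.
LOGISTIC_SD = math.pi / math.sqrt(3)

def smd_to_log_or(d):
    """Convert a standardized mean difference to a log odds ratio."""
    return d * LOGISTIC_SD

def log_or_to_smd(log_or):
    """Convert a log odds ratio back to a standardized mean difference."""
    return log_or / LOGISTIC_SD

d = 0.5  # hypothetical SMD, chosen only for illustration
log_or = smd_to_log_or(d)
odds_ratio = math.exp(log_or)

print(f"scale factor: {LOGISTIC_SD:.4f}")
print(f"SMD {d} -> log OR {log_or:.3f} -> OR {odds_ratio:.2f}")
```

Under those assumptions an SMD of 0.5 corresponds to an odds ratio of roughly 2.5. The conversion is only a rescaling, so any objection to the SMD's dependence on the sample standard deviation carries over to the converted odds ratio.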