I thought this would be rather interesting, considering some of the things I’ve learned over the past year or so doing research on meta-analysis.
It took a while, but I’ve come to see that much of what I believed (i.e., that meta-analysis is a relatively straightforward way to synthesize results) is simply not so. Here are a number of methodology articles I’ve found helpful, particularly the papers by Stephen Senn later in the thread.
With that background, I post the following link on the use of the mean difference (MD) vs. the standardized mean difference (SMD) in meta-analysis.
Here is the conclusion:
Conclusions
The SMD was more generalizable than the MD. The MD had a greater statistical power than the SMD but did not result in material differences.
Suffice it to say, based upon what I’ve learned here, this conclusion is debatable. Critics of the method (including @Sander and @f2harrell) have solid mathematical reasons for their objections, and those reasons deserve much broader discussion.
The criticism (as I understand it) is that the SMD confounds an experimental design feature (controllable variation) with a population feature (population variance). Because the SD in the denominator varies from study to study, especially in small, finite samples, this inherent heterogeneity leads to many complications in interpretation.
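A minimal sketch of that criticism, using hypothetical numbers (not taken from the linked paper): two trials observe the same 5-point effect in natural units, but their enrollment criteria produce different outcome SDs, so their SMDs differ by a factor of two. The function names are illustrative, and this uses plain Cohen's d without the Hedges small-sample correction.

```python
import math

def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled within-group standard deviation of two trial arms."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def smd(mean_diff: float, sd: float) -> float:
    """Standardized mean difference (Cohen's d): raw difference / pooled SD."""
    return mean_diff / sd

# Hypothetical trials: both see a 5-point treatment effect on the same scale,
# but different inclusion criteria give different outcome spread.
mean_diff = 5.0
sd_narrow = pooled_sd(5.0, 50, 5.0, 50)    # restrictive inclusion criteria
sd_broad = pooled_sd(10.0, 50, 10.0, 50)   # heterogeneous population

d_narrow = smd(mean_diff, sd_narrow)
d_broad = smd(mean_diff, sd_broad)

print(f"MD in both trials: {mean_diff}")
print(f"SMD, narrow trial: {d_narrow:.2f}")  # 1.00
print(f"SMD, broad trial:  {d_broad:.2f}")   # 0.50
```

The treatment did exactly the same thing in both trials; only a design feature (who was enrolled) changed, yet the SMD halves.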
This paper takes an approach that is common in the methodology literature but seems wrong to me. It attempts to justify a procedure (one that can be subjected to mathematical analysis) by appeal to results in a particular data set. “It worked here. Therefore it is not objectionable.” This seems like misplaced empiricism.
Attempting to prove a universally quantified statement by examination of individual cases is not a generally valid proof technique.
I guess my questions are:

Is it ever reasonable to use the SMD for anything?

Is it at best an asymptotic result that is used without considering finite-sample properties?

Frequentist methods for combining effect sizes generally require a fairly large number of studies (around 20, IIRC). What good is meta-analysis as a technique if I need 20 or more well-powered studies to conclude anything? (This applies more to frequentist meta-analysis than to Bayesian.)
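For reference, a minimal sketch of one common frequentist combination method, the DerSimonian-Laird random-effects estimator (chosen as a representative example; it is not necessarily what the linked paper used). The between-study variance tau² is a moment estimate built from Cochran's Q, which has only k - 1 degrees of freedom, so with a handful of studies it is very imprecise and is often truncated at zero. The study values in the example call are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator.

    effects:   per-study effect estimates (e.g., mean differences)
    variances: their within-study sampling variances
    Returns (pooled estimate, standard error, tau^2).
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies to estimate tau^2")
    w = [1.0 / v for v in variances]                  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q, k - 1 df
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                # moment estimate, truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

# Hypothetical example: four small studies reporting mean differences.
est, se, tau2 = dersimonian_laird([2.0, 5.0, -1.0, 3.0], [4.0, 6.0, 5.0, 3.0])
print(f"pooled = {est:.2f}, 95% CI half-width = {1.96 * se:.2f}, tau^2 = {tau2:.2f}")
```

With few studies, the normal-based interval from this estimator is known to be too narrow; adjustments such as Hartung-Knapp exist partly for that reason.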

The recommendation (do meta-analysis on the natural units if possible) leaves open the question of what to do when the outcome is a score on a test that evaluates a hypothetical construct (QoL, pain, functional independence, etc.). It would be nice if Frank’s recommendations on ordinal outcomes had wider acceptance; I’m beginning to suspect that MAs of N studies using some PRO (patient-reported outcome) measure contain no more information than if we had simply studied N individuals directly (if we are using frequentist meta-analysis techniques).
Thoughts? Am I being too pessimistic?