Nonparametric Effect Size estimation, Likelihood Methods, and Meta Analysis


Posted as a new topic at Prof. Harrell’s request.

Study heterogeneity is a big issue in research synthesis, and is related to various issues in robust statistics.

I have an intuition that those involved in research synthesis might be better served by using nonparametric estimates of effect (Hodges-Lehmann estimates/generalized odds ratios, etc), instead of necessarily relying upon standardized mean differences reported by primary researchers.

But I am also persuaded by the work of Richard Royall and Jeffrey Blume that the likelihood perspective is the proper way to evaluate and synthesize statistical evidence.

Can anyone refer me to any recent research that discusses either robust likelihood methods, or nonparametric likelihood methods in the context of meta analysis? I am aware of the work on Empirical Likelihood by Art Owen, and some work on generalized likelihood by Fan and Jiang, but it is a bit over my head at the moment, and not directly related to the issues of research synthesis.

The impression I get is that likelihood methods can be computationally challenging, which is why things like p values and NHST persist.

Some references:
Generalized Likelihood Ratio Statistics and Wilks Phenomenon



Just running out but this is an interesting argument I’d like to see hashed out more. Are you just referring to analysis of ordinal data here? It’s worth noting that if ordinal data are reported fully, you can just do an ordinal meta-analysis (e.g. for skin outcomes in Psoriasis).

Why is it the proper way?



“Just running out but this is an interesting argument I’d like to see hashed out more. Are you just referring to analysis of ordinal data here? It’s worth noting that if ordinal data are reported fully, you can just do an ordinal meta-analysis (e.g. for skin outcomes in Psoriasis).”

No, I am thinking about how to integrate various studies, but relaxing the parametric assumptions virtually everyone reports.

I am not sure if you have read any of the work done in robust statistics, but the initial work was framed in terms of mixture distributions, i.e., a 90% probability that an observation comes from a normal distribution and a 10% probability that it comes from a normal with a much larger variance. I’m thinking the scenario a meta-analyst faces is similar: a normal distribution plus random contamination. How would you extract information from that scenario? A nonparametric technique should be able to handle it.
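To make the mixture scenario concrete, here is a minimal simulation sketch. The contamination rate (10%), the inflated standard deviation (10), the sample size, and the function names are all assumptions chosen for illustration. It compares the sample mean with the one-sample Hodges-Lehmann estimate (the median of all pairwise Walsh averages) as estimators of a true centre of 0.

```python
import random
import statistics


def hodges_lehmann(xs):
    """One-sample Hodges-Lehmann estimate: median of all Walsh averages."""
    n = len(xs)
    walsh = [(xs[i] + xs[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


def contaminated_sample(rng, n, eps=0.10, wide_sd=10.0):
    """Draw n values: N(0, 1) with prob 1 - eps, N(0, wide_sd^2) with prob eps."""
    return [rng.gauss(0.0, wide_sd if rng.random() < eps else 1.0)
            for _ in range(n)]


def compare_estimators(reps=300, n=30, seed=1):
    """Return (MSE of the mean, MSE of Hodges-Lehmann) over simulated samples."""
    rng = random.Random(seed)
    means, hls = [], []
    for _ in range(reps):
        xs = contaminated_sample(rng, n)
        means.append(statistics.fmean(xs))
        hls.append(hodges_lehmann(xs))
    # The true centre is 0, so the squared estimate is the squared error.
    mse_mean = statistics.fmean([m * m for m in means])
    mse_hl = statistics.fmean([h * h for h in hls])
    return mse_mean, mse_hl


mse_mean, mse_hl = compare_estimators()
print(f"MSE of sample mean:    {mse_mean:.3f}")
print(f"MSE of Hodges-Lehmann: {mse_hl:.3f}")
```

Under these assumed settings the rank-based estimate should be markedly less variable than the mean, which is the behaviour the robust statistics literature describes for contaminated normals.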

The use of parametric inference might be debatable in many cases, but that does not mean the information contained in the summaries is useless. Means and standard deviations are sufficient statistics (for the normal distribution), so I can (in principle) recover the information contained in the reported sample for re-analysis using a different (hopefully more robust in aggregate) effect measure.

The Generalized Odds ratio is a distribution-free measure of effect related to the Wilcoxon test and to the Hodges-Lehmann estimator (the pseudo-median). It reports how often a randomly chosen observation from one group scores higher than one from the other group, expressed as odds.
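A minimal sketch of that calculation from raw scores follows. Definitions differ in how ties are handled; this version splits ties equally between the groups, which is one common convention and an assumption here. The function name and the example scores are hypothetical.

```python
def wmw_odds(treatment, control):
    """Return (p, odds), where p estimates P(T > C) + 0.5 * P(T = C)."""
    wins = ties = 0
    for t in treatment:
        for c in control:
            if t > c:
                wins += 1
            elif t == c:
                ties += 1
    total = len(treatment) * len(control)
    p = (wins + 0.5 * ties) / total
    # If every comparison favours treatment, the odds are unbounded.
    odds = float("inf") if p == 1.0 else p / (1.0 - p)
    return p, odds


# Hypothetical ordinal scores, for illustration only.
p, odds = wmw_odds([3, 4, 5, 6], [1, 2, 3, 5])
print(p, odds)
```

Only the ordering of the scores is used, so any monotone rescaling of the outcome leaves the result unchanged.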

With care and caution, I should be able to aggregate these effect ratios to provide an estimate of the odds of improvement that requires very few assumptions to interpret.
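One simple way such an aggregation could look is inverse-variance pooling on the log-odds scale. This is a sketch under strong assumptions: each study supplies a WMW odds and a standard error for its logarithm, the studies are treated as estimating a common effect (no between-study heterogeneity term), and the function name is hypothetical.

```python
import math


def pool_log_odds(odds, log_ses, z=1.96):
    """Fixed-effect inverse-variance pooling of odds on the log scale.

    odds    : per-study odds estimates (all > 0)
    log_ses : standard errors of the log-odds, one per study
    Returns (pooled odds, (lower, upper)) using a normal-approximation interval.
    """
    weights = [1.0 / se ** 2 for se in log_ses]
    logs = [math.log(o) for o in odds]
    pooled = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    interval = (math.exp(pooled - z * pooled_se),
                math.exp(pooled + z * pooled_se))
    return math.exp(pooled), interval


# Usage with made-up inputs: two studies that report the same odds.
pooled_odds, (lo, hi) = pool_log_odds([2.0, 2.0], [0.3, 0.5])
print(pooled_odds, lo, hi)
```

Given the heterogeneity concerns raised at the start of this thread, a random-effects version would likely be more defensible in practice; the fixed-effect form is shown only because it is the shortest to write down.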

A gap in the recommendations for applied researchers exists. For example, if one author reports conventional parametric procedures, but another author used nonparametric techniques, there is no easy way to synthesize them into a plausible effect. You might argue (as is done in the link I cited) that the papers using nonparametrics are more credible, yet they are the ones that get omitted.

There is a short book chapter floating around the internet that discusses how to convert among various effect sizes, but that chapter does not go into detail on the distributional assumptions involved in their meaning. I do not like simply converting a Wilcoxon-Mann-Whitney (WMW) odds to a standardized mean difference.

On the contrary, I would do the opposite, and convert the standardized mean difference to the Generalized Odds/WMW Odds ratio. It is a broader effect measure related to the standardized mean difference, but carries fewer assumptions on the distribution.
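A sketch of that conversion, assuming both groups are normal with a common standard deviation (exactly the assumption the reporting authors made): if d is the standardized mean difference, the difference of two independent observations is normal with mean d and variance 2 in standard-deviation units, so P(X > Y) = Φ(d/√2). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def smd_to_wmw_odds(d):
    """Convert a standardized mean difference to (P(X > Y), WMW odds).

    Assumes normal outcomes with equal variances in the two groups.
    """
    p = NormalDist().cdf(d / math.sqrt(2.0))
    return p, p / (1.0 - p)


for d in (0.0, 0.2, 0.5, 0.8):
    p, odds = smd_to_wmw_odds(d)
    print(f"d = {d:.1f}  P(X > Y) = {p:.3f}  odds = {odds:.2f}")
```

The normality assumption is used only to translate the reported summary; once on the odds scale, the result can be read alongside studies that reported rank-based analyses directly.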

As to the proper way to aggregate evidence, the Likelihood viewpoint has me convinced. I recommend reading anything Jeffrey Blume has written to get started.

Richard Royall is also essential reading. For a Bayesian POV, Michael Evans has some cutting edge work in the realm of relative belief.
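For readers new to this viewpoint, here is a minimal sketch of the basic quantity involved: the likelihood ratio comparing two simple hypotheses about an effect. It assumes a normal approximation to the likelihood built from a reported estimate and its standard error; the function name and the numbers are hypothetical.

```python
import math


def normal_likelihood_ratio(estimate, se, theta1, theta2):
    """Likelihood ratio L(theta1) / L(theta2) under a normal approximation.

    estimate : observed effect estimate (e.g. a log odds)
    se       : its standard error
    Values above 1 favour theta1; values below 1 favour theta2.
    """
    z1 = (estimate - theta1) / se
    z2 = (estimate - theta2) / se
    return math.exp(-0.5 * (z1 * z1 - z2 * z2))


# Made-up example: estimate 0.5 with SE 0.25, comparing an effect of 0.5
# against no effect.
lr = normal_likelihood_ratio(0.5, 0.25, theta1=0.5, theta2=0.0)
print(f"LR = {lr:.2f}")
```

Royall suggests benchmarks of roughly 8 and 32 for moderately strong and strong evidence. Because likelihood ratios from independent studies multiply, this quantity lends itself naturally to synthesis.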

The challenge will be to extend these parametric likelihood techniques to the nonparametric realm. Some work has been done within the past 20 years in that regard in terms of generalized likelihood, but to be honest, it is beyond my current level of “mathematical maturity”. My intuition is that there will be a lot of overlap between nonparametric likelihood and Bayesian nonparametrics.

I should thank Tim for mentioning the term “ordinal meta analysis.” Certain things were not appearing in searches with various combinations of “ordinal” + “meta analysis”.

It wasn’t a term that had occurred to me to search for, and it turned up a lot of prior work (some relatively recent) done in the CVA outcome literature. I am more familiar/comfortable with the term “nonparametric” or “distribution free”, because I see the benefits of nonparametric tests and estimates.

“Ordinal” statistics seems to have the connotation of “less power” that I think is mistaken.

There are some gaps in my knowledge that I can now slowly fill.

Added 4/2/2019:

Looking through the Cochrane Handbook, I found a useful citation that confirmed my intuition regarding odds ratios.

As a difference in NED is the effect size [1], a meta-analysis of ln(odds) is equivalent, albeit with loss of power, to one of effect size except for the scaling factor of 1.81.
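My reading, which is an assumption on my part, is that the factor 1.81 in this passage is π/√3 ≈ 1.814, the standard deviation of the standard logistic distribution. Under that reading the conversion between a log odds ratio and a standardized mean difference is a simple rescaling, sketched below with hypothetical function names.

```python
import math

# Standard deviation of the standard logistic distribution, about 1.81.
LOGISTIC_SCALE = math.pi / math.sqrt(3.0)


def smd_to_log_odds_ratio(d):
    """Rescale a standardized mean difference to the log odds ratio scale."""
    return d * LOGISTIC_SCALE


def log_odds_ratio_to_smd(log_or):
    """Rescale a log odds ratio to the standardized mean difference scale."""
    return log_or / LOGISTIC_SCALE


print(f"scaling factor = {LOGISTIC_SCALE:.3f}")
print(f"d = 0.5 corresponds to OR = {math.exp(smd_to_log_odds_ratio(0.5)):.2f}")
```

This rescaling relies on a logistic approximation to the outcome distribution, so it is a different route from the rank-based WMW odds discussed earlier and the two need not agree exactly.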

4/9/2019: Some updated links

  1. Christian Robert (Author of Bayesian Choice) reviews Michael Evans’s Statistical Evidence and Relative Belief

  2. Michael Evans responds.

  3. Detailed paper on the formal statistical hypotheses modelled by the t-test and the Wilcoxon test. Also refutes the notion that transforming data into ranks loses power. Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules.

4/15/2019: Added links:

4/17/2019 – Nice technical report from Agency for Healthcare Research and Quality on Meta-Regression from 2004 (entire report is available online – 70 pages).



Michael Evans takes as gospel that every aspect of analysis must be data-checked. This is completely at odds with the known excellent properties of Bayesian borrowing of information and of prior specification that “kicks off” the analysis in reasonable fashion. For example, Andrew Gelman has written at length about the need for informative priors that help prevent dumb mistakes by just fitting the data.



That is an interesting observation. I have not made my way through his entire book on Relative Belief, but some aspects of it seemed appealing when I scanned it.

I admit I find the Bayesian philosophy very attractive. But my attitude in terms of meta-analysis currently is to avoid the whole problem of prior specification and look at things from a Likelihood point of view – at least when doing work in research synthesis. Should a prior be needed, I would “cheat” and go the Empirical Bayes route. This would be the easiest improvement I could likely “sell” to colleagues (i.e., be able to communicate effectively to them).

I work in rehabilitation; mathematical sophistication is not very high (myself included, but I am improving). Many studies are small. Many improperly use parametric methods on inherently ordinal data. Many commit the errors you describe in BBR. I am trying to figure out how much info can be extracted from the reports without being nihilistic and saying “nothing”.

I want constructive (in the mathematical sense) measures of quality that can be defended when including, excluding, or discounting the reports of various studies. That is why I am a big fan of Jeff Blume’s work which I found through your blog a few months ago.

I fear producing naive Bayesians will be worse than naive frequentists. I am trying to figure out how to maintain rigor while minimizing the calculus and analysis needed to use these modern techniques (no pun intended).



I would just add that naive Bayesians are easily found out. It is much easier to hide frequentist assumptions, especially related to the sample space and to the real unknown accuracy of p-values and confidence limits.



I definitely agree with the problems with p values and testing. I do think the theoretical work done in the testing area can still be useful when translating the problem into an estimate.

I cannot recall where exactly I read this (perhaps in Fixation of Belief), but in Charles Sanders Peirce’s pragmatist epistemology, both skepticism and belief need justification. In spirit, this is consistent with the mathematically constructive point of view, where we do not argue about “truth” but about proof.

(Constructivists do not use the technique of proof by contradiction; constructive proofs have interpretations as algorithms.)

This constructive attitude is what makes Michael Evans’s work somewhat attractive to me, but I have yet to study it in depth.

What do you think about the use of likelihood methods to nudge people towards a more Bayesian POV?

In a research synthesis from a decision theoretic POV, the output would help a decision maker judge whether a policy can be justified using prior research, or whether more data should be collected. If so, what is the justified skeptical prior constrained by the data (appropriately weighted and adjusted for constructive measures of quality which come from likelihood theory)? From a Bayesian decision theory view, the appropriate study design can then be formally derived.

Ideally, data could be used to assist in both group policy making as well as individual decision making (aka prediction), with necessary modifications.

Unfortunately, this seems to lead me into empirical Bayesian nonparametrics, which is anything but simple mathematically, and I hesitate to even try to introduce this to most clinicians and consumers of the research literature.