Analogy between clinical trials and diagnostic tests: using likelihood ratios to interpret negative trials

@Robert_Matthews has already published a better way to do this from a Bayesian perspective, using the reported confidence intervals. Matthews’ Analysis of Credibility (AnCred) procedure can identify a range of plausible effect sizes that would justify continued research on a topic, despite a failure to achieve “significance” in a set of studies.
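For anyone who wants to see the reverse-Bayes idea behind AnCred in action, here is a minimal sketch, assuming a normal likelihood on the log odds-ratio scale and a conjugate normal prior. The trial numbers, the prior family, and the helper names (`estimate_from_ci`, `posterior_ci`) are my own invention for illustration; Matthews’ papers give the exact AnCred formulas.

```python
import numpy as np

def estimate_from_ci(lower, upper, z=1.96):
    """Recover the point estimate and SE on the log scale
    from a reported 95% CI for a ratio measure (e.g., an OR)."""
    m = (np.log(lower) + np.log(upper)) / 2        # log point estimate
    s = (np.log(upper) - np.log(lower)) / (2 * z)  # standard error
    return m, s

def posterior_ci(m, s, prior_mean, prior_sd, z=1.96):
    """Conjugate normal update: combine the likelihood summary (m, s)
    with a normal prior; return the 95% posterior CI on the ratio scale."""
    w_data, w_prior = 1 / s**2, 1 / prior_sd**2
    post_mean = (w_data * m + w_prior * prior_mean) / (w_data + w_prior)
    post_sd = (1 / (w_data + w_prior)) ** 0.5
    return np.exp(post_mean - z * post_sd), np.exp(post_mean + z * post_sd)

# Invented "nonsignificant" trial: OR 0.80, 95% CI 0.60 to 1.07
m, s = estimate_from_ci(0.60, 1.07)

# The reverse-Bayes question: which optimistic priors (centered on a
# benefit, with the upper 95% prior limit at OR = 1) would make the
# posterior exclude no effect?
for prior_or in (0.7, 0.5, 0.3):
    prior_mean = np.log(prior_or)
    prior_sd = abs(prior_mean) / 1.96  # prior 95% interval touches OR = 1
    lo, hi = posterior_ci(m, s, prior_mean, prior_sd)
    print(f"prior centered at OR {prior_or}: posterior 95% CI ({lo:.2f}, {hi:.2f})")
```

With these invented numbers, priors centered at OR 0.7 or 0.5 give posteriors that exclude OR = 1, i.e., the “negative” trial is still compatible with continued belief in a clinically relevant benefit.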

From the original link:

> Among 169 statistically nonsignificant primary outcome results of randomized trials published in 2021, the hypotheses of lack of effect (null hypothesis) and of clinically meaningful effectiveness (alternate hypothesis) were compared using a likelihood ratio to quantify the strength of support the observed trial findings provide for one hypothesis vs the other; about half (52.1%) yielded a likelihood ratio of more than 100 for the null hypothesis of lack of effect vs the alternate.

The premise of the article amounts to defending a logical fallacy with empirical data.
They picked two arbitrary points on a likelihood function (each of which has precisely zero probability of being true, from a strict axiomatic perspective) to defend the idea that nonsignificant studies provide evidence in favor of the null. I can’t access the paper because of the paywall, but the use of the term “likelihood ratio” rather than “likelihood function” makes me suspect this is what was done.
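To make the objection concrete, here is a sketch of what such a two-point comparison looks like under a normal approximation to the likelihood on the log odds-ratio scale. The trial numbers and the “clinically meaningful” alternative are hypothetical, and since I can’t read the paper, this is a guess at its mechanics rather than a reproduction of them.

```python
import numpy as np

# Invented "nonsignificant" trial on the log odds-ratio scale:
# observed log-OR and standard error (normal approximation).
x, se = np.log(0.98), 0.05

def log_lik(theta):
    """Normal log-likelihood of a candidate log-OR, up to a constant."""
    return -0.5 * ((x - theta) / se) ** 2

theta_null = np.log(1.00)  # H0: no effect
theta_alt = np.log(0.70)   # H1: an assumed "clinically meaningful" effect

# The paper's style of comparison: the likelihood at exactly two points.
lr = np.exp(log_lik(theta_null) - log_lik(theta_alt))
print(f"LR(null vs alt) = {lr:.2e}")  # enormous, "supports" the null

# The same data summarized by the whole likelihood function:
# support is a curve, not a choice between two preselected points.
for odds_ratio in np.linspace(0.85, 1.10, 6):
    rel = np.exp(log_lik(np.log(odds_ratio)) - log_lik(x))
    print(f"OR {odds_ratio:.2f}: support relative to the MLE = {rel:.3f}")
```

The two-point ratio looks decisive, yet the full function shows a whole band of effect sizes better supported than either preselected hypothesis.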

The approach is flawed for two reasons:

  1. Coming to conclusive results from single studies directly contradicts the idea that meta-analysis can magnify the power and precision of any test or estimate (a toy pooling example follows this list). Any experiment is a simultaneous test of the experimental procedure in addition to the scientific question. Neyman denied the possibility of drawing firm conclusions from individual cases, interpreting test outputs only as decisions about how to behave. AFAIK, even Fisher insisted on replication of experiments that yielded positive tests of significance.

  2. Their meta-analysis is flawed for all of the reasons @Sander alluded to in that giga-thread on odds ratios. Using empirical data to justify a statistical procedure is, at best, exploratory.
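Here is the toy pooling sketch promised in point 1: fixed-effect inverse-variance pooling of three invented, individually “nonsignificant” log odds ratios. All numbers are made up for illustration.

```python
import numpy as np

# Three invented trials of the same intervention, each individually
# "nonsignificant": log odds ratios and standard errors.
log_or = np.log([0.80, 0.85, 0.78])
se = np.array([0.15, 0.13, 0.16])

# Fixed-effect inverse-variance pooling.
w = 1 / se**2
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = (1 / np.sum(w)) ** 0.5

lo, hi = np.exp(pooled - 1.96 * pooled_se), np.exp(pooled + 1.96 * pooled_se)
print(f"pooled OR = {np.exp(pooled):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# Each trial's CI includes OR = 1; the pooled CI here does not.
```

With these numbers the pooled 95% CI is roughly (0.69, 0.96), so declaring “no effect” from any one of the trials alone would have been premature.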

Key quote:

> Without some extraordinary considerations, we do not transport estimates for the effect of antihistamines on hives to project effects of chemotherapies on cancers, nor do we combine these disparate effects in meta-analyses.

Likewise, we don’t transport likelihoods from heterogeneous studies either.
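As a toy numerical illustration of that point (all numbers invented): naively multiplying normal likelihoods from two studies with genuinely different true effects yields a precise-looking pooled interval around a value that describes neither study, and a heterogeneity statistic immediately flags the problem.

```python
import numpy as np

# Two invented studies measuring genuinely different effects
# (in the spirit of the quote: hives vs. cancers).
log_or = np.log([0.50, 1.60])
se = np.array([0.10, 0.10])

# Naive fixed-effect pooling multiplies the likelihoods and reports a
# tight interval around an effect that describes neither study.
w = 1 / se**2
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = (1 / np.sum(w)) ** 0.5
print(f"pooled OR = {np.exp(pooled):.2f} (SE {pooled_se:.3f} on the log scale)")

# Cochran's Q exposes the incompatibility the pooled summary hides.
Q = np.sum(w * (log_or - pooled) ** 2)
print(f"Cochran's Q = {Q:.1f} on 1 df")  # far beyond any plausible chance value
```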

Related Reading
https://www.sciencedirect.com/science/article/abs/pii/S1047279712000221

The discussion and some of the links are worth following up.

:new: Here is an interesting discussion on Andrew Gelman’s blog about four interpretations of p-values that is related to this paper. @Sander_Greenland has a number of valuable comments in that thread as well.
