A few weeks ago Dan Scharfstein asked a group of colleagues how to report an odds ratio of 1.70 with 95% confidence limits of 0.96 and 3.02. Back-calculating from these statistics gives a two-sided P of 0.06 or 0.07, corresponding to an S-value (surprisal, the negative base-2 log of P) of about 4 bits of information against the null hypothesis of OR=1. So, not much evidence against the null from the result, but still favoring a positive association over an inverse one, and thus worth reporting as such. The problem was that the journal to which this was submitted was still using the magic 0.05 alpha level as a reporting criterion.
The possibility of citing Bayesian alternatives was raised. The log odds ratio β = ln(OR) has no logical bound, so the classical (Laplacian) P-value would be the posterior probability of β ≤ 0 (OR≤1) from an improper uniform prior on β (e.g., see Student 1908), which equals half the usual two-sided P-value for β=0. That’s a bit over 0.03 in this example, so squeaks under 0.05. It’s not clear, however, that this result would mollify a journal stuck on a 0.05 cutoff, and it might be seen as P-hacking. Then too, a lot of Bayesians (e.g., Gelman) object to the uniform prior because it assigns higher prior probability to β falling outside any finite interval (−b,b) than to falling inside, no matter how large b; e.g., it appears to say that we think it more probable that OR = exp(β) > 100 or OR < 0.01 than that 100 > OR > 0.01, which is absurd in almost every real application I’ve seen.
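For readers who want to check the arithmetic so far, here is a short Python sketch (my illustration, not part of the original exchange) that back-calculates everything from the reported OR and confidence limits under a normal approximation for β:

```python
import math
from statistics import NormalDist

# Reported result: OR = 1.70, 95% CI (0.96, 3.02)
b = math.log(1.70)                                   # point estimate of beta = ln(OR)
se = (math.log(3.02) - math.log(0.96)) / (2 * 1.96)  # SE recovered from the CI width
z = b / se

p_two = 2 * NormalDist().cdf(-abs(z))  # two-sided P for beta = 0, about 0.07
s_value = -math.log2(p_two)            # surprisal, about 4 bits against OR = 1
p_one = p_two / 2                      # posterior P(OR <= 1) under the uniform prior on beta
print(f"z = {z:.2f}, two-sided P = {p_two:.3f}, S = {s_value:.1f} bits, P(OR<=1) = {p_one:.3f}")
```

The one-sided value is just half the two-sided P, which is what the improper-uniform-prior posterior probability of OR≤1 amounts to here.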
Almost as absurd in typical applications are the proper priors that are billed as “reference priors” or “weakly informative priors,” such as those based on rescaled t distributions with few degrees of freedom. For example, the upscaled t1 (Cauchy) prior for β proposed by Gelman et al. (Annals App Stat 2008, 2, 1360–1383) assigns 26% probability to OR>25 or OR<0.04. The Jeffreys prior for β fares little better, assigning 25% probability to the same event. Worse, yet often seen, are normal(0,10^n) priors with n≥2, which assign about 75% or more probability to OR>25 or OR<0.04. These are huge probabilities for associations so large that no one would believe them in typical “soft-science” applications: Reported associations that large are almost always numerical artefacts, and if real would have been obvious from almost any replication attempt. So these priors and other “weakly informative” or “reference” priors are nowhere near what anyone believes in typical contexts; in other words, none of these priors is credible. Why then should we take seriously a posterior probability generated using one of these contextually incredible priors? At best, it might serve as a weak bound on credible posterior probabilities derived from informative priors, but the ordinary one-sided P-value already fills that role (see Casella and Berger 1987, “Reconciling Bayesian and frequentist evidence in the one-sided testing problem,” Journal of the American Statistical Association, 82, 106–135).
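The normal-prior tail claim, at least, is easy to check numerically; a quick sketch (my own illustration of the calculation, not code from the cited papers):

```python
import math
from statistics import NormalDist

cut = math.log(25)  # boundary for a "large" effect: OR > 25 or OR < 1/25
tail = {n: 2 * NormalDist(0, math.sqrt(10**n)).cdf(-cut) for n in (2, 3, 4)}
for n, p in tail.items():
    print(f"normal(0,10^{n}) prior: P(OR > 25 or OR < 0.04) = {p:.2f}")
```

The tail probability only grows with n, so the wider the “noninformative” normal, the more prior mass it piles onto absurdly large effects.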
Frank Harrell suggested instead giving the posterior probability of a positive association from a “skeptical” prior, but did not specify what such a prior would look like. He wrote that “Skeptical in my mind is a prior that on the log scale [here, for β] has mean 0 and variance chosen so that the probability of a large effect (or prob. of an effect less than the reciprocal of that) is say 0.05.” That leaves wide open at least one prior parameter (the variance), and leaves open even more if one allows non-normal priors. Thus, in opting for a contextually credible Bayesian definition of “skeptical” we’ve set ourselves a specification task requiring unfamiliar choices. Furthermore, any proposed default can generate many objections, as just discussed for vague “reference” priors. No wonder then that Bayesian methods haven’t caught on.
Still, I’m going to outline an approach I’ve used to operationalize “large effect” in an epidemiologic context. To start, I want a prior that is not too skeptical or informative, as I would not want to obscure potentially important relationships or overweight prior information from “experts” (who in my experience are often grossly overconfident about effect sizes based on rather weak previous evidence). On the other hand, I usually want to shrink down the sort of inflated associations that get highlighted by selecting on “significance” (an inflation that gets worse as the cutoff is dropped, as when multiple-comparison adjustments are used). Focusing on the epidemiology of rare (and thus hard to study) diseases like cancers, I have noticed that effects get called “large” when the OR is outside the range of about ¼ to 4. In the rare-disease case I would thus compare the one-sided P-value for OR≤1 to the posterior probability of OR≤1 under some sensible if neutral prior, e.g., “large effect unlikely” represented by a prior on β = ln(OR) symmetric about 0 that produces a 95% prior interval for OR = exp(β) of (¼,4). To gauge the impact of the prior, I would also compare the resulting 95% posterior interval to the 95% confidence interval.
That leaves the shape of the prior to be determined. A familiar form that yields the desired 95% prior interval of (¼,4) for OR is the normal(0,½) distribution for β (lognormal for OR). This choice also places about 2:1 prior odds on the OR interval (½,2). Nonetheless, a choice I prefer over the normal for more elegant computation and connection to prior information in logistic regression is the conjugate prior for β, the log-F distribution (F for OR). Equating the numerator and denominator degrees of freedom produces a log-F(m,m) prior for β, which like the normal is unimodal and symmetric around its mode; and the log-F(9,9) prior distribution produces a 95% prior interval for OR of (¼,4).
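Both numerical claims about the normal(0,½) prior (the 95% prior interval for OR and the roughly 2:1 prior odds on (½,2)) take only a few lines to verify:

```python
import math
from statistics import NormalDist

prior = NormalDist(0, math.sqrt(0.5))  # normal(0, 1/2) prior for beta = ln(OR)
lo, hi = (math.exp(prior.inv_cdf(q)) for q in (0.025, 0.975))
print(f"95% prior interval for OR: ({lo:.2f}, {hi:.2f})")  # close to (1/4, 4)

p_inside = prior.cdf(math.log(2)) - prior.cdf(math.log(0.5))
odds = p_inside / (1 - p_inside)  # prior odds on 1/2 < OR < 2, about 2:1
print(f"prior odds on (1/2, 2): {odds:.2f}:1")
```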
Since the log-F(m,m) is rather unfamiliar, as an aside I give some details: Although it has heavier tails than the normal, it approaches normality rapidly as m increases, and for m≥9 produces results negligibly different from a normal with the same 95% central interval. The shared degrees-of-freedom value m is the number of independent Bernoulli trials with parameter π = expit(β) = exp(β)/(1+exp(β)) = OR/(1+OR) needed to encode the information about β in the prior. This relation lets the user and reader grasp the evidential strength of the prior in terms of observing m tosses from a coin-tossing mechanism. For example, specifying a 95% prior interval for OR of (¼,4) is claiming to have prior information on OR = π/(1−π) equivalent to the information on the odds of heads vs. tails obtained from observing 4 or 5 heads in 9 independent tosses. Given the log-F convergence to normality, the same interpretation could be given to the normal(0,½) prior. For more about encoding prior information and computation with log-F priors see Greenland 2007, “Prior data for non-normal priors,” Statistics in Medicine, 26, 3578–3590.
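Because many statistics libraries lack the log-F directly, one simple way to check the (¼,4) claim is simulation: with equal degrees of freedom, an F(m,m) draw is just the ratio of two independent chi-square_m variables (the 1/m scalings cancel), so prior draws for the OR are easy to generate. A sketch (my illustration; the simulated quantiles carry a little Monte Carlo noise):

```python
import math
import random

random.seed(7)
m = 9

def f_mm_draw(m):
    # F(m,m) variate: ratio of two independent chi-square_m variables,
    # each built as a sum of m squared standard normals
    num = sum(random.gauss(0, 1) ** 2 for _ in range(m))
    den = sum(random.gauss(0, 1) ** 2 for _ in range(m))
    return num / den

# Prior draws for OR when beta = ln(OR) has a log-F(9,9) prior
draws = sorted(f_mm_draw(m) for _ in range(200_000))
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
print(f"simulated 95% prior interval for OR: ({lo:.2f}, {hi:.2f})")  # near (1/4, 4)
```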
OK, so there’s my working answer to Frank’s suggestion for a skeptical prior, at least for a “rare” disease. Applying the normal(0,½) for β to Dan’s 95% confidence interval for the OR of (0.96, 3.02) yields an approximate posterior mean and 95% interval for OR of 1.58 and (0.93, 2.68), with a posterior probability (“Bayesian P-value”) for OR≤1 of 0.046 (those numbers were computed using inverse-variance-weighting as per Ch. 18 of Modern Epidemiology 3rd ed. 2008 - and no, I did not cook them to keep the probability of OR≤1 below 0.05).
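For anyone who wants to reproduce those numbers, the precision-weighting approximation takes only a few lines; discrepancies in the last digit against the figures quoted above are just rounding:

```python
import math
from statistics import NormalDist

# Likelihood summary: OR = 1.70 with 95% CI (0.96, 3.02),
# normal approximation on beta = ln(OR)
b_hat = math.log(1.70)
v_hat = ((math.log(3.02) - math.log(0.96)) / (2 * 1.96)) ** 2

# Skeptical prior: beta ~ normal(0, 1/2)
v_prior = 0.5

# Inverse-variance (precision) weighting of the prior mean (0) and the estimate
w_hat, w_prior = 1 / v_hat, 1 / v_prior
b_post = (w_hat * b_hat + w_prior * 0) / (w_hat + w_prior)
se_post = math.sqrt(1 / (w_hat + w_prior))

or_post = math.exp(b_post)
ci = (math.exp(b_post - 1.96 * se_post), math.exp(b_post + 1.96 * se_post))
p_or_le_1 = NormalDist().cdf(-b_post / se_post)
print(f"posterior OR {or_post:.2f}, 95% interval ({ci[0]:.2f}, {ci[1]:.2f}), "
      f"P(OR<=1) = {p_or_le_1:.3f}")
```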
However, Dan’s example involved an outcome that may not be rare. For common (and thus more precisely estimated) outcomes like myocardial infarction, “large” tends to mean an OR outside the range of about ½ to 2. So I expect a genuinely informed skeptical prior for the OR in his application would need to be much narrower than normal(0,½), although how much narrower depends on contextual details that were not given. Finally, due in part to the counterintuitive behavior of odds ratios for common outcomes (including noncollapsibility), I would not have reported an OR as the final estimate – I might instead have reported standardized risk ratios or differences (“marginal effects”) derived from a logistic model.