A few weeks ago Dan Scharfstein asked a group of colleagues how to report an odds ratio of 1.70 with 95% confidence limits of 0.96 and 3.02. Back-calculating from these statistics gives a two-sided P of 0.06 or 0.07, corresponding to an S-value (surprisal, the negative log base 2 of P) of about 4 bits of information against the null hypothesis of OR = 1. So, not much evidence against the null from the result, but still favoring a positive association over an inverse one, and so thought worthy of reporting as such. The problem was that the journal to which this was submitted was still using the magic 0.05 alpha level as a reporting criterion.
The possibility of citing Bayesian alternatives was raised. The log odds ratio β = ln(OR) has no logical bound, so the classical (Laplacian) P-value would be the posterior probability of β ≤ 0 (OR ≤ 1) from an improper uniform prior on β (e.g., see Student 1908), which equals half the usual two-sided P-value for β = 0. That's a bit over 0.03 in this example, so squeaks under 0.05. It's not clear however that this result would mollify a journal stuck on a 0.05 cutoff, and it might be seen as P-hacking. Then too, a lot of Bayesians (e.g., Gelman) object to the uniform prior because it assigns higher prior probability to β falling outside any finite interval (−b, b) than to falling inside, no matter how large b; e.g., it appears to say that we think it more probable that OR = exp(β) > 100 or OR < 0.01 than that 0.01 < OR < 100, which is absurd in almost every real application I've seen.
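For readers who want to reproduce the back-calculation, here is a short sketch (Python, standard library only) that recovers the two-sided P, the S-value, and the one-sided P from the reported OR and confidence limits, assuming the usual normal approximation on the log-odds-ratio scale:

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Reported statistics: OR = 1.70, 95% CI (0.96, 3.02)
or_hat, lo, hi = 1.70, 0.96, 3.02

# Back-calculate the standard error of beta = ln(OR) from the CI width
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
z = math.log(or_hat) / se

p_two = 2 * (1 - norm_cdf(z))   # two-sided P for beta = 0
s_value = -math.log2(p_two)     # surprisal in bits
p_one = p_two / 2               # one-sided (Laplacian) P for beta <= 0

print(round(p_two, 3), round(s_value, 1), round(p_one, 3))
```

This gives a two-sided P of about 0.07, an S-value of about 3.8 bits, and a one-sided P of about 0.035, matching the numbers above.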
Almost as absurd in typical applications are the proper priors that are billed as "reference priors" or "weakly informative priors," such as those based on rescaled t distributions with few degrees of freedom. For example, the upscaled t1 (Cauchy) prior for β proposed by Gelman et al. (Annals of Applied Statistics 2008, 2, 1360–1383) assigns 26% probability to OR > 25 or OR < 0.04. The Jeffreys prior for β fares little better, assigning 25% probability to the same event. Worse, yet often seen, are normal(0,10^n) priors with n ≥ 2, which assign over 75% probability to OR > 25 or OR < 0.04. These are huge probabilities for associations so large that no one would believe them in typical "soft-science" applications: Reported associations that large are almost always numerical artefacts, and if real would have been obvious from almost any replication attempt. So these priors and other "weakly informative" or "reference" priors are nowhere near what anyone believes in typical contexts; in other words, none of these priors is credible. Why then should we take seriously a posterior probability generated using one of these contextually incredible priors? At best, it might serve as a weak bound on credible posterior probabilities derived from informative priors, but the ordinary one-sided P-value already fills that role (see Casella and Berger 1987, "Reconciling Bayesian and frequentist evidence in the one-sided testing problem," Journal of the American Statistical Association, 82, 106–135).
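As a quick check of the normal(0,10^n) figures (the Cauchy and Jeffreys numbers depend on scaling details not restated here, so I leave those aside), the tail probability of the "large effect" event OR > 25 or OR < 1/25 = 0.04 can be computed directly, since it is just P(|β| > ln 25):

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# "Large effect" event: OR > 25 or OR < 1/25, i.e. |beta| > ln(25)
cut = math.log(25)

# Tail probability P(|beta| > ln 25) under a normal(0, 10**n) prior
tails = {}
for n in (2, 3, 4):
    sd = math.sqrt(10.0 ** n)
    tails[n] = 2 * (1 - norm_cdf(cut / sd))
    print(n, round(tails[n], 3))
```

Already at n = 2 the prior puts about 75% probability on these implausibly large effects, and the proportion climbs toward 1 as n grows.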
Frank Harrell suggested instead giving the posterior probability of a positive association from a "skeptical" prior, but did not specify what such a prior would look like. He wrote that "Skeptical in my mind is a prior that on the log scale [here, for β] has mean 0 and variance chosen so that the probability of a large effect (or prob. of an effect less than the reciprocal of that) is say 0.05." That leaves wide open at least one prior parameter (the variance), and leaves open even more if one allows non-normal priors. Thus, in opting for a contextually credible Bayesian definition of "skeptical" we've set ourselves a specification task requiring unfamiliar choices. Furthermore, any proposed default can generate many objections, as just discussed for vague "reference" priors. No wonder then that Bayesian methods haven't caught on.
Still, I'm going to outline an approach I've used to operationalize "large effect" in an epidemiologic context. To start, I want a prior that is not too skeptical or informative, as I would not want to obscure potentially important relationships or overweight prior information from "experts" (who in my experience are often grossly overconfident about effect sizes based on rather weak previous evidence). On the other hand, I usually want to shrink down the sort of inflated associations that get highlighted by selecting on "significance" (an inflation that gets worse as the cutoff is dropped, as when multiple-comparison adjustments are used). Focusing on the epidemiology of rare (and thus hard to study) diseases like cancers, I have noticed that effects get called "large" when the OR is outside the range of about ¼ to 4. In the rare-disease case I would thus compare the one-sided P-value for OR ≤ 1 to the posterior probability of OR ≤ 1 under some sensible if neutral prior, e.g., "large effect unlikely" represented by using a prior on β = ln(OR) symmetric about 0 that produces a 95% prior interval for the OR = exp(β) of (¼, 4). To gauge the impact of the prior, I would also compare the resulting 95% posterior interval to the 95% confidence interval.
That leaves the shape of the prior to be determined. A familiar form that yields the desired 95% prior interval of (¼, 4) for the OR is the normal(0, ½) distribution for β (lognormal for the OR). This choice also places about 2:1 prior odds on the OR interval (½, 2). Nonetheless, a choice I prefer over the normal for more elegant computation and connection to prior information in logistic regression is the conjugate prior for β, the log-F distribution (F for the OR). Equating the numerator and denominator degrees of freedom produces a log-F(m,m) prior for β, which like the normal is unimodal and symmetric around its mode; and the log-F(9,9) prior distribution produces a 95% prior interval for the OR of (¼, 4).
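A small sketch verifying the two claims for the normal(0, ½) choice, i.e., that its 95% prior interval for the OR is about (¼, 4) and that it puts roughly 2:1 prior odds on ½ < OR < 2:

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# normal(0, 1/2) prior on beta = ln(OR): sd = sqrt(0.5)
sd = math.sqrt(0.5)

# 95% prior interval for the OR
lo, hi = math.exp(-1.96 * sd), math.exp(1.96 * sd)

# prior probability (and odds) that 1/2 < OR < 2, i.e. |beta| < ln 2
p_mid = 2 * norm_cdf(math.log(2) / sd) - 1
odds = p_mid / (1 - p_mid)

print(round(lo, 2), round(hi, 2), round(odds, 1))  # prints: 0.25 4.0 2.1
```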
Since the log-F(m,m) is rather unfamiliar, as an aside I give some details: Although it has heavier tails than the normal, it approaches normality rapidly as m increases, and for m ≥ 9 produces results negligibly different from a normal with the same 95% central interval. The shared degrees-of-freedom parameter m is the number of independent Bernoulli trials with parameter π = expit(β) = exp(β)/(1+exp(β)) = OR/(1+OR) needed to encode the information about β in the prior. This relation enables the user and reader to grasp the evidential strength of the prior in terms of observing m tosses from a coin-tossing mechanism. For example, specifying a 95% prior interval for the OR of (¼, 4) is claiming to have prior information on OR = π/(1−π) equivalent to the information on the odds of heads vs. tails obtained from observing 4 or 5 heads in 9 independent tosses. Given the log-F convergence to normality, the same interpretation could be given to the normal(0, ½) prior. For more about encoding prior information and computation with log-F priors see Greenland 2007, "Prior data for non-normal priors," Statistics in Medicine, 26, 3578–3590.
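Closed-form F quantiles would need a stats library, but a simple Monte Carlo sketch suffices to confirm the log-F(9,9) interval: β = ln F with F ~ F(9,9), and an F(9,9) variate is the ratio of two independent chi-square(9) variables (the equal degrees of freedom cancel in the ratio):

```python
import math, random

# Monte Carlo check: the 95% prior interval for OR = exp(beta)
# under a log-F(9,9) prior on beta should be about (1/4, 4).
random.seed(1)

def chi2_9():
    # chi-square(9) as a sum of 9 squared standard normals
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(9))

n = 100_000
betas = sorted(math.log(chi2_9() / chi2_9()) for _ in range(n))

lo = math.exp(betas[int(0.025 * n)])  # 2.5th percentile of OR
hi = math.exp(betas[int(0.975 * n)])  # 97.5th percentile of OR
print(round(lo, 2), round(hi, 2))     # roughly (0.25, 4.0)
```

The simulated limits land near ¼ and 4, agreeing with the exact F(9,9) quantiles to Monte Carlo error.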
OK, so there's my working answer to Frank's suggestion for a skeptical prior, at least for a "rare" disease. Applying the normal(0, ½) prior for β to Dan's 95% confidence interval for the OR of (0.96, 3.02) yields an approximate posterior mean and 95% interval for the OR of 1.58 and (0.93, 2.68), with a posterior probability ("Bayesian P-value") for OR ≤ 1 of 0.046 (those numbers were computed using inverse-variance weighting as per Ch. 18 of Modern Epidemiology, 3rd ed., 2008 - and no, I did not cook them to keep the probability of OR ≤ 1 below 0.05).
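A sketch of the inverse-variance (precision-weighted) approximation just described, which reproduces those numbers to within rounding:

```python
import math

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Data summary: Dan's estimate, OR = 1.70 with 95% CI (0.96, 3.02)
b_hat = math.log(1.70)
se = (math.log(3.02) - math.log(0.96)) / (2 * 1.96)

# Prior: normal(0, 1/2) on beta = ln(OR)
prior_mean, prior_var = 0.0, 0.5

# Precision-weighted combination of prior and data
w_data, w_prior = 1 / se**2, 1 / prior_var
post_var = 1 / (w_data + w_prior)
post_mean = (w_data * b_hat + w_prior * prior_mean) * post_var

or_post = math.exp(post_mean)                       # ~1.57-1.58
ci = tuple(math.exp(post_mean + s * 1.96 * math.sqrt(post_var))
           for s in (-1, 1))                        # ~(0.93, 2.67)
p_or_le_1 = norm_cdf(-post_mean / math.sqrt(post_var))  # ~0.047

print(round(or_post, 2), tuple(round(x, 2) for x in ci), round(p_or_le_1, 3))
```

Small discrepancies in the last digit versus the figures above reflect rounding of the inputs, not a different method.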
However, Dan's example involved an outcome which may not be rare. For common (and thus more precisely estimated) outcomes like myocardial infarction, "large" tends to mean outside the range of about ½ to 2. So I expect a genuinely informed skeptical prior for the OR in his application would need to be much narrower than normal(0, ½), although how much narrower depends on contextual details which were not given. Finally, due in part to the counterintuitive behavior of odds ratios for common outcomes (including noncollapsibility), I would not have reported an OR as the final estimate; I might have instead reported standardized risk ratios or differences ("marginal effects") derived from a logistic model.