What are credible priors and what are skeptical priors?

Sander this is very clear, well set up, and convincing. All I can add is a bit more context.

In my view we need to be pre-specifying primary study analyses that are Bayesian, and to settle the choice of prior (whether skeptical or based on previous data or knowledge) before that choice could possibly be influenced by the results, or be manipulated by unscrupulous investigators. That being said, there is great value in Bayesian interpretation of already-completed results. Since in this setting it is not possible to find an already pre-specified prior, a skeptical prior is often fitting. I can think of two modes for choosing such priors:

  • Find the ultimate judges of the research and elicit skeptical priors from them, or
  • Do as Sander wrote and select a reasonable skeptical prior that is likely either to convince most skeptics or simply to make large effects unlikely. The latter assumption is very plausible in most areas of research. Even if the observed effect were equivalent to a “cure” (effect ratio of 0.0), the Bayesian posterior median effect would remain very impressive under such a skeptical prior.

The second option is more feasible. Like Sander, I take ‘skeptical’ to mean equal probability of harm and benefit. More specifically, I take the prior probability that the effect ratio is < r to equal the prior probability that it is > 1/r. I often simplify this to solving for the variance of a normal prior distribution for the log effect ratio, but Sander advocates a more flexible F-distribution-based approach, which I also like.
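The normal-prior simplification above can be sketched in a few lines. This is a hedged illustration, not the exact procedure from the post: the cutoff (odds ratio of 4) and tail probability (5%) are hypothetical choices one would substitute with context-appropriate values.

```python
# Sketch: solve for the SD of a mean-zero normal prior on the log effect
# ratio so that "large" effects are a priori unlikely. The cutoff (OR = 4)
# and tail probability (5%) here are illustrative, not from the post.
import numpy as np
from scipy.stats import norm

cutoff, tail = 4.0, 0.05                  # want P(odds ratio > 4) = 0.05
sd = np.log(cutoff) / norm.ppf(1 - tail)  # SD of the normal prior on log-OR

# Symmetry of the skeptical prior: P(OR < 1/r) == P(OR > r) for any r > 1
r = 2.0
p_below = norm.cdf(np.log(1 / r), loc=0, scale=sd)
p_above = 1 - norm.cdf(np.log(r), loc=0, scale=sd)
assert abs(p_below - p_above) < 1e-12
print(f"prior SD on log-OR: {sd:.3f}")
```

Any symmetric prior centered at zero on the log scale satisfies the equal-probability-of-harm-and-benefit condition; the normal form just makes the variance easy to solve for from one tail statement.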

What would you suggest for priors in a situation where an effect is much more likely in one direction? This is fairly common but I don’t recall ever seeing it addressed. Examples might be things like therapist interventions for back pain, where it’s pretty unlikely to make the pain worse, but could plausibly do very little. Another example might be effects of antibiotics on infection - they almost certainly won’t make it worse but might not do enough (or have other unintended effects) to make them worthwhile.


Two parts to my reply:

  1. First, your examples show how priors can be controversial and very dangerous. Some physical therapies can unexpectedly worsen injuries and hence long-term pain. For decades physicians prescribed low-fiber diets for bowel problems until evidence accumulated they were doing more harm than good. Antibiotics have often been given for what turned out to be infections by a virus, protist, or resistant bacterium, in which case the antibiotic can worsen illness by destroying competing, nonresistant bacteria and delaying initiation of effective treatment (not to mention producing direct adverse side effects, as with fluoroquinolones). The point is that use of asymmetric priors could delay recognition of costly effects in the unexpected direction.
  2. Still, there are special cases in which asymmetric priors seem reasonable and safe, e.g., where missing the unexpected direction entails no loss. I gave a detailed example in 2003 (“Generalized conjugate priors for Bayesian analysis of risk and survival regressions,” Biometrics 59, 92-99) using a generalized-conjugate prior for logistic regression, with graphs of the prior. That prior reduces to the generalized (location shifted and rescaled) log-F(m,n) in the univariate case covered in my 2007 SIM article. The log-F allows skewness by using unequal degrees of freedom, with ln(m/n) functioning as a natural skewness parameter. See also Appendix 1 of Greenland 2009, “Bayesian perspectives for epidemiologic research. III. Bias analysis via missing-data methods,” International Journal of Epidemiology 38, 1662-1673 (corrigendum in IJE 2010, 39, 1116).
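The skewness behavior of the log-F family mentioned in point 2 can be demonstrated numerically. This is a sketch using simple Monte Carlo with illustrative degrees of freedom, not the location-shifted and rescaled generalized version from the cited papers:

```python
# Sketch of the log-F(m, n) family: if X ~ F(m, n), then ln(X) is log-F.
# It is symmetric when m = n and skewed when m != n, with the skew
# direction tracking the sign of ln(m/n). df values are illustrative.
import numpy as np
from scipy.stats import f, skew

rng = np.random.default_rng(0)

def logf_sample(m, n, size=200_000):
    """Draw Monte Carlo samples from the log-F(m, n) distribution."""
    return np.log(f.rvs(m, n, size=size, random_state=rng))

s_sym   = skew(logf_sample(10, 10))    # ln(m/n) = 0  -> ~symmetric
s_left  = skew(logf_sample(4, 400))    # ln(m/n) < 0  -> left-skewed
s_right = skew(logf_sample(400, 4))    # ln(m/n) > 0  -> right-skewed
assert abs(s_sym) < 0.1 and s_left < -0.5 and s_right > 0.5
```

With equal degrees of freedom the log-F is symmetric about its location, which recovers the symmetric skeptical case; unequal degrees of freedom let the prior place more mass in one direction, as needed for the asymmetric settings discussed here.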

Would a meta-epidemiological approach be admissible in this case? Say, analyzing a large dataset of previous RCT results and using this to inform the appropriate prior probability of different effect sizes? This could include covariates for disease area, outcome type, etc.

I would say a meta-epidemiological approach makes some sense to me, but of course subject to all of the challenges that come with quantitative evidence synthesis (study quality, credible designs for causal effects, heterogeneity, etc.). I have tried to take this approach in a recent paper (under review) that used prediction intervals from a random-effects meta-analysis as a starting place for thinking about the prior, with of course some sensitivity analyses for other priors or shapes. I’d be curious to hear if folks strongly object to this kind of approach.

Nice post @Sander . Although it is somewhat tangential to the title of this thread, I want to highlight some recent work at


that I think will become important to specifying priors, at least in a language like Stan that is not concerned with conjugacy or anything like that.

The essence of this research is to use “quantile-parameterized distributions”, which are essentially distributions whose parameters are quantiles. So, if you can specify or elicit a prior median for a parameter and at least two or three other quantiles, then it is possible to construct a probability distribution that has those quantiles. In the case of an unbounded distribution like that for a regression coefficient, there are a couple of quantile-parameterized distributions, namely the simple Q-normal and the metalog(istic). In the case of a distribution that is bounded from above and / or below, there are some good choices based on the Johnson system.

Anyway, for a regression coefficient, one would often set the prior median to be zero and then would need to set a few more quantiles based on what you think is the minimum value for a large effect. I’ll be talking about this idea and Stan a bit more on Saturday at the R/Medicine conference if anyone is interested.
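The idea above can be sketched with a minimal three-term metalog, whose quantile function is linear in its coefficients, so three elicited quantiles pin it down exactly via a linear solve. The elicited values below are hypothetical, and in practice one must also check that the fitted quantile function is monotone (i.e., a feasible distribution):

```python
# Sketch of the quantile-parameterized idea via a 3-term metalog:
#   Q(y) = a1 + a2*logit(y) + a3*(y - 0.5)*logit(y)
# Three elicited (probability, quantile) pairs determine a1..a3 exactly.
# The elicited log-OR quantiles below are hypothetical.
import numpy as np

probs  = np.array([0.05, 0.50, 0.95])   # elicited probability levels
quants = np.array([-1.39, 0.0, 0.69])   # elicited log-OR quantiles

logit = np.log(probs / (1 - probs))
basis = np.column_stack([np.ones(3), logit, (probs - 0.5) * logit])
a = np.linalg.solve(basis, quants)      # metalog coefficients

def metalog_q(y):
    """Fitted quantile function Q(y) of the 3-term metalog."""
    L = np.log(y / (1 - y))
    return a[0] + a[1] * L + a[2] * (y - 0.5) * L

# The fitted distribution reproduces the elicited quantiles exactly
assert np.allclose(metalog_q(probs), quants)
```

Because the median was elicited as zero, the solve yields a1 = 0; the third term is what lets the two tails have different widths, which a plain normal prior cannot do.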


Good question, which at the moment I can only answer with another question: If you are going to make such a meta-analytic prior, why not just create a meta-analysis of the entire body of literature including the current study? That might take no more effort and could be a more informative contribution to the literature.

A related notion is the idea that, for reporting, a frequentist analysis should take precedence over a Bayesian analysis by supplying a summary easily integrated into research syntheses and meta-analyses (see for example Stephen Senn’s writings on Bayes). After all, if we were doing a synthesis would we really want to combine Bayesian study results contaminated by and correlated through the priors used by each research group? I wouldn’t want to, with the exception that some reference Bayes results are actually better frequentist results than the usual ones (I am thinking especially of the 2nd-order bias correction created by Jeffreys priors).

Triggered by Dan’s frustration, Sander provides valuable guidance on specifying a Bayesian analysis for the Odds Ratio (I get back to it below). However, my big point isn’t choice among Bayes/frequentist/fiducialist/whateverist, or prior specification; it’s that journals (or other institutions) are not willing to publish or post evidence that isn’t ‘definitive’ by some rather arbitrary definition. Failure to publish/post deprives the science and policy communities of valuable information in its own right and as input to a synthesis. Of course, a study needs to satisfy standards regarding design, conduct and analysis, but if it does so, results should be made available by posting the data or, at minimum, the likelihood based on a clearly communicated model. This proposal is a golden oldie, and for me is the dominant ‘ask’ generated by Dan’s frustration.

Now, back to Sander’s analysis. I support the idea of moving away from default priors, paying attention to the prior probability content of parameter regions, including those that (almost) all would agree are unlikely. Probability content is key; the low information suggested by a big variance can be deceptive. For a frequently cited example, the probability below 1.0 for a Gamma distribution with mean = 1.0 and variance = 10,000 is 0.999, hardly uninformative. Taking ‘prior’ seriously, we should engage as much as possible in a priori, protocol-based prior elicitation (watch for Tony O’Hagan’s forthcoming article in The American Statistician, “Expert Knowledge Elicitation: Subjective, but Scientific”). Elicitation can be effective for low-dimensional parameters, but we likely need to rely on generic advice for some components of high-dimensional parameters, and surely for their full joint distribution. For effective elicitation, transform the parameter space to create a low-dimensional subspace of interpretable parameters, likely in prediction space (Sander’s standardized risk ratio or risk difference are good examples), with the likely need to use defaults on the complementary subspace.
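The Gamma example above is easy to verify numerically; a big variance does not by itself make a prior uninformative:

```python
# Check the cited example: a Gamma prior with mean 1 and variance 10,000
# puts ~99.9% of its mass below 1.0, despite the enormous variance.
from scipy.stats import gamma

mean, var = 1.0, 10_000.0
shape = mean**2 / var        # 1e-4
scale = var / mean           # 1e4
p = gamma.cdf(1.0, shape, scale=scale)
print(round(p, 3))           # 0.999
assert p > 0.999
```

The tiny shape parameter forced by the mean/variance combination piles nearly all the mass next to zero, which is exactly the kind of hidden informativeness the probability-content check is meant to expose.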

Let’s keep the conversation going!


The idea is to specify a reasonable prior (for active treatment A vs placebo) when no relative-effect trial has been done before. If exchangeability can be assumed, can I use the predictive distribution derived from an appropriate set of historical trials (of active treatments vs placebo) to produce an informative prior for A vs placebo?

Perhaps I am not understanding your question as it raises for me yet another question: What is the basis for assuming exchangeability of effects with (other) active treatments? How is “the appropriate set of historical trials” and the treatments they examine determined? Aspirin for headaches? Penicillin for pneumonia? I don’t see enough detail for justifying a prior in the description so far… It’s hard to create credible priors because they need such details spelled out, along with a derivation of the prior from those details (where presumably the devil dwells). [For those who remember Anthony Quinn’s parting lines to Omar Sharif in Lawrence of Arabia, a paraphrase: “Being a credible Bayesian will be thornier than thou suppose!”].


Thanks for all your input @Sander - really appreciate it.

Maybe more detail would help! I am doing a PhD to develop methods to help research bodies prioritize research proposals. To estimate the value of a proposed RCT I am using a method from decision theory called value of information. This requires a Bayesian prior for the treatment effect of every trial proposal. This is an issue because in many cases there will have been no previous randomized trial with which to inform the uncertainty, and eliciting quantitative expert opinion could be problematic (for a number of reasons).

What I want to do is use off-the-shelf priors as a best guess of the range of plausible effect sizes from a set of historical studies. The historical set would be the entire set of results from all studies funded by the research funding agency (all good quality and pre-registered); this could also be adjusted for disease area, active/passive comparison, etc. Constructing the prior would indeed involve combining previous studies from a wide variety of areas, such as aspirin for headaches and penicillin for pneumonia. The idea is that if there is no knowledge about the effect of the current treatment, then the best guess would be the effects seen for previous treatments. (There is an issue with selection here if the studies in the historical set were funded because they were expected to work and this expectation was correct.)

There is currently very little quantitative consideration when deciding which studies to prioritize over others so my aim is to provide a quantitative starting point for discussion.

Any thoughts would be very much appreciated.


If you have historical studies, then using one of the Quantile Parameterized Distributions that I alluded to above

is just a question of using the empirical CDF of the historical estimates to pin down the quantile parameters.
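As a minimal sketch of that step, with hypothetical historical estimates: take empirical quantiles of the past study effects (on the log scale) and feed them to whatever quantile-parameterized prior is in use.

```python
# Sketch: pin quantile parameters from historical study estimates.
# The historical log odds ratios below are hypothetical.
import numpy as np

historical_log_or = np.array([-0.9, -0.5, -0.4, -0.2, -0.1, 0.0, 0.1, 0.3])
q05, q50, q95 = np.quantile(historical_log_or, [0.05, 0.5, 0.95])
print(q50)   # empirical median of the historical log odds ratios
```

These empirical quantiles would then serve as the (p, q) pairs that parameterize the prior, subject of course to the selection and exchangeability caveats raised elsewhere in this thread.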

OK DGlynn, thanks for the details.
I have to challenge some elements of your description and tentative proposal:

  1. “No knowledge about the effect of the current treatment”: That’s not credible. No one would come forward with an RCT proposal nor should it be seriously entertained by an IRB without enough theory and evidence that there should be a beneficial effect far outweighing potential harms, including biochemical, in vitro and animal studies and uncontrolled human studies, perhaps even up through phase I trials…
    The question is then how to use that information in forming a prior for the effect, which is hard.

  2. “Wide variety of areas” makes no sense to me. I would find nothing credible about drawing analogies about aspirin for headache or penicillin for pneumonia. Effects in other studies would be exchangeable with the proposed study effect only to the extent their treatments and outcome variables resembled those in the proposal. One can attempt to model the degree of exchangeability via a 2nd-stage regression (e.g., as done with foods regressed on nutrients in Witte et al. 2000. Multilevel modeling in epidemiology with GLIMMIX. Epidemiology, 11, 684-688) but this takes immersion in the biochemistry of the exposure and the physiology of the outcome.

  3. As you note, drawing from funded studies would induce an optimistic bias. Going to the broader literature there is also optimistic publication bias (failure to publish or even accessibly archive negative studies).

These 3 considerations are far from sufficient to form a prior, but I think they are necessary to form a credible prior in your example.
They enter into a narrative approach which systematically discusses the evidence that falls under 1-3: the direct background evidence as in 1, the indirect (partially exchangeable) evidence as in 2, and the evidence about biases in the direct and indirect evidence as in 3. That approach is I think an essential prerequisite to forming a contextually credible prior, because it helps us discern what such a prior should look like in vague outline.

In my experience (using considerations like 1-3 when doing Bayesian risk analyses), the predictive improvement over the narrative that one gets from a quantitative analysis is often not enough to justify the labor, reduced transparency, and unavoidable dependence on arbitrary assumptions of the quantitative analysis. Worse, from what I’ve seen, the priors can do more harm than good, for example by introducing biases from assumptions of convenience that are contextually absurd but unrecognized as such because of their mathematical formulation (common when no one has correctly connected the math to the contextual information). Classic cases include Bayesian analyses that assume prior independence of rates across categories, or that place spikes at the null when there is no evidence supporting the null or indicating it should receive special weight.

Sorry if all that sounds discouraging, but I do believe noncredible Bayesian inferences and decisions can do worse than traditional inferences and decisions (narratives and judgments bolstered by simple frequentist statistics - which of course can be awful if done badly). I think the quality of both cases hinges on the quality of the contextual narrative, including the validity of the connection between the context and the model parameters and priors.


This is very interesting. The quality of this forum is unparalleled.

I have been reading through Statistical Rethinking lately and exploring Bayesian methods.

Skeptical priors seem most defensible and reflect how the medical community already interprets frequentist results. Beyond that, I worry that the contextual narrative is of very questionable validity in much of medicine, and that anyone trying to defend a prior other than a skeptical one will see little support for their particular choice.


Excellent thoughts, again. In the spirit of Andrew Gelman’s description of “type S errors” (getting the wrong direction for the treatment effect) I do want to suggest that having priors that merely exclude impossible or highly improbable values (e.g., a previously undiscovered risk factor having an odds ratio > 5) is usually better than putting no restriction on the priors, as done by frequentist inference and by Bayesian methods that use uninformative priors. (This is consistent with your initial post).

That sounds very interesting - do you have a shareable copy that I could see?

Many thanks @Sander , not the response I was hoping for but I really appreciate the time and thought put into the answer.

I don’t know enough about what has already been done here, but it seems timely, given the growing recognition that p-values are problematic and the lack of popular alternatives, to reanalyze the results of a couple of hundred trials with a skeptical prior.

Dear Pavel,
I offer the following response from the perspective of an applied statistician whose main tasks have been analyzing data and writing up the results for research on focused problems in health and medical science (as opposed to, say, a data miner at Google):

Contextual narratives I see in areas I can judge are often of very questionable validity. So are most frequentist and Bayesian analyses I see in those areas. Bayesian methods are often touted as a savior, but only because they have been used so infrequently in the past that their defects are not yet as glaring as the defects of the others (except to me and those who have seen horrific Bayesian inferences emanating from leading statisticians). Bayesian methodologies do provide useful analytic tools and valuable perspectives on statistics, but that’s all they do - they don’t prevent or cure the worst problems.

All the methods rely rather naively on the sterling objectivity, good intent, and skills of the producer. Hence none of these approaches have serious safeguards against the main threats to validity such as incompetence and cognitive biases such as overconfidence, confirmation bias, wish bias, bandwagon bias, and oversimplification of complexities - often fueled by conflicts of interest (which are often unstated and unrecognized, as in studies of drug side effects done by those who have prescribed the drug routinely).

To some extent each approach can reveal deficiencies in the others, and that’s why I advocate doing them all in tasks with error consequences severe enough to warrant that much labor. I simply hold that it is unlikely one will produce a decent statistical analysis (whether frequentist, Bayesian, or whatever) without first having done a good narrative analysis for oneself - and that means having read the contextual literature for yourself, not just trusting the narrative or elicitations from experts. The latter are not only cognitively biased, but often based on taking at face value the conclusions of papers whose conclusions are not in fact supported by the data. So one needs to get to the point of being able to write a credible introduction and background for a contextual paper (not just a methodologic demonstration, as in a stat journal).

Statistics textbooks I know of don’t cover any of this seriously (I’d like to know of any that do) but instead focus all serious effort on the math. I’m as guilty of that as anyone, and understand it happens because it’s far easier to write and teach about neat math than messy context. What I think is most needed and neglected among general tools for a competent data analyst - a remedy that doesn’t require getting very context-specific - is an explicit, systematic approach to dealing with human biases at all stages of research (from planning to reviews and reporting), rather than relying on blind trust of “experts” and authors (the “choirboy” assumption). That’s an incredibly tough task, however, and one only partially addressed in research audits - those need to include analysis audits.
It’s far harder than anything in math stat - in fact I hold that applied stat is far harder than math stat, and the dominant status afforded the latter in statistics is completely unjustified (especially in light of some of the contextually awful analyses in the health and med literature on which leading math statisticians appear). That’s hardly a new thought: both Box and Cox expressed it back in the last century, albeit in a restrained British way, e.g., see Box, Comment, Statistical Science 1990, 5, 448-449.

As a consequence, I advocate that basic stat training should devote as much time to cognitive bias as to statistical formalisms; e.g., see my article from last year, “The need for cognitive science in methodology,” American Journal of Epidemiology, 186, 639–645, available as a free download at https://doi.org/10.1093/aje/kwx259.
That’s in addition to my previous advice to devote roughly equal time to frequentist and Bayesian perspectives on formal (computational) statistics.


Hi @Sander, I came across this post while reading up on magnitude-based decisions (MBD), which has been described as a statistical cult.

I went to the author of the method, and he is using this Q&A as a sort of justification for MBD.

After reading both, maybe I am misunderstanding, but I don’t see how the comments here justify MBD.
