Who should define reasonable priors and how?


Clinician and very amateur stats nerd here, trying to take in the BBR course one step at a time. I wanted to ask a question that was inspired by the discussions around the course but doesn’t relate directly to any chapter, so I decided to open a separate thread.

My question concerns the definition of reasonable/realistic priors for Bayesian analysis. What prompted me was the discussion on David Spiegelhalter’s blog between him and John Ioannidis (see here) regarding the Bayesian re-analysis of the ANDROMEDA study. Ioannidis argues that the skeptical priors used in the analysis were not skeptical enough. Thus the question arises: who should decide what counts as skeptical (or optimistic) enough, and more importantly, how?

The one who definitely cannot be expected to decide on this is the reader, who in most cases lacks both the statistical and subject-matter knowledge to answer it. However, I got the impression that probably neither Ioannidis nor Spiegelhalter is optimally suited for it either, unless they possess in-depth subject-matter knowledge about septic shock, which they might. I guess the onus is on the journal editor to make sure priors, like other statistical assumptions, are reasonable; however, I suspect this burden might be one big obstacle to journals accepting non-frequentist statistical methods. Still, I feel that the opinion of an editor and 2-3 reviewers might still be called into question by other experts.

I am wondering about the possibility of constructing a Delphi-like process that could help subject-matter experts reach consensus about reasonable priors. Professional societies would, in my view, be optimally suited to deliver such consensus-based recommendations for prior distributions to be used in their field, as they do with other similar recommendations based on expert opinion. However, I believe these societies would need a sort of handbook or guideline on how to conduct such a process. Maybe that is something the statistician community could contribute? What do you think? Maybe this already exists?

Best regards,
Áron Kerényi, MD PhD


As a physician with very limited understanding of these methods, I’m keen to hear responses to your question. My impression is that this is a key source of controversy when people discuss the potential for more widespread application of Bayesian approaches.

Might it ever become standard practice for studies using Bayesian analysis to present results under a range of priors (from skeptical through optimistic)? Authors would make their case a priori for why they believe a certain prior is most defensible, and readers could decide whether they consider the argument convincing (ironically, though, this step would likely require consideration of prior bodies of evidence that used frequentist interpretations…). Taking into account the possibility that some will disagree with their chosen prior, authors would then present study results under a spectrum of priors, allowing readers to see a corresponding spectrum of posterior probabilities. Might this pre-empt some of the inevitable criticism from other “experts” who disagree with the authors’ choice of prior?

Another interesting angle to this is the impact that a switch to Bayesian methods might have on clinical practice guidelines. Would there be a “committee” to decide which prior for a given study is most defensible, in order to advise clinicians whether results of the study should be practice-changing or not?

For my clinical trials I first work with my statisticians and all my clinical collaborators to come up with priors and utilities (for utility-based designs). Then, the protocol is orally presented to other experts from my department (genitourinary medical oncology), at which time we review all aspects. Then, it goes through scientific review at my institution where it is again reviewed by other oncologists, statisticians etc. We may also discuss these with industry, FDA, and other stakeholders.

There is a good article here on some challenges and considerations. David Spiegelhalter wrote a classic article on Bayesian trial designs with an excellent discussion on priors. I am also a big fan of this list.

See also here for details and resources on Bayesian clinical trials at our institution.


Hi Erin - I like the way you are thinking about this. Ioannidis is somewhat off base here because he is tacitly assuming that treatments don’t evolve over time. He wants to use a “for all time in the history of drug development in sepsis” prior. The prior must be tailored to what is currently known. Priors should be settled before the results are available, so re-analysis is always a little dangerous. But a formal process as you’ve described, even if done after the fact, can still be helpful. Or one can leave it all up to the reader by plotting the posterior probability of efficacy (y) against the degree of skepticism of the prior (x, best described as the prior P(efficacy > c) for some large c).
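To make that plot concrete, here is a minimal sketch (not from any actual trial; the log odds ratio estimate, its standard error, and the grid of prior SDs are all hypothetical) using a normal-normal conjugate update, where smaller prior SD means a more skeptical mean-zero prior:

```python
# Sketch: posterior probability of efficacy as a function of prior skepticism.
# Assumes an approximately normal likelihood for a log odds ratio (negative
# values favor treatment); all numbers below are hypothetical.
import math

theta_hat = -0.35   # observed log odds ratio (hypothetical)
se = 0.15           # its standard error (hypothetical)

def posterior_prob_efficacy(prior_sd):
    """Normal-normal conjugate update with a mean-zero prior;
    returns P(theta < 0 | data)."""
    prior_var, like_var = prior_sd**2, se**2
    post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
    post_mean = post_var * (theta_hat / like_var)  # prior mean is 0
    z = (0.0 - post_mean) / math.sqrt(post_var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# One point per degree of skepticism (smaller SD = more skeptical prior)
for prior_sd in (0.05, 0.1, 0.2, 0.5, 1.0):
    print(f"prior SD {prior_sd:4.2f}: P(efficacy) = "
          f"{posterior_prob_efficacy(prior_sd):.3f}")
```

As the prior SD shrinks (more skepticism) the posterior probability of efficacy is pulled toward 0.5, which is exactly the spectrum such a plot would let the reader inspect.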

The best paper I’ve ever seen on prior elicitation is https://onlinelibrary.wiley.com/doi/abs/10.1002/pst.1854

One thing to feel good about: those who disdain Bayesian statistics are in effect using poorly documented procedures for turning data into conclusions, and are basing their decisions on the wrong metric - P(data|hypothesis) instead of P(effect|data).


The “reasonable prior” is closely related to the cost of a decision. I think when L.J. Savage wrote on Bayesian decision theory and the foundations of statistics, he argued that the loss function and the prior distribution cannot be separated in a rigorous way. If we want to come up with “reasonable” priors, we should think about the potential losses first.

That is the message I take from this summary of a talk he gave on the topic:

Savage, L.J (1958) Recent Tendencies in the Foundations of Statistics (link).

1 Like

Thank you all for the great replies! In the meantime I read through a related thread on credible and sensible priors (here), which was a much more technical discussion, and I felt that the option to use expert input often gets discounted pretty much straight away due to cognitive biases. The 2019 O’Hagan paper suggested by @Pavlos_Msaouel is also referenced there, and I found it a fascinating and enlightening read that answered some of my questions directly (e.g., there is already a Delphi-type protocol for eliciting expert consensus about prior distributions, although the authors’ case study uses a different protocol). An important point made in the paper is that these predictions can almost never be tested against some “truth”, and hence no gold standard really exists. But there are standardized and scientific ways to conduct such a process.

@Pavlos_Msaouel Your process seems sound, and I guess that the priors you end up with wouldn’t differ all that much from the result of a formal elicitation as described by e.g. O’Hagan. My point is that it still lacks external validity, in that other experts may call the priors into question, especially upon seeing your results (as in the Ioannidis example). Imagine if the Sepsis-3 task force, alongside the new definitions, had published a consensus-based prior distribution for 28-day mortality for a novel intervention in septic shock after conducting such a formal elicitation process. Not only would Ioannidis’ comment be unfounded, but it could also have provided support for the ANDROMEDA group to conduct a Bayesian analysis in the first place (and potentially “strong-armed” JAMA, as they would likely have had to accept a statistical analysis that was instigated by an earlier publication in their journal).

@ESMD The few published Bayesian studies I have seen have all used a range of priors, from skeptical to optimistic, which is of course desirable but not sufficient, as evidenced by Ioannidis’ comment (skeptical is not skeptical enough), and readers will not be able to make a judgement call about that.
Your point on CPGs is very interesting. I initially only considered the impact such expert consensus distributions could have on the validity of later clinical trials, but they would indeed provide a good starting point for the next revision of the same guidelines: a basis for evaluating the studies published in the interim. An added benefit of choosing one (or at most a handful of) parameters (such as survival analysis until day 28) would be to make the studies more uniform and meta-analysis easier.

@f2harrell Thank you for your comment and the reference; it was fascinating to read about the same protocol described by O’Hagan being used in a large pharma company. Regarding your last comment, I might be wrong, but I have the impression that people reading this forum already feel good about Bayesian stats, and even the wider audience is growing more and more wary of the frequentist approach. I think the challenge today is a tactical one: how to make Bayesian methods more accessible and easier to use for a wide audience. In addition to providing the masses with the much-needed education in Bayesian methods, as you do with BBR, I think having consensus-based “anchor” priors available could also contribute towards that end.

@R_cubed You bring up an excellent point, if I understand you correctly: in addition to the probability of a statement being true, it is also important to consider the consequences of it being true. A common clinical scenario comes to mind, where the physician needs to consider not only the most likely but also the most deadly potential diagnosis. Is this a good analogy to your point? Because in that case, I believe, it lends support to my argument that priors (and utility functions) should be defined with subject-matter experts at least included in the process.


I am not very convinced by Ioannidis’s counterpoint in that thread. It felt more like arguing for the sake of arguing. You can argue about the validity of anything in your model, whether frequentist or Bayesian, after your trial is completed. Example: proportional hazards in immunotherapy trials in oncology. A Bayesian approach can be advantageous in that you can prespecify parameters for aspects of your model that you are not sure you will need. In the proportional hazards example, one can place a prior on the time × treatment interaction effect that encodes how far the data may depart from proportional hazards.

I would gauge validity based on whether your inferences work or not in real life. For example, this trial (priors + model + data generated) gave us useful dose-finding information that has been externally validated in the sense that the doses we came up with in that trial are being efficiently used in subsequent phase 2 trials. There is always room for improvement of course and an updated design, using the same data structure and motivating example, is currently under peer review in a statistical journal.

1 Like

I confess to not having read that article of Savage’s but his advice flies in the face of everything I know about decision theory.

I pulled Savage’s Foundations of Statistics off my shelf. I can’t find a direct quote, but both the chapter on elicitation of personal probabilities and the one on personal utilities involve gambling games, so distinguishing one from the other is a challenge.

He mentions in passing (p. 95):

It seems mystical to talk about moral worth [i.e., utility] apart from probability…

A direct reference to the problem of separating personal probabilities from utilities can be found in:

Joseph B. Kadane & Robert L. Winkler (1988) Separating Probability Elicitation from Utilities, Journal of the American Statistical Association, 83:402, 357-363, DOI: 10.1080/01621459.1988.10478605

A follow-up to that paper reports extending the results more generally. The last section of the paper states:

In so far as the economic implications are concerned, the failure of probability elicitation procedures implies a corresponding impossibility of reconstructing utility functions from asset demands by the method of revealed preference

I must have confused Savage with (the late) Herman Rubin (Purdue University), who wrote a technical report on the (non)-separability of the prior from the utility function. I’ve seen references to a similar version being formally published in 1987.

Rubin, Herman (1983) A weak system of axioms for ‘rational’ behavior and the non-separability of utility from prior. (pdf)

This discussion with Herman Rubin describes his attitude towards the relationship between loss and prior in the context of prior Bayesian Robustness:

Bock, Mary Ellen. Conversations with Herman Rubin. A Festschrift for Herman Rubin, 408–417, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2004. doi:10.1214/lnms/1196285408. (link)

Q. One of your strong, ongoing interests is prior Bayesian robustness. How do you describe it?
Rubin: One of the difficulties of Bayesian analysis is coming up with a good prior and loss function. (I have been saying for years that the prior and loss cannot be separated. The Carnegie Mellon school is doing some work on this now). When I talk about prior Bayesian robustness, I assume that one does not yet see the random observation X whose distribution depends upon the unknown state of nature. One considers a choice of priors for which one averages over the possible states of nature and over the possible observations. This is different from posterior Bayesian robustness in which one considers the choice of priors given the random observation X … When I am faced with a choice of priors, all of which seem about the same to me, then I am very concerned about the possible alternative consequences of applying either one if it is drastically wrong.

In the context of the question by the OP, I interpret this question of skeptical priors as inextricably related to the question: “How optimistic can I afford to be in assuming that the model is (approximately) correct?”

If I was pressed to defend a point of view, I’d say that there exist circumstances when subjective utility and probability can be usefully separated, but there also exist important cases where probability and utility are inseparable.

As these cases did not seem to bother Herman Rubin, I don’t think they should bother anyone concerned with applications. But it is good to be aware of them.

Addendum: additional citations to some work by Herman Rubin, as well as a paper that extends the results of Kadane and Winkler.

1 Like

Don’t see anything there to convince me to think of losses when specifying priors.

It is hard for me to imagine doing any experiment without considering the value of the information. Methodology scholars have long complained that experiments have been “underpowered”, and that much clinical research is not useful [1].

I.J. Good considered information as something he called a “quasi-utility” [2, p. 40-41]. This was also seen by Lindley [3], Bernardo [4], DeGroot [5], and others throughout the decades after Shannon wrote his classic on information theory.

In terms of what you had written here [6], I was thinking that reflection on the cost of information might help in setting “skeptical” priors. We want a “severe test” of the kind Deborah Mayo would advocate, but we would also want a skeptic who would shift his beliefs within the information budget we have available, and whose prior is consistent with background information.

It may very well be that our (informational) budget does not permit us to persuade the skeptic, in light of what is currently believed.

Objective Bayesians like E.T. Jaynes suggested Maximum Entropy priors. That reduces to a constrained optimization problem, according to [7].

I was thinking it might also be possible to work Bayes’ Theorem in reverse and derive a prior by using Good’s method of Imaginary Results with only:

  1. Skeptic and Advocate’s point estimates,
  2. the Schwarz/Bayesian Information criterion discussed in [8], which they give as:
    S = \log pr(D \mid \hat{\theta}_1, H_1) - \log pr(D \mid \hat{\theta}_2, H_2) - \frac{1}{2}(d_1 - d_2)\log(n)
  3. A maximum sample size,
  4. The amount of shift after seeing data that would be considered important. I’d take some fraction > 0.5 of the distance between the point estimate of the hypothetical skeptic and advocate.
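As a rough numerical illustration of item 2 (not from the thread; the data and the choice of models are hypothetical), here is a minimal sketch of the Schwarz criterion comparing a normal model with a free mean (H1, d1 = 1) against one with the mean fixed at zero (H2, d2 = 0), with known variance:

```python
# Schwarz criterion S = loglik(H1) - loglik(H2) - (1/2)(d1 - d2) log(n),
# approximating the log Bayes factor for H1 over H2.
# Data and models are hypothetical illustrations.
import math

def loglik_normal(data, mu, sigma=1.0):
    """Log-likelihood of i.i.d. normal data with known sigma."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((x - mu)**2 for x in data) / (2 * sigma**2))

data = [0.3, -0.1, 0.8, 0.4, 0.2, 0.6, -0.2, 0.5]  # hypothetical observations
n = len(data)
mu_hat = sum(data) / n  # MLE of the mean under H1

# H1: mu estimated (d1 = 1); H2: mu fixed at 0 (d2 = 0)
S = (loglik_normal(data, mu_hat) - loglik_normal(data, 0.0)
     - 0.5 * (1 - 0) * math.log(n))
print(f"Schwarz criterion S = {S:.3f}  (S > 0 favors H1)")
```

With such a small hypothetical sample, the dimension penalty ½ log(n) dominates the likelihood gain, so S comes out negative here despite the positive sample mean.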

The regions where one or the other party would want to stop the experiment (having been so surprised by the data that they no longer wish to spend more of their budget to obtain observations) would get less weight as a percentage of the total sample size.
The parts where the models intersect would get more weight (more observations). Smooth out the histogram, center it at 0, and there is the skeptic’s prior for the experiment.

(I have a bit more thinking to do about this, but it strikes me as plausible.)

This seems to be related to a Bayesian power analysis [9]. It is obvious to me that areas where power is either extremely low or extremely high should get less weight, while sections where neither the Skeptic nor the Advocate would be terribly surprised would get more weight, implying that more observations would be needed there to convince either party that the other was correct.

Economists and other scholars dispute the axioms that define the subjective expected utility model all the time. I think it is clear that everyday clinical research is far from how “rational” actors would behave.

But ultimately it doesn’t matter if we think of changes in utility as information (in the mathematical sense). Bayesian decision theory methods remain valid in a world with state-dependent utilities and “irrational” actors, so long as there is some penalty for excess optimism (or skepticism) that will induce agents to change, given finite sample sizes.

  1. Ioannidis, John (2016) Why Most Clinical Research Is Not Useful

  2. Good, I.J. (1983) Good Thinking: The Foundations of Probability and Its Applications

  3. Lindley, D.V. (1956) On a Measure of the Information Provided by an Experiment

  4. Bernardo, Jose (1979) Expected Information as Expected Utility

  5. DeGroot, Morris (1984) Changes in Utility as Information

  6. Harrell, F. (2019) Bayesian Power: No Unobservables

  7. Sivia, DS; Skilling, J. (2006) Data Analysis: A Bayesian Tutorial

  8. Robert E. Kass & Adrian E. Raftery (1995) Bayes Factors, Journal of the American Statistical Association, 90:430, 773-795, (link)

  9. Ubersax, John (2007) Bayesian Unconditional Power


Extremely informative post. But I’m still not prompted to work backwards just to convince a skeptic; I elect to set priors using prior information, or a skeptical prior when no prior information exists.

1 Like

I thought I’d bump this thread after finding a valuable discussion between Andrew Gelman and Sander Greenland on this precise issue of setting priors.

Specifying a prior distribution for a clinical trial: What would Sander Greenland do?

Specifying a prior distribution for a clinical trial

My posts in this thread were some speculations about how two honest Bayesian scientists might negotiate in order to design an experiment that would settle a disputed question of fact.

In the real world, information is not free. So it would be important for the parties to a dispute to know how much it would cost to run this (hopefully) definitive experiment (i.e., the expected number of observations), and whether that new information is worth paying for. In the decision-theoretic and economic perspectives, this is known as the Value of Information.

This is the role I see for retrospective meta-analyses: constraining the family of possible informative priors in order to decide which future experiments (broadly defined, also including observational studies) to perform, to settle disagreements on fact and to aid in selecting a policy.

In Greenland’s terminology, he would call them “covering priors”, a concept I think has substantial merit.

Relevant Discussions

A Ph.D. dissertation by a recent Vanderbilt graduate who was supervised by both Dr. Harrell and Dr. Blume. It is likely that rational scientists in fields with small samples (especially my field of rehabilitation and physical disabilities) would implement some sort of adaptive randomization to balance the need for efficiency with the need for robustness (i.e., credibility).

I don’t remember whether the dissertation goes into the actual details, but AFAIK it shouldn’t be hard to prove that this type of design is admissible for both parties.

Chipman, Jonathan Joseph (2019) Sequential Rematched Randomization and Adaptive Monitoring with the Second-Generation p-Value to Increase the Efficiency and Efficacy of Randomized Clinical Trials. Vanderbilt University, ProQuest Dissertations Publishing, 2019. 13900836.


Excellent thoughts.

More food for thought: what if we based sample size calculations on finding the smallest N such that the posterior probability of efficacy did not differ by more than 0.03 between a flat prior and a skeptical prior?
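One way to sketch such a calculation (a hypothetical normal-normal conjugate setup, not an actual trial design; the true effect, per-subject SD, skeptical prior SD, and the 0.03 threshold are all assumptions) is to fix a postulated true effect, assume the observed mean equals it, and search for the smallest N at which the flat-prior and skeptical-prior posterior probabilities of efficacy agree to within 0.03:

```python
# Smallest N at which flat- and skeptical-prior posteriors nearly agree.
# Normal-normal conjugate model; all numbers are hypothetical.
import math

def post_prob_efficacy(n, theta_true, sigma, prior_sd):
    """P(theta < 0 | data) assuming the observed mean effect equals theta_true,
    per-subject SD sigma, and a mean-zero normal prior (None = flat prior)."""
    se2 = sigma**2 / n
    if prior_sd is None:                    # flat prior: posterior = likelihood
        post_mean, post_var = theta_true, se2
    else:
        post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se2)
        post_mean = post_var * (theta_true / se2)
    z = -post_mean / math.sqrt(post_var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

theta_true, sigma, skeptical_sd = -0.3, 1.0, 0.15   # hypothetical values

n = 1
while abs(post_prob_efficacy(n, theta_true, sigma, None)
          - post_prob_efficacy(n, theta_true, sigma, skeptical_sd)) > 0.03:
    n += 1
print(f"smallest N with posterior probabilities within 0.03: {n}")
```

With these numbers the two posteriors converge fairly quickly; a more skeptical prior (smaller SD) or a smaller assumed effect pushes the required N up.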


I am not a big fan of skeptical prior distributions, as discussed here: http://eprints.whiterose.ac.uk/1564/1/o%27hagan.a1.pdf

1 Like

That good paper was already in my Bayesian priors file. Clearly, when “real priors” are available they should be used. But I think it is also valuable to use skeptical priors when they are not, because of the psychology of needing to convince skeptics.