Eliciting informative priors for coefficients in logistic regression while accounting for baseline risk


Suppose one has data \boldsymbol Y = \{ y_1, y_2, \ldots, y_N \}, y_i \in \{0, 1\}, and some covariates \boldsymbol X in the form of an N \times P matrix. Suppose further that the model to explain \boldsymbol Y using \boldsymbol X is a logistic regression. For instance, we could be interested in explaining the probability of death from a certain disease given some explanatory variable(s).
The questions here pertain to the elicitation of (marginal) prior distributions for the coefficients, \boldsymbol \beta, by manipulating odds ratios (\operatorname{OR}), risk ratios (\operatorname{RR}) and the baseline prevalence, p_0.

The main idea here is that one has a data set of moderate size that one wishes to analyse while incorporating extensive knowledge about the baseline prevalence/risk and an informative guess about the maximum risk ratio of a given covariate.


For simplicity, let’s concentrate on elicitation for a single coefficient.
Assuming we know p_0, we can use the identity \beta = \log(\operatorname{OR}) to write the risk ratio as \operatorname{RR} = \frac{\exp(\beta)}{(1-p_0) + p_0 \exp(\beta)}. Suppose we have a maximum postulated risk ratio \operatorname{RR}_m, where \operatorname{RR}_m < 1/p_0. If we place a prior \pi_B(\beta) on the coefficient, we can truncate it at \beta_m = \log\left( \frac{p_0\operatorname{RR}_m - \operatorname{RR}_m}{p_0\operatorname{RR}_m - 1}\right), creating a new prior \pi_B^\ast(\beta) = \pi_B(\beta)/F_B(\beta_m) for \beta \le \beta_m (and zero otherwise), where F_B(x) = \int_{-\infty}^x \pi_B(t)\, dt is the CDF of \pi_B.
This prior distribution incorporates knowledge about p_0 and translates the informative guess about risk ratios to the coefficients. Notice that other probabilistic constraints can also be incorporated this way, for instance (assuming p_0 < 1/2) \operatorname{Pr}(\operatorname{RR} < 2) = 1/2 \to \operatorname{Pr}\left(\beta < \log\left( \frac{2p_0 - 2}{2p_0 - 1}\right)\right) = 1/2.
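
To make the mapping concrete, here is a minimal Python sketch of the conversion between \beta and \operatorname{RR} and of the truncation at \beta_m. The normal base prior, its mean and standard deviation, and the values of p_0 and \operatorname{RR}_m are assumptions chosen only for illustration; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def rr_from_beta(beta, p0):
    """Risk ratio implied by a log odds ratio beta at baseline risk p0."""
    odds_ratio = math.exp(beta)
    return odds_ratio / ((1.0 - p0) + p0 * odds_ratio)


def beta_from_rr(rr, p0):
    """Log odds ratio implied by a risk ratio rr at baseline risk p0.

    Requires 0 < rr < 1/p0, otherwise the implied risk would exceed 1.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if not 0.0 < rr < 1.0 / p0:
        raise ValueError("rr must satisfy 0 < rr < 1/p0")
    # Same quantity as log((p0*rr - rr) / (p0*rr - 1)), with signs flipped.
    return math.log(rr * (1.0 - p0) / (1.0 - p0 * rr))


def truncated_normal_pdf(beta, mu, sigma, beta_m):
    """Density of a Normal(mu, sigma) prior truncated above at beta_m."""
    if beta > beta_m:
        return 0.0
    base = NormalDist(mu, sigma)
    # Renormalise by the mass the base prior puts below beta_m.
    return base.pdf(beta) / base.cdf(beta_m)


# Illustrative (assumed) inputs: 10% baseline risk, maximum risk ratio of 3.
p0 = 0.10
rr_max = 3.0
beta_m = beta_from_rr(rr_max, p0)

# Quantile constraint Pr(RR < 2) = 1/2 becomes a median constraint on beta.
beta_median = beta_from_rr(2.0, p0)

print("truncation point beta_m:", beta_m)
print("implied prior median for beta:", beta_median)
print("truncated density at 0:", truncated_normal_pdf(0.0, 0.0, 1.0, beta_m))
```

For a symmetric base prior without truncation, the median constraint simply fixes the prior's centre at `beta_median`; once the prior is also truncated at `beta_m`, the median shifts and the hyperparameters would need to be solved for numerically.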


  • Has anyone seen this method of constructing informative priors for the coefficients in logistic regression?
  • Is this a good idea?

I understand a major criticism of this approach is that we seldom know the baseline risk. But I posit there are, indeed, situations where we have a pretty good idea and can use that information to our advantage when creating prior distributions that incorporate important physical constraints.

I’m not seeing the value of bringing RR into the equation and would rather stick with ORs. To the general question, I think it’s very difficult to bring absolute risk information into a model that has lots of predictors, especially as you expand them into splines to allow for nonlinearities. I’d tend to use a very wide prior on the intercept and to constrain coefficients through their relative effects, to be able to scale to larger problems. Dealing with nonlinearity would be hard though. You might put priors on the nonlinear terms that favor zero pretty heavily, and a prior on the linear coefficient such that the inter-quartile-range OR has a low chance of being larger than 5 or less than 1/5.
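
As a rough sketch of that last suggestion, the snippet below picks the standard deviation of a zero-mean normal prior on the linear coefficient so that the inter-quartile-range OR falls outside (1/5, 5) with a small probability. The normal form of the prior, the 5% tail probability, and the covariate IQR of 12 are all assumptions made for illustration; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def prior_sd_for_iqr_or(iqr, or_bound=5.0, tail_prob=0.05):
    """SD of a zero-mean normal prior on beta such that
    Pr(exp(beta * iqr) > or_bound or exp(beta * iqr) < 1/or_bound) = tail_prob.
    """
    if iqr <= 0.0:
        raise ValueError("iqr must be positive")
    # Two-sided tail: split tail_prob equally between the two directions.
    z = NormalDist().inv_cdf(1.0 - tail_prob / 2.0)
    return math.log(or_bound) / (iqr * z)


# Assumed example: a covariate whose inter-quartile range is 12 units.
iqr = 12.0
sd = prior_sd_for_iqr_or(iqr)

# Check: probability that the IQR odds ratio exceeds 5 or falls below 1/5.
limit = math.log(5.0) / iqr
tail = 2.0 * (1.0 - NormalDist(0.0, sd).cdf(limit))
print("prior SD for beta:", sd)
print("probability outside (1/5, 5):", tail)
```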

Thanks for taking the time @f2harrell, I appreciate it. But I’ll have to disagree. I know you don’t much like RRs, as per this post, for instance, but I don’t think ORs are easy to interpret or any easier to interpret than RRs. The usual difficulty with RRs, knowing the baseline risk, is taken as given for this problem.

This is a good point. But in the simple case of a logistic regression, is it not the case that the non-linearity of the model is taken into account by the transformations I discuss in my original post?

I think the ‘easier to interpret’ idea is a false hope. Ultimately RRs have much more of an interpretation problem since you have to have a series of RRs when baseline risk varies greatly, whereas a single OR will do. ORs have a first-order interpretation problem (for non-statisticians). RRs have a zero-order interpretation problem for everyone due to (1) dependence on baseline risk and (2) dependence on the choice of the event reference category.
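
A small numerical sketch of both points, using the same OR-to-RR identity as the original post. The OR of 3 and the baseline risks are arbitrary illustrative values, and the function names are hypothetical.

```python
def rr_from_or(odds_ratio, p0):
    """Risk ratio for the event implied by an odds ratio at baseline risk p0."""
    return odds_ratio / ((1.0 - p0) + p0 * odds_ratio)


def rr_nonevent_from_or(odds_ratio, p0):
    """Risk ratio for the complementary outcome (non-event)."""
    p1 = p0 * rr_from_or(odds_ratio, p0)  # risk in the exposed group
    return (1.0 - p1) / (1.0 - p0)


odds_ratio = 3.0  # one fixed odds ratio (assumed value)

# (1) The same OR maps to a different RR at each baseline risk.
for p0 in (0.01, 0.10, 0.30, 0.60):
    print(f"p0={p0:.2f}  RR={rr_from_or(odds_ratio, p0):.3f}")

# (2) Switching the reference category inverts the OR exactly,
#     but the non-event RR is not the reciprocal of the event RR.
p0 = 0.30
rr_event = rr_from_or(odds_ratio, p0)
rr_nonevent = rr_nonevent_from_or(odds_ratio, p0)
print("1 / RR(event):", 1.0 / rr_event)
print("RR(non-event):", rr_nonevent)
```

With an OR of 3, the implied RR drops from roughly 2.94 at p_0 = 0.01 to roughly 1.36 at p_0 = 0.6, which is the "series of RRs" issue in numbers.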

I don’t think you are accounting for nonlinearity on the log odds scale, which is why we use spline functions so often for covariates.