Methods to calculate the posterior distribution of the absolute risk reduction

Hello. I have seen numerous examples of code used to calculate the posterior distribution of the absolute risk reduction (ARR). The simplest, using brms, seems to be setting family=bernoulli(link=identity) with a nominal outcome and a single nominal predictor (treatment), i.e., formula = outcome ~ 1 + predictor. Do the parameter estimate and est. error then give you the posterior distribution of the ARR?

Thank you.

Covariate-specific absolute risk reduction (ARR) is obtained by subtracting the risks evaluated at two sets of covariate values (e.g., vary treatment but hold age constant). The model needs to be realistic, so a linear model is not appropriate here, except perhaps when there are no covariates at all. Use a binary logistic model, and retrieve the posterior draws of all its parameters. For each posterior draw, compute the two risks of interest and subtract them. This yields a posterior sample of ARRs.
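The per-draw recipe above can be sketched as follows. This is only an illustration in plain Python: the draws are simulated stand-ins for what a fitted logistic model (for example, one fitted with brms) would provide, and the coefficient names, their values, and the age of 60 are all assumptions made for the example.

```python
import math
import random
import statistics


def inv_logit(x):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def arr_draws(draws, age):
    """Return one ARR per posterior draw: risk(control) - risk(treated) at a fixed age."""
    out = []
    for b0, b_trt, b_age in draws:
        risk_control = inv_logit(b0 + b_age * age)
        risk_treated = inv_logit(b0 + b_trt + b_age * age)
        out.append(risk_control - risk_treated)
    return out


# ASSUMPTION: simulated values standing in for real posterior draws of
# (intercept, treatment effect, age effect) from a binary logistic model.
rng = random.Random(1)
draws = [
    (rng.gauss(-1.0, 0.2), rng.gauss(-0.5, 0.15), rng.gauss(0.03, 0.005))
    for _ in range(4000)
]

# Posterior sample of the ARR for a 60-year-old (hypothetical covariate value).
arr = arr_draws(draws, age=60)

# Summarise with the posterior median and a 95% equal-tailed interval.
cuts = statistics.quantiles(arr, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
median = statistics.median(arr)
lower, upper = cuts[0], cuts[-1]
print(f"ARR at age 60: median {median:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

Changing `age` gives the ARR posterior for a different covariate setting from the same set of draws.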


emmeans + brms + tidybayes is the perfect combo for this scenario; see "Logistic Regression: brms + emmeans + tidybayes" on GitHub.


Thank you for the example code. I ran it using my data and came up with essentially the same median and credible interval for the ARR as when I used brms and specified family=bernoulli(link=identity). Thank you for your reply!

Beware all ye who stray from canonical link functions (in RCTs anyway): see arXiv:2107.07278, "Covariate adjustment in randomised trials: canonical link functions protect against model mis-specification". I assume emmeans with brms is doing what @f2harrell suggests; that approach is definitely the best way (and it helps you think of all the other great posterior transforms and summaries you can do).

I’m guessing that you are referring to the average ARR. Since ARR needs to be covariate-specific, I think you’ll find the identity link to be very problematic.
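One way to see the concern is a small numeric sketch. All coefficients below are hypothetical values chosen for illustration, not estimates from any data: under a logistic model with a constant treatment effect on the logit scale the ARR changes with age, whereas an identity-link model forces the same ARR at every age and can produce fitted risks outside [0, 1].

```python
import math


def inv_logit(x):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# ASSUMPTION: hypothetical logistic-model coefficients (logit scale).
b0, b_trt, b_age = -4.0, -0.7, 0.06

# ASSUMPTION: hypothetical identity-link coefficients (risk scale).
a0, a_trt, a_age = -0.20, -0.12, 0.01


def arr_logit(age):
    """ARR at a given age under the logistic model."""
    return inv_logit(b0 + b_age * age) - inv_logit(b0 + b_trt + b_age * age)


def risk_identity(age, treated):
    """Fitted 'risk' under the identity link; nothing keeps it inside [0, 1]."""
    return a0 + a_trt * treated + a_age * age


def arr_identity(age):
    """ARR at a given age under the identity link (the same for every age)."""
    return risk_identity(age, 0) - risk_identity(age, 1)


for age in (30, 50, 70, 90):
    print(
        f"age {age}: logistic ARR {arr_logit(age):.3f}, "
        f"identity ARR {arr_identity(age):.3f}, "
        f"identity-link treated risk {risk_identity(age, 1):.3f}"
    )
```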


To take this question a little further, how would we change our approach when we have random effects (from e.g., logistic regression on longitudinal data)?
Should we:

  1. condition on a given value of the random effects? Like a ‘typical’ patient?
  2. average over the random effects (marginal ARR)?
  3. Something else?

I have read Prof Harrell’s critique of marginal effects in other contexts (which was very convincing) but not specifically in the mixed-effects context. Very interested to hear people’s thoughts (let me know if this needs a separate post). Thanks!
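To make options 1 and 2 concrete, here is a minimal sketch for a single posterior draw from a random-intercept logistic model. The parameter values, the function names, and the normal distribution assumed for the random intercept are illustrative assumptions; in practice the calculation would be repeated for every posterior draw.

```python
import math
import random


def inv_logit(x):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def conditional_arr(b0, b_trt, u=0.0):
    """Option 1: ARR for a subject with random intercept u (u = 0 is a 'typical' subject)."""
    return inv_logit(b0 + u) - inv_logit(b0 + b_trt + u)


def marginal_arr(b0, b_trt, sd_u, n_sim=20000, seed=1):
    """Option 2: ARR averaged over random intercepts u ~ Normal(0, sd_u), by Monte Carlo."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        u = rng.gauss(0.0, sd_u)
        total += conditional_arr(b0, b_trt, u)
    return total / n_sim


# ASSUMPTION: one hypothetical posterior draw (intercept, treatment effect,
# random-intercept standard deviation), all on the logit scale.
b0, b_trt, sd_u = -2.0, -0.8, 1.5

cond = conditional_arr(b0, b_trt)
marg = marginal_arr(b0, b_trt, sd_u)
print(f"conditional ARR (u = 0): {cond:.3f}")
print(f"marginal ARR (averaged over u): {marg:.3f}")
```

The two numbers generally differ because the inverse logit is nonlinear, so the choice between them depends on which quantity is actually of interest.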

Not to answer your question, but I believe in full conditioning on baseline information and only want to condition on subject-specific effects if I’m interested in estimating subject-specific outcomes. Otherwise I marginalize on subjects by directly modeling serial correlation structure without random effects.