Hello. I have seen numerous examples of code used to calculate the posterior distribution of the absolute risk reduction. The simplest, using brms, seems to be setting family = bernoulli(link = identity) when working with a nominal predicted value (outcome) and a single nominal predictor (treatment), i.e., formula = outcome ~ 1 + predictor. Do the parameter estimate and its estimated error give you the posterior distribution of the ARR?
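Roughly like this (a sketch only; the data frame name d and the variable names are placeholders):

library(brms)

# Binary outcome, single two-level treatment, identity link as described above
fit <- brm(outcome ~ 1 + predictor,
           data   = d,
           family = bernoulli(link = "identity"))

summary(fit)   # with a binary predictor, its coefficient is on the risk-difference scale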
Covariate-specific absolute risk reduction (ARR) is obtained by subtracting risks evaluated at two covariate settings (e.g., vary treatment but leave age constant). The model needs to be realistic, so a linear model is not appropriate here, except perhaps in the case where no covariates arise. Use a binary logistic model and retrieve the posterior draws of all its parameters. For each posterior draw, compute the two risks of interest and subtract them. This gives a posterior sample of the ARR.
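For example, a minimal sketch with brms (illustrative names: data frame d, binary outcome y, treatment tx, covariate age):

library(brms)

fit <- brm(y ~ tx + age, data = d, family = bernoulli(link = "logit"))

# Two covariate settings that differ only in treatment
nd_treated <- data.frame(tx = 1, age = 60)
nd_control <- data.frame(tx = 0, age = 60)

# One risk per posterior draw for each setting
p1 <- posterior_epred(fit, newdata = nd_treated)
p0 <- posterior_epred(fit, newdata = nd_control)

arr <- p0 - p1                         # posterior sample of the covariate-specific ARR
quantile(arr, c(0.025, 0.5, 0.975))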
Thank you for the example code. I ran it using my data and came up with essentially the same median and credible interval for the ARR as when I used brms and specified family=bernoulli(link=identity). Thank you for your reply!
I'm guessing that you are referring to the average ARR. Since ARR needs to be covariate-specific, I think you'll find the identity link to be very problematic.
To take this question a little further, how would we change our approach when we have random effects (from e.g., logistic regression on longitudinal data)?
Should we:
condition on a given value of the random effects, like a "typical" patient?
average over the random effects (marginal ARR)?
Something else?
I have read Prof Harrell's critique of marginal effects in other contexts (which was very convincing) but not specifically in the mixed-effects context. Very interested to hear people's thoughts (let me know if this needs a separate post). Thanks!
Not to answer your question, but I believe in full conditioning on baseline information, and I only want to condition on subject-specific effects if I'm interested in estimating subject-specific outcomes. Otherwise I marginalize over subjects by directly modeling the serial correlation structure without random effects.
Thank you again for the code you provided. I had two follow-up questions.
First, do you have a way of calculating the relative risk?
My second question is how to translate, for my audience, the prior probability distributions into a prior on the absolute risk reduction. For example, if my skeptical prior for the logistic regression model was a normal distribution with a mean of 0 and an SD of 0.4, my prior belief is that the absolute risk reduction is 0; but how do I convey what the SD translates to in terms of ARR (i.e., that the prior distribution of the ARR is centered at 0, with 95% of the distribution between an ARR of -x% and +x%)?
Usually one places priors on the fundamental model parameters (the β's); then, for each posterior draw and for each covariate setting of interest, one computes the difference in two model estimates to get the absolute risk reduction (likewise for RR), and examines the distribution of this over all posterior draws. Very easy to do. Using a model based on odds ratios to get RR and ARR is easy; see Avoiding One-Number Summaries of Treatment Effects for RCTs with Binary Outcomes | Statistical Thinking.
Of note, this dataset doesn't have any other covariates. In case your dataset has them, you must first generate posterior draws conditioning on the covariates of interest, then calculate the relative risk based on those draws.
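For instance, with a fitted brms logistic model (call it fit) that includes treatment tx, age, and sex (names illustrative), the same posterior_epred() draws give both the RR and the ARR:

nd1 <- data.frame(tx = 1, age = 60, sex = "male")
nd0 <- data.frame(tx = 0, age = 60, sex = "male")

p1 <- posterior_epred(fit, newdata = nd1)   # risk under treatment, per draw
p0 <- posterior_epred(fit, newdata = nd0)   # risk under control, per draw

rr  <- p1 / p0                              # posterior draws of the relative risk
arr <- p0 - p1                              # posterior draws of the absolute risk reduction
quantile(rr, c(0.025, 0.5, 0.975))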
This is the question I've been stuck on for so long… Let's say we adjusted for sex and age in the model. We could then calculate the conditional risk difference for a male aged 60 years like so:
b1 %>%
  emmeans(revpairwise ~ treatment, transform = "response",
          at = list(age = 60, sex = "male"))
Alternatively, we could calculate a marginal risk difference like so (untested so excuse errors):
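(Something along these lines; this assumes the fitted model b1 from above and a data frame d holding the sample's covariates, with treatment level names chosen for illustration.)

# Predict each subject's risk under both treatments, then average within each draw
d_treated <- transform(d, treatment = "treated")
d_control <- transform(d, treatment = "control")

p1 <- posterior_epred(b1, newdata = d_treated)   # draws x subjects
p0 <- posterior_epred(b1, newdata = d_control)

marginal_rd <- rowMeans(p0 - p1)                 # posterior of the marginal risk difference
quantile(marginal_rd, c(0.025, 0.5, 0.975))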
The former is obviously a conditional risk difference for a very particular subject, while the latter is marginal in that it has averaged over the characteristics of the sample. What I really struggle to understand is when we should prefer one over the other.
Neither seems satisfactory. The problem with approach (1) is that it seems arbitrary to use these particular characteristics, while (2) seems problematic because it averages over sample characteristics which may be wildly unrepresentative of the population we wish to ultimately treat. Guidance would be greatly appreciated…
Marginal estimates cover up what is going on. For example in the NIH Remdesivir study the overall reduction in time to recovery quoted in the NEJM paper does not apply to anyone in the study since the difference varies so greatly over initial state (e.g., being on a ventilator at baseline). One can compute posterior distributions of differences for a series of covariate values. When treatment does not interact with covariates a beautiful result happens:
when quantifying evidence for any efficacy (e.g., P(ARR > 0 | X)), this posterior probability will be the same for all covariate settings
when quantifying evidence for non-trivial efficacy (e.g., P(ARR > 0.05 | X)), the posterior probabilities will be covariate-dependent. For example, for a sicker patient at baseline you may see a higher absolute risk reduction than for a less sick patient. A short sketch of both calculations follows.
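A rough sketch of those two calculations (the covariate settings and variable names are purely illustrative; it assumes a fitted brms logistic model fit with treatment tx and a baseline severity covariate):

settings <- list(sicker    = data.frame(severity = "high", age = 70),
                 less_sick = data.frame(severity = "low",  age = 50))

for (nm in names(settings)) {
  s   <- settings[[nm]]
  p1  <- posterior_epred(fit, newdata = transform(s, tx = 1))
  p0  <- posterior_epred(fit, newdata = transform(s, tx = 0))
  arr <- p0 - p1
  # With no treatment-by-covariate interaction, P(ARR > 0) is the same across settings,
  # while P(ARR > 0.05) differs
  cat(nm, ": P(ARR > 0) =", mean(arr > 0),
      "  P(ARR > 0.05) =", mean(arr > 0.05), "\n")
}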
Great question. One approach is to show summaries for "reference patients", as they did in this article:
"we primarily present average estimates and average treatment effects, secondarily supplemented with estimates calculated for three different representative reference patients"
How should we specify the prior distribution for an absolute risk difference?
For example:
Group A prior is a Beta distribution (shape parameters: α = 4, β = 13), which attributes the most credibility (the mode) to an expected event proportion of 0.2.
Group B prior is a Beta distribution (shape parameters: α = 12, β = 27), which attributes the most credibility (the mode) to an expected event proportion of 0.3.
These priors directly influence the prior for the absolute risk difference between Group A and Group B. How would I derive the prior for the risk difference from the priors of these two risks?
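For example, would it be as simple as simulating from the two priors (assuming they are independent) and taking differences?

set.seed(1)
n  <- 1e5
pA <- rbeta(n, 4, 13)    # Group A prior (mode 0.2)
pB <- rbeta(n, 12, 27)   # Group B prior (mode 0.3)

rd <- pA - pB            # implied prior draws for the risk difference A - B
quantile(rd, c(0.025, 0.5, 0.975))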
You'd need to pick reference patients as @arthur_albuquerque described, as the absolute risk difference is a dramatic function of baseline risk. We tend not to put priors on risk differences because we don't know that much about them a priori. Instead it is more common to put a wide prior on a reference log odds and a narrower prior on a log odds ratio for comparing A and B.
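For example, here is a sketch of the implied ARR prior using the N(0, 0.4) log odds ratio prior mentioned earlier and an illustrative wide prior on the reference log odds (the centering at a 0.25 risk is made up for the example):

set.seed(1)
n     <- 1e5
alpha <- rnorm(n, qlogis(0.25), 1.5)   # wide prior on the reference (control) log odds
delta <- rnorm(n, 0, 0.4)              # skeptical prior on the log odds ratio

p0  <- plogis(alpha)                   # implied control risk
p1  <- plogis(alpha + delta)           # implied treated risk
arr <- p0 - p1                         # implied prior on the absolute risk reduction
quantile(arr, c(0.025, 0.5, 0.975))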
Sometimes I just want to put a skeptical prior on the log odds ratio. Sometimes high-quality data can inform it, depending on what you mean by high-quality.