Absolute risk measures from count models



Dear all:

I’m new to this list – whose creators and moderators I sincerely thank for setting it up and running it – so apologies in advance if this is a cross-post.

I’m a member of a team of statisticians and data managers carrying out comparative effectiveness evaluations of health-care transformation programmes in England. I’ve recently been tasked with deriving measures of absolute risk – notably, absolute risk differences, as described in e.g. Austin (JCE 2010, DOI: https://doi.org/10.1016/j.jclinepi.2008.11.004) – from a GLM Poisson log-link model of emergency hospital admissions fitted to an array of covariates, including a binary exposure indicator.

While I think I have a decent understanding of how to go about such a task (both conceptually and operationally) for a logistic regression model, I’m unconvinced that absolute risk differences make sense in the above-mentioned Poisson regression set-up, which models rate-type parameters and produces inferences on relative risks. Absolute risk measures seem to be extracted only from models of probability-type parameters (e.g. logistic regression or survival models); at the same time, searching Google for statistical publications reporting absolute risk measures from Poisson regression turned up nothing except (numerical and conceptual) warnings about GLMs with Binomial errors and log links.

I should be grateful for any insight that statisticians more experienced in the field than me may be able to offer.

With many thanks in advance for your time, all the best,

Stefano Conti


This is an excellent question. I assume that you are modeling multiple admissions per patient. I’ve never fit a Poisson model in my entire career, so I can’t speak to the interpretation and goodness of fit of these. I would say that a semiparametric model such as the proportional odds model has a better chance of fitting the data. Semiparametric models (which include the Cox proportional hazards model) allow for arbitrary clumping at zero and bizarre distributions elsewhere, with clumping so extreme that a zero-inflated count model could never fit the data adequately.

Once you fit an ordinal model, it is easy to get any probability estimates you need conditional on covariate settings, e.g. P(no admissions | X), P(>= 1 admission | X), P(>= 2 admissions | X). You can also easily estimate the mean number of admissions given X.
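As a rough illustration of how those quantities fall out of a fitted model, here is a minimal sketch assuming one common parameterization, logit P(Y >= y_j | X) = alpha_j + X·beta, with k decreasing intercepts for k+1 sorted distinct Y values (some software models P(Y <= y_j) instead, which flips the signs, so check your own package). The function name `po_distribution` and all intercept and coefficient values are hypothetical, not estimates from any data.

```python
import math

def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def po_distribution(y_values, intercepts, beta, x):
    """Distribution of Y given covariates x under a proportional odds model.

    Assumes P(Y >= y_values[j] | x) = expit(intercepts[j-1] + x . beta) for
    j = 1..k, where y_values holds the k+1 sorted distinct values of Y and
    intercepts holds k decreasing values.
    """
    lin_pred = sum(b * xi for b, xi in zip(beta, x))
    # Exceedance probabilities P(Y >= y_j | x); P(Y >= y_0 | x) = 1 by definition.
    exceed = [1.0] + [expit(a + lin_pred) for a in intercepts] + [0.0]
    # Point probabilities P(Y = y_j | x) as differences of adjacent exceedances.
    probs = [exceed[j] - exceed[j + 1] for j in range(len(y_values))]
    mean = sum(y * p for y, p in zip(y_values, probs))
    return exceed[:-1], probs, mean

# Hypothetical fitted values: 5 distinct quarterly counts, so k = 4 intercepts.
y_values = [0, 1, 2, 3, 5]
intercepts = [-0.5, -1.8, -2.9, -4.0]
beta = [0.4]  # hypothetical log odds ratio for a binary exposure

for exposed in (0, 1):
    exceed, probs, mean = po_distribution(y_values, intercepts, beta, [exposed])
    print(f"exposed={exposed}: P(Y>=1)={exceed[1]:.3f}, "
          f"P(Y>=2)={exceed[2]:.3f}, E(Y)={mean:.3f}")
```

The difference in P(Y >= 1) between the two printed lines is an absolute risk difference at that particular covariate setting; a marginal version would average these predictions over the individuals in the cohort.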

Semiparametric models essentially encode the entire empirical cumulative distribution function of Y into their intercepts, so the distribution of Y at any setting of X can take whatever shape the data require.
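One way to see the "empirical CDF in the intercepts" point is the no-covariate case. Assuming the parameterization logit P(Y >= y_j) = alpha_j, the intercept-only model is saturated, so its maximum likelihood intercepts are just the logits of the empirical exceedance proportions, one per distinct observed value above the smallest. The function name and the counts below are made up for illustration.

```python
import math
from collections import Counter

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def intercept_only_po(y):
    """MLE intercepts of an intercept-only proportional odds model.

    Assumes logit P(Y >= y_j) = alpha_j. Because this model is saturated,
    alpha_j is the logit of the empirical proportion with Y >= y_j.
    """
    n = len(y)
    counts = Counter(y)
    values = sorted(counts)          # the k + 1 distinct observed values
    intercepts = []
    at_or_above = n                  # number of observations with Y >= values[0]
    for v in values[:-1]:
        at_or_above -= counts[v]     # now the number with Y >= the next distinct value
        intercepts.append(logit(at_or_above / n))
    return values, intercepts

# Hypothetical quarterly admission counts: heavy clumping at zero, open-ended tail.
y = [0] * 80 + [1] * 12 + [2] * 5 + [3] * 2 + [11]
values, intercepts = intercept_only_po(y)
print(values)           # [0, 1, 2, 3, 11]
print(len(intercepts))  # 4
```

The isolated count of 11 simply contributes one more intercept; no upper bound or grouping of high counts is needed.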


Dear Prof Harrell:

Many thanks for offering your insights.

I should have been a little clearer about the data structure I’m handling: these are individual-level quarterly counts of hospital admissions (among other outcomes); in other words, they’re aggregated over the course of each quarter for each patient.

I’m assuming that a proportional odds model is the same as an ordered logistic regression; is that correct? If so, I’m not sure I’d meet the conditions to operate within this modelling framework, in that I have no obvious upper bound on the number of emergency admissions with which to define the number of modelled odds, unless I arbitrarily lump together individuals whose number of quarterly emergency admissions exceeds a given (necessarily arbitrary and likely data-dependent) threshold…


The beauty of semiparametric models such as the proportional odds ordinal logistic model is that no grouping of any kind is needed. Counts can be open-ended on the high side. You’ll have k intercepts if there are k+1 distinct Y values. For your quarterly setup you might use a mixed-effects ordinal model.


Sounds like a fun project, and I look forward to seeing the results when they’re published. I’m certainly not a very experienced statistician (I’d definitely take Frank’s advice over my own), but I have done quite a bit of playing around with absolute measures derived from Poisson models.

The key issue is that the Poisson model is not estimating a relative risk but a rate ratio. Hence, when we try to get an absolute difference, the easiest one to obtain is an absolute rate difference, not an absolute risk difference.** I suspect that if you repeat your Google search substituting "rate" for "risk", you might find more useful information.

The approach described in Austin 2010 for estimating marginal probabilities from a logistic model works for estimating marginal rates from a Poisson model (as long as we remember the different link function…)
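A minimal sketch of that standardization (g-computation) step for a Poisson log-link model, assuming log E(Y_i) = log(t_i) + b0 + b1·exposed_i + b2·age_centred_i with person-time t_i. The coefficient values, the `age_centred` covariate and the tiny cohort are hypothetical placeholders for your own fitted model and analysis dataset. Each individual's expected rate is predicted with exposure set to 1 and to 0, using exp as the inverse link (where a logistic model would use expit), and the predictions are averaged.

```python
import math

# Hypothetical coefficients from a Poisson log-link model:
# log E(Y) = log(t) + b0 + b1 * exposed + b2 * age_centred
COEF = {"b0": -2.0, "b1": -0.2, "b2": 0.03}

def predicted_rate(exposed: int, age_centred: float) -> float:
    """Expected admissions per unit of person-time (inverse link = exp)."""
    eta = COEF["b0"] + COEF["b1"] * exposed + COEF["b2"] * age_centred
    return math.exp(eta)

def marginal_rates(cohort):
    """Person-time-weighted average predicted rates with exposure set to 1 and to 0.

    `cohort` is a list of (age_centred, person_time) tuples describing the
    target population; every individual is evaluated under both exposures.
    """
    total_time = sum(t for _, t in cohort)
    rate1 = sum(predicted_rate(1, a) * t for a, t in cohort) / total_time
    rate0 = sum(predicted_rate(0, a) * t for a, t in cohort) / total_time
    return rate1, rate0

# Hypothetical target cohort, each contributing one quarter of follow-up.
cohort = [(-20.0, 1.0), (-5.0, 1.0), (0.0, 1.0), (10.0, 1.0), (25.0, 1.0)]
rate1, rate0 = marginal_rates(cohort)
print(f"marginal rate difference per quarter: {rate1 - rate0:.4f}")
print(f"marginal rate ratio: {rate1 / rate0:.4f}")
```

Without interaction terms the marginal rate ratio reproduces exp(b1) exactly, because every individual's two predictions differ by that same factor; the rate difference, by contrast, depends on the covariate mix of the cohort you standardize to.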

Good luck convincing yourself that any differences you see are truly due to the health-care transformation programme! :slight_smile:

** You can put in a bit of work and calculate probability estimates such as those Frank mentions, but I suspect a proportional odds model would give an easier route to those probabilities if they are what you care about!
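If you do want a probability straight from the Poisson fit, one route is P(Y >= 1 | X) = 1 - exp(-mu(X)) for the predicted quarterly mean mu(X). This is only trustworthy insofar as the Poisson distribution actually holds; overdispersion or excess zeros would make these probabilities wrong. The sketch below averages these probabilities over hypothetical counterfactual predictions (exposure set to 1 and to 0) and compares the resulting risk difference with the rate difference; the function names and all predicted means are illustrative assumptions.

```python
import math

def prob_at_least_one(mu: float) -> float:
    """P(Y >= 1) for a Poisson count with mean mu."""
    return 1.0 - math.exp(-mu)

def marginal_risk_and_rate_differences(mu_exposed, mu_unexposed):
    """Marginal risk difference P(Y >= 1) and rate difference from counterfactual means.

    mu_exposed[i] / mu_unexposed[i]: predicted mean count for individual i with
    exposure set to 1 / 0 (e.g. from a Poisson log-link model).
    """
    n = len(mu_exposed)
    risk1 = sum(prob_at_least_one(m) for m in mu_exposed) / n
    risk0 = sum(prob_at_least_one(m) for m in mu_unexposed) / n
    rate1 = sum(mu_exposed) / n
    rate0 = sum(mu_unexposed) / n
    return risk1 - risk0, rate1 - rate0

# Hypothetical counterfactual predictions for four individuals (per quarter).
mu1 = [0.05, 0.20, 0.60, 1.50]
mu0 = [0.07, 0.25, 0.75, 1.90]
risk_diff, rate_diff = marginal_risk_and_rate_differences(mu1, mu0)
print(f"risk difference, P(>=1 admission): {risk_diff:.4f}")
print(f"rate difference, admissions per quarter: {rate_diff:.4f}")
```

Because 1 - exp(-mu) grows more slowly than mu, the risk difference here is smaller in magnitude than the rate difference, which is one concrete way the two absolute measures part company.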

  1. Look at Aalen’s methods for recurrent events (note these models spit out time-dependent ARR functions). Then sample the process to get the ARR differences at various time points.
  2. Fit the Poisson model with the identity link, since its coefficients are then absolute differences (on the rate scale).
  3. Fit the Poisson model under the log link. Once you fit the model, run a post-estimation prediction on a dataset composed of individuals who resemble your target cohort. You can do this counterfactually, e.g. setting the exposure to 1 and then to 0 for everyone. This is probably best done within Bayesian software (BUGS or Stan).
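Options 2 and 3 can be connected with a small frequentist sketch. This pure-Python example fits a Poisson log-link model by Newton-Raphson and then averages counterfactual predictions with exposure set to 1 and to 0 for everyone. The toy data are hypothetical and chosen so that the binary exposure is the only covariate; the model is then saturated, and the standardized rate difference coincides with the identity-link exposure coefficient (the difference in group mean counts). With further covariates the two generally differ, because the identity link assumes effects are additive on the rate scale. In practice you would use your usual GLM software and obtain uncertainty from, for example, the delta method or a bootstrap.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poisson_log(X, y, max_iter=50, tol=1e-10):
    """Poisson log-link GLM by Newton-Raphson (same as IRLS for this canonical link).

    Assumes the first column of X is the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))
    for _ in range(max_iter):
        mu = [math.exp(sum(b * xj for b, xj in zip(beta, row))) for row in X]
        # Score X'(y - mu) and Fisher information X' diag(mu) X.
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def standardized_rate_difference(beta, X, exposure_col):
    """Mean prediction with exposure set to 1 minus mean prediction with it set to 0."""
    def mean_prediction(value):
        total = 0.0
        for row in X:
            row_cf = list(row)
            row_cf[exposure_col] = value
            total += math.exp(sum(b * xj for b, xj in zip(beta, row_cf)))
        return total / len(X)
    return mean_prediction(1.0) - mean_prediction(0.0)

# Hypothetical toy data: 8 unexposed then 8 exposed individuals, one quarter each.
y = [0, 0, 1, 0, 2, 0, 1, 0] + [0, 0, 0, 1, 0, 0, 1, 0]
X = [[1.0, 0.0] for _ in range(8)] + [[1.0, 1.0] for _ in range(8)]  # intercept, exposure

beta_hat = fit_poisson_log(X, y)
diff = standardized_rate_difference(beta_hat, X, exposure_col=1)
identity_coef = sum(y[8:]) / 8 - sum(y[:8]) / 8  # identity-link MLE in this saturated case
print(f"rate ratio exp(b1): {math.exp(beta_hat[1]):.4f}")
print(f"standardized rate difference: {diff:.4f} vs identity-link coefficient: {identity_coef:.4f}")
```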


Dear all:

Many thanks for your follow-ups and considerations, and apologies for my belated acknowledgement.

This is all very useful information, which I’ll share with my colleagues who are addressing the same quandary. Unfortunately, a Bayesian approach, which I’d personally favour, won’t be feasible, as it’s not the inferential framework used within the team; however, that is conceptually irrelevant for the purpose of calculating absolute risk / rate measures.

All the best,