Model-based risk and rate estimation was covered by authors like Cornfield, Bishop, and Fienberg as far back as the 1960s, and some say it traces back to Deming in the 1940s. By the 1990s there was a sizeable literature on it. A few points I advise taking from that literature:
- I am with Frank: if you have a binomial (or Bernoulli) outcome, the statistically sensible approach is to fit a logistic model (possibly hierarchical, as with a prior or random effects) and then compute what is needed from the fitted values. The general idea is that if the model is just a smoothing or noise-reduction device, as is almost always the case in health and medical research (rather than an embodiment of physical laws), it’s best to use the most numerically stable and most rapidly converging form available in tested software (where “rapidly converging” refers to both asymptotic behavior and numerical fitting, which seem to go together). The biologic rationales for other models connect only weakly to the messy realities of epidemiologic data, and overlook how little can be learned about the actual biologic mechanisms from model fitting (which cannot substitute for getting more detailed data).
- An important counterpoint is that if the study of effects is the goal, the fitted model should be rich, with enough terms to capture relevant detail, rather than based on the usual misplaced parsimony of deleting terms with p>0.05 or something equally biasing. The point is to not smooth away potentially important data patterns. This is a hugely controversial topic, so that’s all I’ll say here, but see my 2006 article listed below.
- Often one should not stop at presenting exponentiated model coefficients (which usually estimate odds ratios or rate ratios, depending on the sampling design). The fitted logistic probabilities can easily be used to compute estimated risks, risk ratios, risk differences, attributable fractions, etc., whatever is called for by the study context. This is not a statistical choice but one of topic relevance: e.g., if costs are proportional to risks, then risks and their differences are more relevant than odds and their ratios.
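As a rough illustration of the last point, here is a minimal Python sketch of turning fitted logistic probabilities into risks, a risk ratio, a risk difference, and an attributable fraction among the exposed. Everything in it is an assumption made for illustration: the coefficients `b0`, `b_x`, `b_age` stand in for the output of some already-fitted logistic model, the `ages` list is a made-up covariate sample, and averaging over the whole sample is just one possible choice of standard population.

```python
import math

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted coefficients (NOT from any real data set):
# intercept, binary exposure x, and age centered at 50 years.
b0, b_x, b_age = -2.0, 0.7, 0.03

# Hypothetical covariate values for the study sample.
ages = [35, 42, 50, 58, 63, 71]

def mean_risk(x):
    """Average fitted risk if everyone in the sample had exposure level x."""
    risks = [expit(b0 + b_x * x + b_age * (a - 50)) for a in ages]
    return sum(risks) / len(risks)

r1 = mean_risk(1)        # standardized risk under exposure
r0 = mean_risk(0)        # standardized risk under no exposure
rr = r1 / r0             # risk ratio
rd = r1 - r0             # risk difference
af = (r1 - r0) / r1      # attributable fraction among the exposed

print(f"odds ratio from coefficient: {math.exp(b_x):.3f}")
print(f"risk under exposure:         {r1:.3f}")
print(f"risk under no exposure:      {r0:.3f}")
print(f"risk ratio:                  {rr:.3f}")
print(f"risk difference:             {rd:.3f}")
print(f"attributable fraction:       {af:.3f}")
```

With an outcome this common, the risk ratio comes out closer to 1 than the exponentiated coefficient, which is one reason the choice of measure should follow the study context rather than the form of the model.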
In sum, I’ve been writing about this topic since 1979, as have many others before, during, and since. See pp. 439-440 of Rothman et al., Modern Epidemiology, 3rd ed., 2008, for a brief intro and a few citations. Here are some of my later articles (available from me if you can’t download them):
Greenland, S. (2004). Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. American Journal of Epidemiology, 160, 301-305. - This gives a general formulation with many citations to the literature on this topic up to 2004.
Greenland, S. (2004). Interval estimation by simulation as an alternative to and extension of confidence intervals. International Journal of Epidemiology, 33, 1389-1397. - This companion article discusses how to use simulation to get model-based CIs for arbitrary measures, rather than resorting to complex and usually unprogrammed analytic formulas or to popular but often unstable and inaccurate nonparametric bootstrap-percentile methods.
Greenland, S. (2006). Smoothing observational data: a philosophy and implementation for the health sciences. International Statistical Review, 74, 31-46. - Gives a general theory for the approach I advise for these purposes.
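
To give a feel for the simulation idea mentioned in the second reference, here is a minimal Python sketch of one simple variant: draw coefficient vectors from a normal approximation to their sampling distribution, recompute the measure of interest for each draw, and take percentiles of the results. It is not meant to reproduce the article's exact procedure. The coefficient estimates `beta_hat`, the covariance matrix `cov`, the seed, and the number of draws are all hypothetical values chosen for illustration.

```python
import math
import random

def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def risk_difference(beta):
    """Risk difference for exposed vs. unexposed under a two-term logistic model."""
    b0, b1 = beta
    return expit(b0 + b1) - expit(b0)

# Hypothetical estimates (intercept, exposure) and their covariance matrix.
beta_hat = [-2.0, 0.7]
cov = [[0.04, -0.03],
       [-0.03, 0.09]]

rng = random.Random(12345)   # fixed seed so the sketch is reproducible
low = cholesky(cov)
n_draws = 5000

draws = []
for _ in range(n_draws):
    z = [rng.gauss(0.0, 1.0) for _ in beta_hat]
    # beta = beta_hat + L z has approximately the assumed sampling distribution.
    beta = [beta_hat[i] + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(beta_hat))]
    draws.append(risk_difference(beta))

draws.sort()
lower = draws[int(0.025 * n_draws)]        # approximate 2.5th percentile
upper = draws[int(0.975 * n_draws) - 1]    # approximate 97.5th percentile
rd_hat = risk_difference(beta_hat)

print(f"risk difference estimate: {rd_hat:.3f}")
print(f"95% simulation interval:  ({lower:.3f}, {upper:.3f})")
```

The same loop works for any function of the coefficients (risk ratios, attributable fractions, and so on) by swapping out `risk_difference`, which is the practical appeal over deriving a separate analytic variance formula for each measure.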