To answer Anders' request, which I think points to a view already evident by the 1970s, when the modern “causal inference” movement was gestating: Suppose we are dealing with a real-world problem of comparing and choosing interventions, rather than starting, as statistics books do, with a contextually unmotivated parametric model. In that case we could say we want a family of distribution functions f(y_x;z,u) that predict, as accurately as possible given the available information, what the outcome y of individual unit u will be when given different treatments x; here z is the unit’s measured-covariate level and so is part of the information available for the task. Citations expressing that goal or functional target go back to the 1920s, allowing for the radical changes in language and notation since then. By 1986 Robins had extended x and z to include longitudinal treatment regimes and covariate histories in the outcome function, and by 1990 fairly stable terminology and notation had appeared.
In this view, causal estimation or prediction comes down to fitting models for the potential-outcome (PO) distribution family f(y_x;z,u), not some regression analog E(y;x,z,u), although under some “no-bias” or identification assumptions (made all too casually) the PO family will functionally equal a regression family. The key point, however, is that having the function f(y_x;z,u) would render the issue of effect measures pointless for any practical decision-making.
The problem I see with most effect-measure debates is that they lose sight of the original nonparametric target. No one should care about ratios f(y_1;z,u)/f(y_0;z,u), differences f(y_1;z,u)-f(y_0;z,u), or more complex contrasts among the f(y_x;z,u) such as odds ratios, except as conveniences under (always questionable) models in which they happen to reduce to parameters in the models, or to exponentiated parameters. These parametric models are simply artifices that we use to smooth the data or reduce their dimensionality, and to fill in the many blanks in our always-sparse information about f(y_x;z,u), a function which we never observe for more than one x per u.
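To make the point about contrasts reducing to model parameters concrete, here is a minimal sketch assuming a hypothetical logistic risk model for a binary outcome, risk(x, z) = expit(B0 + B1*x + B2*z), with made-up coefficients (not from any data), and ignoring unit-level variation beyond z. Under this model the conditional odds ratio is the same at every z (it equals the exponentiated parameter exp(B1)), while the risk ratio and risk difference computed from the very same two risks change with z. Each "effect measure" is just a different contrast of the same pair of functions.

```python
import math


def expit(t: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical coefficients, chosen only for illustration (not from any data).
B0, B1, B2 = -3.0, 0.7, 0.8


def risk(x: int, z: float) -> float:
    """Model-based risk Pr(y_x = 1 | z) under the assumed logistic model."""
    return expit(B0 + B1 * x + B2 * z)


def contrasts(z: float) -> dict:
    """Three common contrasts of the same two risks at covariate level z."""
    r1, r0 = risk(1, z), risk(0, z)
    return {
        "RD": r1 - r0,                            # risk difference
        "RR": r1 / r0,                            # risk ratio
        "OR": (r1 / (1 - r1)) / (r0 / (1 - r0)),  # odds ratio
    }


for z in (0.0, 1.0, 2.0, 3.0):
    c = contrasts(z)
    print(f"z={z:.0f}  RD={c['RD']:.3f}  RR={c['RR']:.3f}  OR={c['OR']:.3f}")
```

Swapping in a log-risk model would make the risk ratio the constant contrast instead, and an additive-risk model would make the risk difference constant; which summary looks "homogeneous" is a property of the chosen smoothing model, not of f(y_x;z) itself.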
Parametric smoothing can reduce variance at the cost of bias, a trade-off that varies quite a bit across target quantities. Background substantive theory, as Norris calls for (or, as I prefer to call it, contextual information), can help choose a smoothing model that handles this trade-off better than the usual defaults do. But one should not confuse minimizing loss for estimating a measure or model parameter with minimizing loss for estimating a practical target.
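The bias-variance trade-off can be computed exactly in a simple case. This is a minimal sketch, assuming a single z-stratum with a binomial count of events among n units, and a hypothetical smoothing rule that shrinks the raw stratum proportion toward a model-predicted risk m with weight w = n/(n + k). That rule is the posterior mean under a beta prior centered at m with prior "sample size" k; the stratum size, true risk, and k below are all made up for illustration.

```python
def mse_raw(p: float, n: int) -> float:
    """MSE of the unsmoothed stratum proportion Y/n (unbiased, so MSE = variance)."""
    return p * (1 - p) / n


def mse_shrunk(p: float, n: int, m: float, k: float) -> float:
    """MSE of w*Y/n + (1 - w)*m with w = n/(n + k): squared bias plus variance."""
    w = n / (n + k)
    bias = (1 - w) * (m - p)       # bias comes only from pulling toward m
    var = w ** 2 * p * (1 - p) / n  # variance shrinks by the factor w**2
    return bias ** 2 + var


# Hypothetical stratum: true risk 0.20, only 15 units, prior weight k = 10.
p_true, n, k = 0.20, 15, 10.0
for m in (0.20, 0.25, 0.35, 0.50):  # model-predicted risk, increasingly wrong
    print(f"model m={m:.2f}  raw MSE={mse_raw(p_true, n):.4f}  "
          f"shrunk MSE={mse_shrunk(p_true, n, m, k):.4f}")
```

The same amount of smoothing helps when the model prediction is near the truth and hurts when it is far off, so contextual information that moves m toward the true risk improves the trade-off. MSE for the stratum risk is only one loss function, though; a loss defined by a practical decision could rank these estimators differently.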
In this regard, a central point in Rothman et al. AJE 1980 can now be restated as follows: Often, when the target outcome is risk (not log risk or log odds), the z-specific causal risk differences (cRD) are proportional to the estimated loss differences computed from f(y_x;z,u), which makes the cRDs the most relevant summaries in those situations. This does not, however, justify additive-risk models for estimation, nor does it justify failing to compute the cRD as a function of z rather than as some (often absurd) “common risk difference”. Without solid contextual information to do better, smoothing will often be best based instead on shrinkage toward a loglinear model, which will automatically obey logical range restrictions and yield estimates that can be easily and accurately calibrated in typical settings.
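A minimal sketch of the restated point, assuming that loss is a fixed (hypothetical) amount per adverse outcome, so that expected loss is that amount times risk, and using a hypothetical logistic risk model (whose fitted risks always stay between 0 and 1) as a stand-in for f(y_x;z). All coefficients, the loss per case, and the z distribution are made up. Under these assumptions the expected-loss difference at each z is exactly LOSS_PER_CASE times cRD(z), while a single "common" risk difference averaged over the z distribution can be far from the z-specific values.

```python
import math


def expit(t: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical logistic risk model; x = 1 (treated) is assumed protective here.
B0, B1, B2 = -3.0, -0.7, 0.8
LOSS_PER_CASE = 100.0  # hypothetical loss units per adverse outcome (y = 1)

# Hypothetical distribution of the covariate z in some target population.
Z_WEIGHTS = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}


def risk(x: int, z: int) -> float:
    """Pr(y_x = 1 | z) under the assumed model."""
    return expit(B0 + B1 * x + B2 * z)


def crd(z: int) -> float:
    """z-specific causal risk difference, treated minus untreated."""
    return risk(1, z) - risk(0, z)


def expected_loss(x: int, z: int) -> float:
    """Expected loss when each adverse outcome costs LOSS_PER_CASE and y = 0 costs 0."""
    return LOSS_PER_CASE * risk(x, z)


# A single population-averaged ("common") risk difference, for comparison.
common_rd = sum(w * crd(z) for z, w in Z_WEIGHTS.items())

for z in Z_WEIGHTS:
    loss_diff = expected_loss(1, z) - expected_loss(0, z)
    print(f"z={z}  cRD={crd(z):+.4f}  loss diff={loss_diff:+.2f}  "
          f"loss diff implied by common RD={LOSS_PER_CASE * common_rd:+.2f}")
```

The proportionality here depends entirely on the assumed loss structure (a constant loss per case and nothing else); adding, say, a treatment burden would make the loss difference an affine rather than proportional function of cRD(z).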
To conclude: Effect summarization seems essential for communicating results, but it also seems to mislead when taken as the ultimate, proper, or only role of estimation. As a prime example, the effect-measure debate falls prey to replacing a difficult but core practical question (what is f(y_x;z,u)?) with mathematically more tractable oversimplifications and an exclusive focus on the pros and cons of effect summaries, such as contrasts of x-specific averages over the f(y_x;z,u). This focus fails to answer the original question directly, or even to recall that f(y_x;z,u) is what we ultimately need for rationally answering practical questions, like “what treatment x should we give to patient u or group u whose known characteristics are z?”
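The practical question at the end can be answered directly from f(y_x;z) once a loss is specified. Here is a minimal sketch, assuming a hypothetical three-level outcome (0 = none, 1 = moderate, 2 = severe), made-up outcome distributions for two covariate levels z, a made-up loss for each outcome level, and a made-up burden for each treatment option; z is treated as all we know about the patient. It picks the x with the smallest expected loss, and it also shows one way a binary summary can point elsewhere: at z = "high", treatment A has the lower any-event risk, yet B has the lower expected loss because it shifts probability away from severe outcomes.

```python
# Hypothetical outcome distributions f(y_x; z) over y = (none, moderate, severe).
F = {
    "low": {"none": (0.90, 0.08, 0.02),
            "A": (0.95, 0.04, 0.01),
            "B": (0.93, 0.065, 0.005)},
    "high": {"none": (0.60, 0.25, 0.15),
             "A": (0.75, 0.20, 0.05),
             "B": (0.70, 0.29, 0.01)},
}
LOSS_BY_Y = (0.0, 10.0, 100.0)               # hypothetical loss per outcome level
BURDEN = {"none": 0.0, "A": 2.0, "B": 3.0}   # hypothetical cost of each treatment


def expected_loss(x: str, z: str) -> float:
    """Treatment burden plus outcome loss averaged over the full distribution f(y_x; z)."""
    probs = F[z][x]
    return BURDEN[x] + sum(p * loss for p, loss in zip(probs, LOSS_BY_Y))


def any_event_risk(x: str, z: str) -> float:
    """A binary summary that discards the severity information: Pr(y > 0)."""
    return 1.0 - F[z][x][0]


def best_treatment(z: str) -> str:
    """Choose the treatment with the smallest expected loss at covariate level z."""
    return min(F[z], key=lambda x: expected_loss(x, z))


for z in F:
    losses = {x: round(expected_loss(x, z), 2) for x in F[z]}
    print(z, best_treatment(z), losses)
```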