Confounding and effect modification “in expectation” versus “in measure”

This long thread about odds ratios (“Should one derive risk difference from the odds ratio?”) contained a brief exchange about the idea of “random confounding.” After seeing experts debate this term online, I wondered whether the controversy might be fuelling confusion among students of epidemiology, statistics, and medicine.

Reading around this topic led me to the following article:

This publication seems important. Is the idea of confounding “in measure” widely recognized or accepted by statisticians, who seem to speak exclusively (?) about confounding “in expectation”? Does the epidemiologic notion of confounding “in measure” underlie historically common MD critical appraisal practices (e.g., scrutinizing Table 1 in RCTs for between-arm covariate imbalances)? At least some MDs were taught to do this by MD instructors who had an epidemiology background, only later to hear statisticians tell them, in the immortal words of Bob Newhart, to “Just Stop It!!”

Could some of the heated exchanges in the Odds Ratio thread stem from disagreement about the “in-distribution” versus “in-measure” conceptualization of confounding and effect modification?

Some notable excerpts from the article (with bolded and underlined text inserted by me for emphasis):

“This general notion of confounding or exchangeability can be defined both with respect to the distribution of potential outcomes and with respect to a specific measure. The distinction has been drawn before (Greenland et al., 1999)…

…A further distinction can be drawn between confounding “in expectation” and “realized” confounding (Fisher, 1935; Rothman, 1977; Greenland, 1990; Greenland et al., 1999). In a randomized trial the groups receiving the placebo and the treatment will be comparable in their potential outcomes on average over repeated experiments. However, for any given experiment, the particular randomization may result in chance imbalances due to the particular allocation. Such a scenario would be one in which there is no confounding “in expectation” but there is realized confounding for the particular experiment (conditional on the allocation). Some authors (Greenland et al., 1999; Greenland and Robins, 2009) prefer to restrict the use of “no confounding” to that that is realized; a number of authors (e.g. Rubin, 1991; Robins, 1992; Stone, 1993) use terms like “no confounding” to refer to that in expectation; here we will adopt the latter practice…
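To make the quoted distinction concrete, here is a toy simulation of my own (not from the article): a binary baseline covariate is balanced between arms on average over repeated 1:1 randomizations (no confounding in expectation), yet any single allocation can show a sizeable chance imbalance (realized confounding for that particular trial).

```python
import random

random.seed(1)

n = 40  # a small trial, where chance imbalance is most visible
covariate = [random.random() < 0.5 for _ in range(n)]  # e.g., a binary risk factor

def arm_imbalance():
    """Randomize n patients 1:1; return the difference in covariate
    prevalence between the treated and control arms."""
    assignment = [1] * (n // 2) + [0] * (n // 2)
    random.shuffle(assignment)
    treated = [c for a, c in zip(assignment, covariate) if a == 1]
    control = [c for a, c in zip(assignment, covariate) if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

imbalances = [arm_imbalance() for _ in range(10_000)]

mean_imbalance = sum(imbalances) / len(imbalances)   # ~0: no confounding in expectation
worst_imbalance = max(abs(d) for d in imbalances)    # single trials can be badly imbalanced

print(f"mean imbalance over repeated trials: {mean_imbalance:+.4f}")
print(f"largest imbalance in a single trial: {worst_imbalance:.2f}")
```

With only 40 patients, the worst single-allocation imbalance is many times larger than the near-zero average over repeated randomizations, which is presumably why “random confounding” is discussed mostly in the context of small trials.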

…We have seen that a distinction can be drawn between confounding in distribution and confounding in measure. A similar distinction can in fact also be drawn with regard to effect modification…

…More recently, the expression “effect-measure modification” (Rothman, 2002; Brumback and Berg, 2008) has been used in place of the expression “effect modification.” This has arguably occurred for two reasons. First, as has often been pointed out (Miettinen, 1974; Rothman, 2002; Brumback and Berg, 2008; Rothman et al., 2008), there may be effect modification for one measure (e.g. the risk difference) but not for another (e.g. the risk ratio). Effect modification in measure is thus scale-dependent and the expression “effect-measure modification” makes this more explicit. Second, with observational data, control for confounding is often inadequate; the quantities we estimate from data may not reflect true causal effects. The expression “effect-measure modification” suggests only that our measures (which may not reflect causal effects) vary across strata of Q, rather than the effects themselves (which we may not be able to consistently estimate).
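The scale dependence described in this excerpt is easy to see with made-up numbers (mine, not the article’s): in the toy table below, the risk ratio is identical across strata of Q while the risk difference is not, so whether “effect modification” is present depends entirely on the measure chosen.

```python
# Hypothetical stratum-specific risks by level of a covariate Q
risks = {
    "Q=0": {"exposed": 0.2, "unexposed": 0.1},
    "Q=1": {"exposed": 0.4, "unexposed": 0.2},
}

risk_ratios = {q: r["exposed"] / r["unexposed"] for q, r in risks.items()}
risk_diffs  = {q: r["exposed"] - r["unexposed"] for q, r in risks.items()}

print("risk ratios:     ", risk_ratios)  # 2.0 in both strata: no modification on the ratio scale
print("risk differences:", risk_diffs)   # 0.1 vs 0.2: modification on the difference scale
```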

…Confounding and effect modification, as conceived in this paper, and in much of modern epidemiology are causal concepts: they relate to the distribution of counterfactual variables. In practice, however, statistical models are often used to reason about the presence or absence of confounding and effect modification.

…Third, the distinction between confounding in distribution versus measure becomes important when considering “collapsibility” approaches to confounding assessment i.e. in settings in which an investigator evaluates confounding by comparing an adjusted and unadjusted estimate. Greenland et al. (1999) showed that for the risk difference and the risk ratio scales, collapsibility follows from no-confounding and vice versa. However, this implication holds for confounding in measure, not confounding in distribution. One may have collapsibility on the risk difference scale and therefore conclude that a particular variable is not a confounder of the risk difference (conditional on the other covariates); however, this does not imply that the variable is not a confounder for the risk ratio; it might be necessary to make control for that variable in evaluating the risk ratio. Collapsibility of the risk difference implies no confounding in measure for the risk difference; collapsibility of the risk ratio implies no confounding in measure for the risk ratio; however, neither implies no confounding in distribution. One must be careful when changing scales - not only in assessing effect modification - but also when thinking about confounding.
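The collapsibility point in this last excerpt can be checked with a hypothetical two-stratum example of my own (assuming exchangeability within strata of Z): the crude and Z-standardized risk differences coincide, so a collapsibility check would declare Z a non-confounder on the difference scale, yet the crude and standardized risk ratios still disagree.

```python
# Hypothetical population where Z is associated with both exposure A and outcome risk.
# Stratum-specific risks, assumed unconfounded within strata of Z:
r1 = {"Z=0": 0.6, "Z=1": 0.4}  # risk if exposed
r0 = {"Z=0": 0.1, "Z=1": 0.3}  # risk if unexposed

w            = {"Z=0": 0.4, "Z=1": 0.6}  # stratum shares in the total population
pz_exposed   = {"Z=0": 0.2, "Z=1": 0.8}  # stratum shares among the exposed
pz_unexposed = {"Z=0": 0.6, "Z=1": 0.4}  # stratum shares among the unexposed
# (consistent with w when overall exposure prevalence is 0.5)

# Crude (unadjusted) risks
crude_1 = sum(pz_exposed[z] * r1[z] for z in w)    # 0.44
crude_0 = sum(pz_unexposed[z] * r0[z] for z in w)  # 0.18

# Standardized (Z-adjusted) risks, weighted to the total population
std_1 = sum(w[z] * r1[z] for z in w)  # 0.48
std_0 = sum(w[z] * r0[z] for z in w)  # 0.22

crude_rd, std_rd = crude_1 - crude_0, std_1 - std_0
crude_rr, std_rr = crude_1 / crude_0, std_1 / std_0

print(f"risk difference: crude = {crude_rd:.2f}, adjusted = {std_rd:.2f}")  # both 0.26
print(f"risk ratio:      crude = {crude_rr:.2f}, adjusted = {std_rr:.2f}")  # 2.44 vs 2.18
```

An investigator comparing adjusted and unadjusted estimates on the risk difference scale here would see perfect collapsibility and stop adjusting for Z; the same investigator reporting a risk ratio would be off by roughly 0.26 on that scale. This is exactly the trap the quoted passage warns about.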

Question: Do statisticians subscribe to the idea of confounding “in measure”? If not, why not? Do these differences of opinion (if they exist) relate to statisticians’ focus on RCTs (as compared with the observational study focus of epidemiologists)? And where do these disagreements leave students, who will struggle to reconcile the views of their epidemiology and statistics instructors?


Thank you for posting this important article, which has thoroughly informed my thinking about these issues.

Absence of confounding in measure is a weaker assumption than absence of confounding in distribution (when there is no confounding in distribution, there is no confounding in measure for any effect measure). At first glance, this makes confounding in measure look like an attractive concept. Unfortunately, nobody has proposed a general model of data-generating mechanisms under which biological/mechanistic knowledge can justify conditional unconfoundedness in measure without that same knowledge also implying unconfoundedness in distribution. For this reason, confounding in measure is rarely useful in practice.

Most of my disagreements with the causal inference crowd arise from the fact that most methodological work on effect heterogeneity now relies on “effect modification in distribution”, in part because this allows them to re-use the toolkit that was developed to deal with confounding in distribution. I continue to believe that effect modification in measure is a more relevant concept than effect modification in distribution (which was in fact the position that Tyler took in that paper).

A lot of your posts hint at your impression that there is a significant difference in opinion between statisticians and epidemiologists about these methodological issues. I think it is important to point out that what you understand as the “epidemiological” point of view is better understood as a causally informed perspective on statistics that grew out of epidemiology (Robins, Greenland), computer science (Pearl), statistics (Rubin) and philosophy (Glymour and others). This approach is currently the dominant paradigm in theoretical statistics because it “won” over the earlier paradigm in terms of statistics’ own criteria for evaluating methodological research. We are talking about an approach to methodology that is fully fleshed out in terms of its mathematics, one that adds necessary precision and subtlety to the conceptual foundations, and that was developed by some of the most creative and rigorous mathematically trained researchers of their generation. Most certainly, we are not talking about “epidemiology” in the sense of the poorly trained observational clinical researchers that you ran into in medical school.

Medical schools are still full of statistics instructors who were trained in the old paradigm of statistical research. They do not speak for statistics as a field, which has moved on and accepted causal inference as part of its foundation. Students can still learn important things from these instructors, as long as they limit themselves to only discussing randomized trials (which often do not require the added subtlety/complexity that causal inference adds to the mix). However, students should completely ignore anything these people have to say about causal inference concepts such as confounding, which these instructors quite simply do not understand (as evidenced clearly by discussion in this thread).


Thanks for your input, as usual.

The sole purpose of this post is to flag mixed messages that students might be receiving from experts in different fields with regard to foundational topics. Some disagreement is healthy and maybe moves science forward in the longer run, but students are the ultimate losers in academic turf wars and clashes between theoretical and applied researchers. Imagine if neurosurgeons and neurologists didn’t share a common understanding of neurophysiology…

To set the record straight, I thought that my epidemiology instructors were great. They were keen and dedicated. And even if some of the things they taught might have been supplanted by newer ideas, they managed to instil in at least one of their students a keen interest in the field (a pretty good achievement in my books).

You point out that the current state of epidemiology is a lot different from what it was three decades ago. I don’t doubt that this is the case, though I do remember thinking that my introductory epidemiology teaching from the mid-1990s wasn’t that different from the teaching I got in the mid-2010s. And the later teaching was primarily provided by active PhD epidemiologists… I suspect that I just didn’t get to study epidemiology in enough depth to appreciate how much it has changed. The extent to which these changes have impacted observational research being published today is a topic for another thread.

I don’t think I intended to claim that epidemiology as a discipline has changed much over the last three decades. Most work in applied epidemiology is conducted in exactly the same way, most PhD epidemiologists have not changed, most training programs are the same. I am not here to defend epidemiology as a field. It has significant problems, maybe more so than statistics.

However, the small subset of epidemiologists who argue about methodology on the internet (i.e. the people we are referring to in discussions such as this when we talk about differences between “statisticians” and “epidemiologists”) are not representative of the kind of epidemiology you would be exposed to “in the wild”, unless you spend time at one of a small but growing number of schools of public health that focus heavily on causal inference (e.g., HSPH, UNC, UC Berkeley).

When these people argue about methodology on the internet, they are for all intents and purposes wearing the “hat” of a statistician (albeit one whose perspective was informed by a tradition that grew out of epidemiology departments). They are not fighting a “turf war” between epidemiology and statistics, but rather a “civil war” within statistics. The goal of this discussion is to get consensus on theoretical issues; the only way to reach such consensus is by having an open discussion that brings out the best arguments from each perspective.

Notably, as far as I can tell, the causal inference perspective is “winning” this theoretical discussion. If you look at the top journals in biostatistics, it would be impossible to publish new theoretical work on observational data analysis unless you work within the modern counterfactual framework for causal inference. In other words, the “epidemiologists” are the ones whose views on causality align with the consensus among theoretical biostatisticians in 2023 (notwithstanding the presence of a vocal rearguard of biostatisticians who simply never bothered to stay up to date on the theory of their own academic discipline).