Should one derive risk difference from the odds ratio?

Here is a story I pointed to earlier in this thread (the quote is clickable):

“For the quantification of the treatment effect and considering such a model, the discrepancy between odds ratio and relative risk may be related not only to the level of risk under control conditions, but also to the characteristics of the dose-effect relation and the amount of dose administered. In the proposed approach, OR may be considered as constant in the whole range of Rc, and depending only on the intrinsic characteristics of the treatment. Therefore, OR should be preferred rather than RR to summarize information on treatment efficacy”.

Look, Sander, I will read those papers right away and give you my thoughts on them soon.

But first, I want to point out that the scientific literature is incredibly expansive, and it would be completely impossible for anyone to read everything. Trying to be comprehensive in terms of reading all the old literature in the field would take all the joy out of learning; there is simply too much of it. Moreover, if the papers are sufficiently old, they are often written in what for all intents and purposes is a different scientific language, requiring significant mental effort to “translate”.

In my view, it is quite rational for any graduate student to assume, to a first approximation, that most important insights from old research will be reflected in the field’s current conceptual foundations (as reflected in the methods sequence at grad school, the textbooks, and the implicit assumptions that are made in papers published today).

If there seems to be some gap in the implicit current foundations of the field, there is so much exploratory joy in trying to bridge it. The scientific objective of the student should be to improve on the way the field currently thinks about how it does research, and by far the most important qualification for doing that is having a full understanding of how the people who are doing research today think about the foundations.

There will be cases where someone stumbles upon an improvement that is known but not reflected in the current foundations. In those cases, the fact that everyone is ignoring it means that the insight needs to be rediscovered by someone who can argue for it again, using fresh language that better reflects how scientists communicate today.

Of course, we should always try to identify whether someone has thought of similar things before, both in order to give them credit and to find out whether there was an obvious counterargument which led to people not accepting the idea. But doing this is often hard. Scientific language is very rooted both in a particular time and in a particular academic field, and it may take several hours to evaluate an old paper just to conclude that it isn’t relevant. Even finding the right papers can be challenging, since there isn’t always an obvious search term.

Moreover, if an idea is closely related to some old research, it seems likely that a reviewer (or forum discussion participant) will at some stage point it out. This saves a lot of time, since one can then read only the ones that are likely relevant, and not all the hundreds of papers that could potentially be relevant.

It therefore isn’t at all obvious to me that reading the old literature is generally the best use of time.

1 Like

No one said you should read all the old literature, which would be impossible (and would also beg the question of what is meant by “old” - does that include Greenland & Robins mid-1980s, when you were presumably in diapers?). Even keeping up with potentially relevant current literature is impractical. So for now I simply advise that you don’t keep making assumptions about what hasn’t been done if you haven’t researched the past thoroughly.

But having a historical bent, I found reading old literature (which I did starting from when I was a student) confirmed something else Kuhn pointed out: Much of what we were taught as “established” by then-current standard texts was distortive of reality, representing more what had currently triumphed in certain academic circles, rather than what was best for the problems we faced. Furthermore, there were (and still are) a lot of amazing ideas that had gotten overlooked and were worthy of resurrection.

Genetics is replete with examples: “genes don’t jump” and “the only heritable traits are passed through genes” were both implicit dogmas taught as reality when I was a student in the early 1970s. Statistics is even worse: Despite objections stretching back a lifetime, it’s still plagued by the idea that every analysis should start by assuming the null and cling to that unless and until evidence overwhelms that dogma (but as Jeffreys and Fisher showed a statistician needn’t ever submit to overwhelming evidence - they need only keep inventing ever more clever explanations for that contravening evidence).

3 Likes

Yours is not a story about causal mechanisms related to a question. Statistics are there to help us answer a causal question about how the universe works. There is a true data generating process and we are just trying to understand it. For me, the DGP exists, and I happen to have a question about a particular aspect of it. Please start with your question. Then tell me the part of the universe’s data generating process that is relevant to that question. The story of the data generating process should be in English, like mine was with substrates and enzymes, or with DNA damage and repair. If you can’t do this, then I won’t see it as a causal question. I won’t speak for others, but for me, the problem of the OR is for causal questions, not predictive/descriptive questions.

I am still thinking about some of the papers that you suggested, and I will get back to you about multistage carcinogenesis after I have processed it a little bit more.

But I want to thank you sincerely for pointing my attention to Weinberg (AJE 1986). I wrote in my newsletter (linked above) that Sheps’ insights have been repeatedly rediscovered independently across several academic fields. This is another instance of this, one which I was previously not aware of.

Weinberg suggests using models that are additive in log risk (risk ratio) when the risk factor is “preventive”, using models that are additive in the log of the complementary risk (survival ratio) when the risk factor is causal, and notes that the survival ratio can be approximated by the risk difference under a rare-disease assumption.
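For readers who want the algebra behind that last approximation (notation mine, not Weinberg’s): with risks R1 and R0 under treatment and control,

```latex
% Survival ratio vs. risk difference under a rare-disease assumption
% (illustrative notation, not Weinberg's): R_1, R_0 = risks under
% treatment and control.
\[
\mathrm{SR} = \frac{1 - R_1}{1 - R_0}
            = 1 - \frac{R_1 - R_0}{1 - R_0}
      \approx 1 - (R_1 - R_0) = 1 - \mathrm{RD}
\quad \text{when } R_0 \ll 1,
\]
% hence log(SR) ~ -RD for small risks, so additivity on the
% log-survival-ratio scale is approximately additivity of risk
% differences.
```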

This is exactly what I have been struggling to convince the epidemiologic community about for the last 6 years, so it is surprising to see the same point being made in AJE in 1986 by a highly regarded and influential epidemiologist.

While I am still open to the possibility of being proven wrong by other papers, Weinberg (1986) is not a counterexample to my claim that there is no plausible data generating mechanism that leads to stability of the risk difference (except when the intervention increases risk and the risk difference is an approximation of the survival ratio) or of the risk ratio (unless the intervention reduces risk and other conditions are met).

1 Like

I read several of your “old” papers in Murray Mittleman’s course Epi247 (otherwise known as Epi 24/7). They are excellent papers which taught me a lot, both in terms of the content of the papers themselves, and also in terms of how to write convincing methods papers (which is something I still have a lot to learn about).

But in general, I find that it is often easier to learn these ideas from secondary sources than from the original papers. Methods papers are optimized for convincing skeptical reviewers, for precision, and for completeness of the argument. This means it isn’t really possible to focus fully on the insight that might make the idea “click” in the head of a student.

Papers which have been around for more than 10-20 years have had a chance to percolate through the academic zeitgeist; they have been processed by several thinkers who may have simplified the central insights and worked out how best to induce the “click” in their students. This speeds up learning, and makes it greatly more enjoyable.

I wouldn’t try to learn calculus from Newton or Leibniz’ original writings, and likewise, I wouldn’t try to learn causal inference from Robins (1986).

1 Like

Good you finally read Weinberg. Still, you love to jump to conclusions before studying enough. Please read pp. 74-83 of Modern Epidemiology 3 before shooting off a reply again.

Once again you make up straw-man directives. No one said you needed to initially learn from old sources or go back in history to primordial beginnings (Newton’s era is sometimes presented as the origin of epidemiology, with studies by Graunt, Halley and Petty on mortality). Although I will say those who do go back to origins often see (as I did) how smart and nuanced some of the founders were, sometimes in ways that have been lost over time. For me this was especially so in statistics, with “An Essay towards solving a Problem in the Doctrine of Chances” by Thomas Bayes (posthumous 1764).

Again, as you saw with Sheps long before you were born (and not long after I was) and now with Weinberg, there is plenty of importance much more recently that is worth revisiting. Which reminds me, regarding Sheps whatever became of your reading of Khoury et al.? (“On the measurement of susceptibility in epidemiologic studies”, Am J Epidemiol 1989;129:183–190).

My apologies for posting this twice, I accidentally hit “post” on the last version before I finished writing it.

(1) Khoury et al (1989) contains the same basic argument for the case where an intervention increases risk, but as far as I can tell, does not consider the situation where the intervention decreases risk. I have added it as a reference in our paper and will add it to the newsletter list of others who have considered closely related ideas.

(2) Modern Epidemiology contains a lot of good, correct discussion about the relationship between response types and effect measure stability. This is fundamental to the argument structure of the points I am making.

On page 78, there is an argument that when we are considering two interacting exposures, the absence of “interaction types” implies stability of the risk difference. This is a strong assumption, but I will acknowledge that it works in theory.

The first thing I asked myself after seeing this is why it doesn’t lead to a paradox in situations where effect stability would lead to risks above 100%. I think the answer is that it is set up for causal interaction instead of effect modification. That setup factors out the possibility that risk due to background factors varies between groups; any baseline risk difference is due to the causal effect of one of the interacting factors. I am therefore not convinced that this result would allow you to establish stability of an additive parameter between groups whose baseline risk differs due to background factors.

I realize I never explicitly said that this was my goal, so I can see that it may have been reasonable to interpret me as making a broader claim than what I had intended. I would like to note that for the purpose of the generalizability problems that motivate me, what matters is effect modification rather than causal interaction: Doctors generally cannot intervene on the observable markers of background risk that they use to guide their decisions.

Well how about you try and convince yourself one way or another by plugging in some numbers for the proportions of noninteraction types allowed (which have only one constraint: they must sum to 1) to get the risks from the formulas on p. 77 of ME3, and see if you can vary the baseline risk (I presume you mean R00) while holding the RDs constant? I think you will find that yes, the RDs can be constant while R00 varies, limited only by the fact that the RDs must sum to no more than 1-R00; if, as is typical, R00 is not very large, then in RD terms this is a vast range for additive effects to live in. In other words, your original claim is wrong…
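A minimal numeric sketch of that exercise (illustrative numbers of my own, not the specific type-proportion formulas of ME3 p. 77):

```python
# Sketch: constant risk differences across widely varying baseline risk
# R00, as permitted when there are no interaction response types.
# Numbers are illustrative; the only constraint is that all risks
# stay in [0, 1], i.e. rd_x + rd_z <= 1 - R00.

rd_x = 0.20  # causal risk difference for factor x alone
rd_z = 0.15  # causal risk difference for factor z alone

for r00 in [0.05, 0.15, 0.30, 0.50, 0.65]:  # baseline risk varies
    r10 = r00 + rd_x           # risk with x only
    r01 = r00 + rd_z           # risk with z only
    r11 = r00 + rd_x + rd_z    # risk with both: perfect additivity
    assert 0.0 <= r11 <= 1.0, "range constraint violated"
    print(f"R00={r00:.2f}  RD(x)={r10 - r00:.2f}  RD(z)={r01 - r00:.2f}  "
          f"RD(x,z)={r11 - r00:.2f}")

# Output: the RDs stay at 0.20, 0.15, and 0.35 while R00 ranges
# from 0.05 to 0.65.
```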

You can research it if you want, but I seem to recall this was known in the bioassay literature in the 1920s and is one of the reasons product terms in linear models came to be called “interactions” (much later, statisticians in their unbounded carelessness began calling all product terms “interactions” regardless of the link function for the linear predictor, even though a connection of those terms to actual biologic interactions is absent from all but a few models).

P.S. Added belatedly: For those with a historical bent, in a paper invited by an engineering toxicology journal (“Elementary models for biological interaction”, Journal of Hazardous Materials 1985;10:449-454) I attempted to provide a connection between bioassay and epidemiologic models for interaction.

1 Like

At the risk of adding even more to everyone’s Sander Greenland reading list, I submit the following:

It goes into a number of issues discussed in this thread, especially the distinction between “main” effects and “causal” effects, which depends on the context.

“An effect is a coefficient of study exposure in a generalized linear model for the outcome of interest … The parametric definition arose in the context of randomized experiments … Given randomization, the definition is not very misleading.”

Read the entire open access paper for details. The ellipses skip over a lot of important fundamentals.

While this thread has a wealth of information on the details of a data analysis that deserve an independent thread, I wanted to clarify where the essential area of disagreement is.

From what I can interpret, the dispute is whether Y is viewed as inherently ordinal (e.g., a patient-reported outcome, a functional capacity assessment, etc.) or as a proxy for a parametric variable in a causal model (i.e., going from a cluster of qualitative observations to the discovery of a biophysical cause – e.g., Alzheimer’s disease, the physiology of pain syndromes, etc.)

As an example, the dominant perspective in chronic pain (maybe fashion is the more honest word) is the “new” biopsychosocial model (which isn’t all that new) while “biomechanical” models are criticized (often for erroneous reasons).

The way I interpret the disagreement between @Sander and @f2harrell (using pain syndromes as an example):

  1. At the initial onset of an episode of pain and decreased function, efforts to discover a causal mechanism (reducing ordinal observations to physiological mechanisms) have the best chance of success, so collapsible measures of effect (RD, RR) should be computed from logistic models (unless some other model is more appropriate). This is a question of basic science that might provide guidance on treatment.

  2. But at later stages in the evolution of pain syndromes, the basic question of causation is much more difficult to answer, while questions of treatment efficacy still remain. As these outcome measures of functional improvement are inherently ordinal, the covariate-adjusted distribution derived from the OR is the most appropriate. Here, relative measures of effect adjusted by individual risk factors address the essential research question of direct interest to clinicians, patients, and other stakeholders.

Is this a reasonable interpretation of the dispute above? I infer this from my familiarity with the published writings of both Sander and Frank. In all of the examples in Sander’s papers, there was a reasonable prior that the pathophysiology of the disease was connected to chemical exposure and could be reduced to some (parametric) biochemical model.

Frank’s writings have intersected with my field of rehabilitation, where he has been listed as one of the authors on papers that deal with low back pain. In addition, his most recent project has been the promotion of ordinal outcome measures, where his recommendations have substantial merit (in my opinion).

For readers who are jumping into the discussion at this stage, I want to be clear that this part of the discussion relates only to a claim I made in this thread that no mechanism has been proposed that would guarantee stability of the risk difference, and not to any argument we make in the preprint. I am less certain about the claims I am making in this post than anything we wrote in the preprint; if it turns out I’m making a mistake in this part of the discussion, it is highly unlikely to affect the argument in the manuscript.

Maybe I am being thick, but after thinking about this for a while now, I still believe that at least some form of the argument I made in this thread is correct (though possibly phrased imprecisely).

Modern Epidemiology considers two interacting factors, which I will relabel as V and A. Suppose the outcome is Y. I am interested in whether the effect of A on Y differs between people who have V=0 and people who have V=1.

There is then a result which can easily be restated to tell you that if there are no “interaction response types”, it follows that Pr(Y^(a=1, v=1)) - Pr(Y^(a=0, v=1)) = Pr(Y^(a=1, v=0)) - Pr(Y^(a=0, v=0)). This is what Tyler refers to as absence of causal interaction on the additive scale.

I claim that what usually matters is not whether there is no causal interaction, but rather whether there is no effect modification, i.e. whether Pr(Y^(a=1) | V=1) - Pr(Y^(a=0) | V=1) = Pr(Y^(a=1) | V=0) - Pr(Y^(a=0) | V=0) . Absence of causal interaction does not imply absence of effect modification, and doctors will generally find themselves in a situation where it is impossible to intervene on V. Effect modification is therefore usually a much more useful concept for clinical decision making than causal interaction.

Even if doctors are assigning two interventions simultaneously, scale-stability relative to variation in baseline risk due to background causes is much more central to the decision-making problem than scale-stability between the two interventions relative to each other.

You might be able to argue that deviation from effect measure additivity implies the existence of some past covariate U which V is a marker for, and which has “interaction response types” with Y. But I am not sure this solves any practical problems unless you can condition on U. I note that this is likely to be a very large set of covariates; my intuition is that it will functionally require adjustment for every cause of Y.

If you wanted to set up the argument from Modern Epidemiology in terms of effect modification, I think it would be preferable, instead of considering 16 joint response types, to consider 4 response types for A in V=0 and, separately, 4 response types for A in V=1. But I think it would then be very hard to come up with a plausible biologically interpretable argument (based on these abstractions) that leads to stability of the risk difference for A between strata of V…

For a concrete example (for readers of this discussion who are less familiar with the distinction between effect modification and interaction than Sander), suppose A is alcohol and V is smoking. If there is no causal interaction, this means that in a trial where you randomize both alcohol and smoking, the effect of alcohol will be equal between V=1 and V=0. However, when smoking is assigned randomly, it follows that if there is a difference between the two conditional baseline risks for the alcohol intervention, Pr(Y^(a=0) | V=0) and Pr(Y^(a=0) | V=1), then this is entirely due to the causal effect of smoking, not due to smoking being a predictor of risk. This observation is essential for why joint response types can get you the rest of the way towards an argument for stability when considering causal interactions.

(Edited to add: Maybe it is possible to interpret the model as implying no effect modification because any interaction with a past covariate U which predicts V will be reflected in the response types between V and A. But then it becomes really challenging to give a meaningful interpretation to what an “interaction response type” is, and in order to rule out their existence, you need to rule out interaction with everything in the past…)

Anders: While I’m all for exploring practical limits of simple causal models and their implications, right now it looks to me that you are also doing what you complained Doi did to defend his claims: adding unnecessary and even irrelevant diversions that only obscure your mistake in issuing too sweeping a denial of the existence of simple mechanistic causal population structures under which the causal RD can be stable (unmodified) despite variation in background risk (risk under absence of both factors).

For those not having a copy of Modern Epidemiology (Ch. 18 in 2nd ed, Ch. 5 in 3rd ed) where the results at issue are covered, they were taken from Greenland S, Poole C, “Invariants and noninvariants in the concept of interdependent effects”, Scand J Work Environ Health 1988;14:125–129 (as usual, available on request). These citations delineate an entire class of mechanisms which can produce constant, additive RDs across a large range of baseline risk variation. In these mechanisms, for every individual the x effect does not depend on the z level and the z effect does not depend on the x level. A consequence is that there would be no modification of the RD (no effect-measure modification) for either x across z or z across x, i.e., perfect additivity of separate-treatment RDs to get joint treatment effects.

Variation in the baseline risk across populations need not destroy this additivity. I understand that such mechanistic RD additivity seems “paradoxical” given the range restrictions, but that just shows how your intuition suffers the same sort of limits as does Doi’s and Harrell’s (mine would too but for the fact that I encountered these results 40 years ago). As with any causal model, whether such noninteractive, non-modifying structures are plausible or realistically transportable is context dependent and hence largely in the eye of the beholder, so is a separate topic that we won’t resolve here.

It’s fine to pursue the source of an incorrect initial intuition to see what can be learned from it, but randomization and prediction are irrelevant to the present case: The above cites and my points are about the true causal RDs computed directly from the full x*z potential-outcome vector of each population member under every exposure combination. Whether we are accurately estimating effects or predicting risks are vast topics that do not bear on the results.

3 Likes

I’ll get back to you in a week’s time after having thought through this thoroughly. For now, I will just point out that your read on my psychology is wrong. I try very hard to always acknowledge my mistakes when someone convincingly points them out. Moreover, I am confident that I have a track record to back up that claim.

I don’t know whether what’s happening is that I’m not smart enough to understand your argument, or that I’m not communicating precisely enough for you to understand what I’m saying. But in either case, it is not because I am making any attempts to obscure my mistakes or defend a lost position.

2 Likes

If we indicate different baseline risks by a, b, c… etc., then the implication is that:
a(RR_a - 1) = b(RR_b - 1) = c(RR_c - 1) …
where RR_x denotes the risk ratio at baseline risk x (see the algebra below). Which means that, by your own admission, in essence there is an entire class of mechanisms implying non-constancy of the RR.
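Spelling out the algebra behind that display (notation mine):

```latex
% Constant RD forces a baseline-dependent RR: with baseline risk R_0
% and treated risk R_1,
\[
\mathrm{RD} = R_1 - R_0 = R_0(\mathrm{RR} - 1)
\qquad\Longrightarrow\qquad
\mathrm{RR} = 1 + \frac{\mathrm{RD}}{R_0}.
\]
% So if the RD is held constant while the baseline risk takes values
% a, b, c, ..., the RR must vary: 1 + RD/a, 1 + RD/b, 1 + RD/c, ...
```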

“By your own admission”? Where in this thread has anyone claimed the RR is constant or nearly so outside of special models? You and coauthors are the ones who made erroneous assertions about constancy of the OR based on a confusion of statistical models with mechanistic causal models, combined with a misanalysis of meta-analytic data; those mistakes are what kicked off this thread and the exchanges in JCE. What is your point now? Have you finally learned that the OR is not constant under simple mechanistic models?

My position all along has been that in epidemiologic practice any assumption of constancy is just a statistical convenience which rarely has any credible basis in data or biology (I can’t speak for other fields, like bioassay which has a century-long literature on the topic). It’s been known for generations and obvious from the math that (with rare exceptions) constancy of one measure means nonconstancy of the rest - so if the RD is constant, we get nonconstancy of the OR, RR, relative difference, AF, and so on. And it’s obvious from correct meta-analyses that it’s rare indeed that one can correctly claim even one of them is constant (as opposed to mistakenly claiming constancy based on failure to “reject” using nearly powerless statistical tests).

When (as usual) the direction of heterogeneity is unclear, the best we can hope from assuming constancy is that it removes enough noise to more than compensate for the bias it creates - the old “bias vs. variance” tradeoff in minimizing mean-squared error. But that hope is not guaranteed and depends entirely on the target of estimation. Better still is to allow some heterogeneity controlled by penalties (priors) that can average the extremes between unconstrained heterogeneity and complete homogeneity, as allowed by hierarchical modeling of product terms. That became computationally feasible 40 years ago and by the 1990s could be found in textbooks, e.g. p. 431 of Modern Epidemiology 2nd ed. 1998 (p. 438 of the 3rd ed.).

6 Likes

A few days ago, I wrote this in this thread:

Sander then correctly pointed out an example from Modern Epidemiology page 76, table 5.2, in which they show that if there are no “interaction response types” (which roughly means that for every individual, at least one of the interacting factors A and V must have no effect, for both values of the other interacting factor) then there will be stability of the causal risk difference for A, between the setting where we intervene to set V to 0, and the setting where we intervene to set V to 1. If we additionally assume exchangeability (independence between V and Y(a,v)), the causal risk difference for A will be equal between groups defined conditional on the observed value of V.

I do not contest this, and I am sorry about the imprecision in my claim. I do however maintain that this result is not very relevant to the kinds of decision problems that I am considering, in which a clinical decision maker needs to individualize the causal effect of the intervention (A) to a patient with V=v.

Just so everyone is on the same page, let us differentiate between three subtly different phenomena:

  • Statistical interaction/Product term in observational model: Both interacting factors are observational
  • Effect modification: Primary intervention variable is counterfactual, the other factor is observational
  • Causal interaction: Both intervention variables are counterfactual

The reason I consider effect modification as more relevant to the decision problem than causal interaction, is that in order to individualize treatment to a person with V=v, doctors cannot intervene on V, and therefore need to find some aspect of reality that is invariant over observed V.

I am fairly sure you will agree that if there exists a common cause of V and Y, which we will call U, then even if there are no interaction response types between V and A, there is no guarantee that there will be no additive effect modification. To see why, you can stratify table 5.2 by U, and work out Pr(Y^(a) | V=v) for all a and v (using the law of total probability over U). I am not going to type up the maths, but I would be very surprised if this claim is incorrect.
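In place of the maths, here is a minimal numeric sketch (all numbers are mine and purely illustrative): within each stratum of U, risks are exactly additive in (a, v), so there are no interaction response types between A and V; but A’s effect depends on U, and U also causes V. Additive effect modification by observed V then appears:

```python
# Illustrative toy model: U is a common cause of V and Y.
# Within each U stratum, risks are additive in (a, v):
#   Pr(Y(a,v)=1 | U=u) = base[u] + a*rd_a[u] + v*rd_v[u]
# so there are no A-V interaction response types within U,
# but A's effect is allowed to depend on U.

pr_u1 = 0.5                          # Pr(U=1)
base = {0: 0.10, 1: 0.20}            # baseline risk by U
rd_a = {0: 0.10, 1: 0.30}            # A's risk difference by U
rd_v = {0: 0.10, 1: 0.10}            # V's risk difference by U
pr_v1_given_u = {0: 0.20, 1: 0.80}   # U causes V

def rd_for_a_given_v(v):
    """RD for A among those observed at V=v, by total probability over U."""
    # Pr(U=1 | V=v) via Bayes' rule:
    num = (pr_v1_given_u[1] if v else 1 - pr_v1_given_u[1]) * pr_u1
    den = num + (pr_v1_given_u[0] if v else 1 - pr_v1_given_u[0]) * (1 - pr_u1)
    p_u1 = num / den
    # Within U=u, Pr(Y(1,v)) - Pr(Y(0,v)) = rd_a[u] for either v
    # (rd_v cancels, precisely because of within-U additivity):
    return p_u1 * rd_a[1] + (1 - p_u1) * rd_a[0]

print(rd_for_a_given_v(1))  # ~0.26
print(rd_for_a_given_v(0))  # ~0.14 -> RD for A is modified by observed V
```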

I note that if we are considering causal interaction, then any differences in baseline risk are due to the causal effect of V. This model therefore cannot give you a reason to expect additive stability between groups whose baseline risks differ because of common causes of V and Y (at least not without expanding the model to also claim no interaction types between A and all predictors of V).

So in summary, what I should have written is something like this:

“Would you be able to formalize the biological story for a data generating process that leads to absence of effect modification, i.e. stability of the causal risk difference across groups constructed based on observational variables, and whose baseline risk differs arbitrarily?”

3 Likes

Thanks Anders for the concession and clarifications. I certainly agree that there is no guarantee of observing additivity if there exists a common cause U of V and Y, because we now have confounding of the V effect on Y by U. To be more precise, we should not expect additivity of the observed RD(AY|V) and RD(VY|A) even if the causal AY and VY RDs do add, because in your set-up the observed RD(VY|A) is confounded by U.

It appears that you want to use V to guide decisions about treatment A without concern about confounding of V effects; that’s fine. But as far as I can see, all you are saying is that you are concerned with a case in which we think RD(AY|V) is unbiased enough for the AY effect to provide to clinicians, but we may have failed to control enough confounding of RD(VY|A) to make valid inferences about the V effect on Y. In that case it should be no surprise that we cannot make valid inferences about the interaction of A and V effects.

In sum, the one point I see coming out of your arguments is that if you want to study the causal interactions of A and V on Y, you have to control confounding of both A and V. More precisely and generally: To study causal interactions among components Xj of a vector of exposures X = (X1,…,XJ), you have to control confounding of X, e.g., by blocking all back-door paths from X to Y. A corollary is that to ease correct deductions about confounding control for studying causal interactions from a DAG, we ought to examine the graph that replaces the potentially interacting exposures with a single vector of them.

Given that your concern translates into a higher-order confounding problem, my answer to your query would be Yes: I can easily formalize a biological story for the data generating process that leads to absence of RD modification (i.e. stability of the causal risk difference across groups constructed from observational variables, with baseline risks differing arbitrarily within logical constraints): All I need for that is (1) no AV-interaction response types and (2) sufficient control of confounding of effects of the exposure vector (A,V) on Y; then there will be no modification of the observed RD(AY|V) across V within confounder levels and also after marginal adjustment (or after averaging using any shared weighting scheme). Note that (2) is no more stringent a requirement than that for mediation analysis, in which we replace the baseline variable V with a mediator M between A and Y.
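A quick numeric illustration of (1) and (2), reusing the toy numbers from the sketch earlier in this thread but with V rendered independent of U (i.e., the V effect unconfounded):

```python
# Same toy numbers as the earlier sketch, but with V independent of U
# (e.g., V randomized), so confounding of the V effect by U is gone.
pr_u1 = 0.5
rd_a = {0: 0.10, 1: 0.30}   # A's risk difference by U (as before)

def rd_for_a_given_v(v):
    # With V independent of U, Pr(U=1 | V=v) = Pr(U=1) for both v:
    p_u1 = pr_u1
    return p_u1 * rd_a[1] + (1 - p_u1) * rd_a[0]

print(rd_for_a_given_v(1), rd_for_a_given_v(0))
# 0.2 0.2 -> no modification of the RD for A across observed V
```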

Do you agree (in which case this subthread should be done) or can you exhibit a mistake in my reasoning?

2 Likes

This discussion on causal effect modification seems interesting, so let’s take an example. If we run some GLMs on Sander’s example in the recent non-collapsibility paper in JCE and accept that there is neither confounding nor a sample artifact, the product term (EXP) in the log-risk model is 0.5 (sub-multiplicative) and in the logistic model is 1.6 (super-multiplicative). Thus male gender (M) modifies treatment (Rx) related death risk (decreased by 50% in M compared to females (F)) while simultaneously M also modifies treatment related death odds (increased by 55% in M compared to F). However in both M and F, Pr(death) increases with Rx (M from 0.6 to 0.9 and F from 0.1 to 0.3). In summary the association measures are (compared to baseline of noRxF):

Rx:  RR = 3   OR = 3.9
M:   RR = 6   OR = 13.5
RxM: RR = 9 ↓  OR = 81 ↑
(↓ = less than multiplicative, 9 < 3×6; ↑ = greater than multiplicative, 81 > 3.9×13.5)

Obviously the results are conflicting because these measures are measuring association differently, and our point has been that no meaningful heterogeneity of treatment effects should be derived from measures of association that measure that association poorly.
The inequality of the association of treatment with death in strata of gender (or the non-null product term) can be called association modification. This, of course, happens commonly even if gender is not a cause of death or not associated with a cause of death, e.g. because of artifacts of the sample. Because association modification is mathematically reciprocal (if gender modifies the Rx-death association, then Rx modifies the gender-death association), it suffices to just call this association modification, or a statistical product term, and avoid use of the term interaction.
The unfortunate reality is that even when Rx and gender are both causes of death, even when confounders of their marginal association with death are absent, and even when there is no artifact of the sample, there is still no guarantee that the association modification between Rx and gender with respect to death corresponds to causal effect modification if the measure of association is poor (i.e. measures the association poorly due to interference by baseline risk).
The last point above is what we are discussing in this thread. We have three common choices in medical decision making around association measures for binary outcomes – RD, RR or OR. If any of these measure the intended association poorly, then the association modification will correspond poorly to effect modification, a.k.a. heterogeneity of treatment effects, even if confounding or artifacts are absent. This is why we see different results from different measures. In the example above, it is quite clear that the RR is spurious for the RxM comparison to baseline (noRxF), as examination of the data clearly demonstrates that belonging to the RxM group gives the greatest increase in mortality. This sort of non-monotone relationship of measures with true association (which can be measured in many ways) seems to be a hallmark of collapsible measures of association.
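For readers who want to reproduce the numbers above, a minimal check from the four stated risks:

```python
# Reproduce the association measures from the risks stated above:
# Pr(death): noRxF = 0.1, RxF = 0.3, noRxM = 0.6, RxM = 0.9.
p_base, p_rx, p_m, p_rxm = 0.1, 0.3, 0.6, 0.9

def odds(p):
    return p / (1 - p)

def rr(p):
    return p / p_base               # risk ratio vs noRxF

def orr(p):
    return odds(p) / odds(p_base)   # odds ratio vs noRxF

print(rr(p_rx), rr(p_m), rr(p_rxm))     # ~3, 6, 9
print(orr(p_rx), orr(p_m), orr(p_rxm))  # ~3.86, 13.5, 81
# Product terms (departure from multiplicativity):
print(rr(p_rxm) / (rr(p_rx) * rr(p_m)))     # 0.5   (sub-multiplicative RR)
print(orr(p_rxm) / (orr(p_rx) * orr(p_m)))  # ~1.56 (super-multiplicative OR)
```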

Anders, I see you posted this:

Your site finally let me post the following comment which seems relevant here as well:
Tell us Anders, where did you find out about Abbott’s paper and why did you obtain and read it? After all, here at the Datamethods blog on July 4 you repeatedly dismissed my comments that mechanistic models and risk results go back to the 1920s, replying
“It therefore isn’t at all obvious to me that reading the old literature is generally the best use of time” and “I wouldn’t try to learn calculus from Newton or Leibniz’ original writings, and likewise, I wouldn’t try to learn causal inference from Robins (1986)”,
to which I replied at length, ending with:
“For those with a historical bent, in a paper invited by an engineering toxicology journal (“Elementary models for biological interaction”, Journal of Hazardous Materials 1985;10:449-454) I attempted to provide a connection between bioassay and epidemiologic models for interaction.” Among other things this 1985 paper cites W.S. Abbott, A method of computing the effectiveness of an insecticide, J. Econ. Entomol., 18 (1925) 265–267.

Also, you never responded to my deduction of risk additivity from the assumptions of no interaction response types and of no confounding of either single or joint factor effects.