Should one derive risk difference from the odds ratio?

I assume you mean two strata of patients with 100 in each of the treated and untreated groups…
In that case we have the following results:

As can clearly be seen, the LR is being mistaken for the RR, and this to me seems to be the basis of all the controversy around noncollapsibility. Obviously, in stratum 1, if we apply a baseline probability of 0.5 (keep in mind that a baseline probability is a conditional probability) then the RR-derived predicted probability is 1. This does not make sense to me and does not align with Bayes’ theorem. The mistake all along has been not recognizing that the RR is actually a LR, and treating it as an effect measure. Of course, I am happy to be corrected based on any detected mathematical issues, but not if based on imaginary mechanisms of generation of data.

NB: I renamed ‘baseline probability’ to ‘unconditional prevalence’ in the tables to avoid misunderstanding. In the top panel the prevalence is of treatment ‘X’ while in the bottom it is of outcome ‘Y’.

Great idea. But beware - I have the suspicion that some of those papers make the mistake of implying you should directly model RRs and ARRs instead of getting them from a proper probability model.

I have been trying with some difficulty to follow your discussion @sander, @s_doi and @AndersHuitfeldt about collapsibility of the proportions in @s_doi’s tables.

It is my understanding that RR is collapsible when r is constant under the following circumstances: p(Y=1∩X=0|Z=1∩X=0)/p(Y=1∩X=1|Z=1∩X=1) = r, p(Y=1∩X=0|Z=0∩X=0)/p(Y=1∩X=1|Z=0∩X=1) = r and the marginal p(Y=1∩X=0|X=0)/p(Y=1∩X=1|X=1) = r. This situation will arise in the special case when p(Z=1∩X=1|Y=1∩X=1)=p(Z=1∩X=0|Y=1∩X=0) and also if p(Z=1∩X=1|X=1)= p(Z=1∩X=0|X=0). It follows that the ratios of ‘survival from Y=0’ (i.e. Y=1) will also be collapsible. These special cases will be met rarely in a precise manner for observed proportions that are subject to stochastic variation and can only be postulated for ‘true’ parametric values.
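
As a check on this special case, here is a minimal Python sketch with hypothetical numbers (chosen for illustration, not taken from the thread): the stratum-specific RR is a constant r = 2 and Z has the same distribution in both arms, and the marginal RR then equals r as well.

```python
# Hypothetical illustration of RR collapsibility: constant stratum RR plus
# the same Z distribution in both arms, i.e. p(Z=1|X=1) = p(Z=1|X=0).
pZ1 = 0.3                      # common p(Z=1) in treated and control arms
risk0 = {0: 0.10, 1: 0.30}     # control risk of Y=1 by stratum Z
risk1 = {0: 0.20, 1: 0.60}     # treated risk of Y=1 by stratum Z (RR = 2 in each)

rr_strata = [risk1[z] / risk0[z] for z in (0, 1)]   # [2.0, 2.0]

# Marginal risks are mixtures over the shared Z distribution
marg0 = (1 - pZ1) * risk0[0] + pZ1 * risk0[1]   # 0.16
marg1 = (1 - pZ1) * risk1[0] + pZ1 * risk1[1]   # 0.32
rr_marginal = marg1 / marg0   # 2.0: the marginal RR equals the stratum RRs
```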

It is also my understanding that the OR will be collapsible when r’ is constant under the following circumstances: odds(Y=1∩X=0|Z=1∩X=0)/odds(Y=1∩X=1|Z=1∩X=1) = r’, odds(Y=1∩X=0|Z=0∩X=0)/odds(Y=1∩X=1|Z=0∩X=1) = r’ and the marginal odds(Y=1∩X=0|X=0)/odds(Y=1∩X=1|X=1) = r’. It follows that the odds ratios for Y=0 will also be collapsible. This collapsibility of odds will arise in the special case when p(Z=1∩X=1|Y=1∩X=1)= p(Z=1∩X=0|Y=1∩X=0) and p(Z=1∩X=1|Y=0∩X=1)= p(Z=1∩X=0|Y=0∩X=0).

In the special conditions for OR collapsibility, p(Z=1|X=1)≠p(Z=1|X=0), so that when the OR is collapsible it does not model exchangeability at baseline between {X=0}, the set of control subjects, and {X=1}, the set of treated subjects. Therefore p(Z=1|X=1)≠p(Z=1|X=0) implies that p(Z=1|X=0) is a pre-treatment baseline probability of the covariate Z=1 and p(Z=1|X=1) is a post-treatment probability. However, the RR and SRR models do model exchangeability between the sets {X=0} and {X=1}. Again, these special cases will be met rarely for observed proportions subject to stochastic variation and can only be postulated for ‘true’ parametric values.
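
A sketch of this OR special case with hypothetical counts (constructed for illustration, not taken from the thread): the Z distribution among cases is the same in both arms, and likewise among non-cases; the OR is then collapsible even though Z is not balanced across arms at baseline.

```python
from fractions import Fraction as F

# Hypothetical 2x2x2 counts, 100 per arm, built so that p(Z=1|Y=1,X=1) =
# p(Z=1|Y=1,X=0) = 0.6 and p(Z=1|Y=0,X=1) = p(Z=1|Y=0,X=0) = 0.3.
cases    = {0: {1: F(12), 0: F(8)},  1: {1: F(24), 0: F(16)}}   # Y=1 counts by arm X, stratum Z
noncases = {0: {1: F(24), 0: F(56)}, 1: {1: F(18), 0: F(42)}}   # Y=0 counts by arm X, stratum Z

def odds_ratio(z=None):
    """OR for treated vs control, within stratum z (marginal if z is None)."""
    def odds(x):
        if z is None:
            return sum(cases[x].values()) / sum(noncases[x].values())
        return cases[x][z] / noncases[x][z]
    return odds(1) / odds(0)

or_z0, or_z1, or_marg = odds_ratio(0), odds_ratio(1), odds_ratio(None)
# All three equal 8/3: the OR is collapsible under this special case.

# Yet Z is NOT balanced across arms at baseline:
pZ1_given_X0 = (cases[0][1] + noncases[0][1]) / 100   # 36/100
pZ1_given_X1 = (cases[1][1] + noncases[1][1]) / 100   # 42/100
```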

In my limited experience of analysing data for diagnostic and prognostic tests, the observed proportions differ from the above special cases, but often not by much, so that it is possible to assume that the data are compatible with parameters that satisfy either of the conditions for collapsibility of the OR and RR.

So in conclusion, I can follow your reasoning that the RD is collapsible (0.2-0.1 = 0.1, 0.9-0.8 = 0.1 and marginal 0.55-0.45 = 0.1). I follow that the ORs are not collapsible (2.25, 2.25 and marginal 1.5), especially as the data do not satisfy the special case of paragraph 3 above. Regarding RRs, I find them to be 90/80 = 1.125, 20/10 = 2 and marginal 110/90 ≈ 1.22. In this context, what do you mean @sander by “the causal RR is collapsible but not constant”?
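
These figures can be reproduced with a short sketch using the risks from Sander’s example:

```python
# (control risk, treated risk) of Y=1 in each group of Sander's example
risks = {"Z=0": (0.10, 0.20), "Z=1": (0.80, 0.90), "marginal": (0.45, 0.55)}

def measures(r0, r1):
    rd  = r1 - r0                                # risk difference
    rr  = r1 / r0                                # risk ratio
    or_ = (r1 / (1 - r1)) / (r0 / (1 - r0))      # odds ratio
    return rd, rr, or_

results = {g: measures(r0, r1) for g, (r0, r1) in risks.items()}
# RD: 0.1 in both strata and marginally (collapsible)
# RR: 2.0 and 1.125 in the strata, ~1.22 marginally (not constant)
# OR: 2.25 in both strata, ~1.49 marginally (constant yet noncollapsible)
```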


Hi Huw, I think it’s best you look at this from the diagnostic test perspective. As Sander said, and you rightly pointed out, the RD in each stratum is 0.1 and the table of Y probabilities is as follows (I use X for treatment, Y for outcome and Z for the stratum variable):

However, the RD is TPR - FPR and the RR is TPR/FPR in diagnostic test language, and therefore the RR is also the likelihood ratio. The table above therefore gives us 6 likelihood ratios (three pLR and three nLR values) and we can then compute conditional probabilities for any baseline unconditional prevalence of the outcome (or the treatment, depending on how the test assignment is made). The ratios or the differences of conditional probabilities are also noncollapsible, meaning that the marginal is to one side (lower) of the stratum-specific effects. This only happens when the stratum variable is prognostic for the outcome, and one would expect this with diagnostic tests. Only likelihood ratios are collapsible, as they should be, but they are not effect measures.
The above then explains the tables I posted previously. The concept of the RR as used today is therefore (as I have said previously) not consistent with Bayes’ theorem. From the diagnostic test perspective the Z strata (0, 1) and total (marginal) represent two different spectra of ‘disease’ and therefore have likelihood ratios that indicate different test performance.
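
The updating described here is just Bayes’ theorem in odds form; a minimal sketch, using stratum Z=0 of the example and an illustrative prevalence of 0.5 (my choice, not a figure from the thread):

```python
def posterior(prevalence, lr):
    """Bayes' theorem in odds form: posterior odds = prior odds * LR."""
    prior_odds = prevalence / (1 - prevalence)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# Stratum Z=0: pLR = 0.2/0.1 = 2, nLR = 0.8/0.9 ~ 0.89
p_pos = posterior(0.5, 2.0)        # ~0.667, conditional probability after a 'positive'
p_neg = posterior(0.5, 0.8 / 0.9)  # ~0.471, conditional probability after a 'negative'
```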

Huw: A general definition of collapsibility is in col. 1 p. 38 of Greenland Robins Pearl Stat Sci 1999: “Now suppose that a measure is not constant across the strata, but that a particular summary of the conditional measures does equal the marginal measure. This summary is then said to be collapsible across Z.” Today I’d amend that slightly to “is not necessarily constant”. The particular summary here is the RR standardized to the total cohort.
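
For the example under discussion the standardization is straightforward, since the two strata are of equal size; a sketch:

```python
# Sander's example: equal-sized strata, so the total-cohort weights are 0.5/0.5
weights = {0: 0.5, 1: 0.5}       # p(Z=z) in the total cohort
risk0   = {0: 0.10, 1: 0.80}     # control risk of Y=1 by stratum
risk1   = {0: 0.20, 1: 0.90}     # treated risk of Y=1 by stratum

# Standardize each arm's stratum-specific risks to the total cohort
std_r1 = sum(weights[z] * risk1[z] for z in weights)   # 0.55
std_r0 = sum(weights[z] * risk0[z] for z in weights)   # 0.45
std_rr = std_r1 / std_r0   # ~1.22, equal to the marginal RR: this summary is collapsible
```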


Thank you @Sander and @s_doi for your answers. I am still finding it difficult to relate the terms that you use in your explanations to my own terms and prior understanding as summarised in my previous post. For example, I think of the problem as looking at two sets – one representing control and one treatment – created by randomisation and initially exchangeable. They change subsequently depending on the effects of treatment and control. The subsequent difference between them can be expressed in a number of ways – through the RD, RR, SRR, OR or some other way. It would also help me to understand how the concept of collapsibility is put into practical use. Perhaps it might be easier for you to help me if you refer to a new topic that I started a few days ago, which describes my way of thinking: Risk based treatment and the validity of scales of effect


Hi Huw, I think it’s best we continue with Sander’s example as the data is accessible to all. Also, I will try to make it very clear in simple non-mathematical language so that we can benefit from your insights as an unbiased observer.

First, let’s assume we have three different groups based on Sander’s example:

Stratum 1 (Z=0, group 1)

Stratum 2 (Z=1, group 2)

Marginal (Mixed Z, group 3)

Since Z is prognostic for Y, these three groups represent different diagnostic test scenarios where in groups 1 and 2 the test discrimination is not influenced by Z but it is influenced by Z in group 3 (since Z is ignored).

Now consider that you would like to predict, for an individual in each group, their probability of having been treated based on the outcome Y=0 (or Y=1). The best way to do this is to compute a likelihood ratio for each group – this is what I did, so if we take the three groups for example:

0.2/0.1 = 2 = pLR in group 1 and 0.8/0.9 = 0.89 = nLR in group 1

0.9/0.8 = 1.13 = pLR in group 2 and 0.1/0.2 = 0.5 = nLR in group 2

0.55/0.45 = 1.22 = pLR in group 3 and 0.45/0.55 = 0.82 = nLR in group 3

Note that all the pLRs are also the RRs and now the difference emerges between what I am arguing and what causal inference groups are arguing:

My point: The RR is actually a pLR, and therefore TPR/FPR should be thought of as (posterior odds/prevalence odds) and should only be interpreted as such; under this constraint the ratio of posterior (conditional) risks is noncollapsible, just as the odds ratio is noncollapsible

Causal inference community: It is okay to consider the TPR as the treatment conditional risk and the FPR as the untreated conditional risk (thereby violating Bayes’ theorem) because the benefit of doing so is collapsibility, which means that the RR is now logic-respecting and can be used in causal inference

I cannot see how this can work, given that groups 1 & 2 above differ only on threshold, not on discrimination, and can therefore have different LRs but not different measures of effect. When we mix up X with Z (marginal) we actually have a different test scenario from groups 1 & 2, and therefore we expect that not only the likelihood ratios but also the effect measure will be different, and different in a consistent direction (weaker) – aka noncollapsibility, albeit there is nothing bad about this anymore. This is also what Kuha and Mills tried to say in their paper, which resonates with me. I look forward to your thoughts on this and to being corrected if there is any flaw of logic.


Hi Suhail. Thank you. I think that I understand. Instead of trying to predict an outcome such as death (1) conditional on treatment with S1 (e.g. with some symptom present) and control with S1 (i.e. also with the same symptom present), (2) conditional on treatment with S2 (e.g. with the symptom absent) and control with S2 (i.e. also with the same symptom absent), and (3) conditional on treatment and control only (not knowing about S), you are trying to do something else. You are inverting the whole process by wishing to predict whether treatment was given conditional on the outcome (e.g. death, as in a post mortem examination). In this situation, I agree with you that the risk ratio becomes the likelihood ratio, since the outcome now is whether treatment or control was given and the previous outcome (e.g. death or survival) becomes part of the evidence.

My next question is: Why should you and perhaps @sander wish to do this by predicting what has already passed? Is the ultimate purpose to try to estimate what would have happened in a counterfactual situation? Or is the ultimate purpose to try to predict the result of a RCT without having to actively randomise but instead conducting a ‘natural experiment’? I would also like to understand why ‘collapsibility’ is important in this.


Those are the perfect questions to raise. Backwards-information-flow conditioning is seldom useful and is dependent on more context than is forwards-time-conditioning. And I’m struggling to understand why collapsibility is interesting in any context.


Hi Huw, I am not actually doing this inverted process by choice - I am presenting how the RR can be visualized in diagnostic terms. I too was surprised initially but this is where we stand and the RR is the LR of this inverted world-view. In other words what I am trying to say is that the RR is the LR for the ‘outcome’ as a test of the ‘treatment’ - all RRs are such LRs. In terms of the diagnostic test scenario noncollapsibility means that a prognostic covariate that has been ignored does not allow optimal assessment of ‘test’ (which is actually the outcome in a trial for example) performance until it is adjusted for - which I believe is what Frank has been saying all along.


Frank, I agree with you; I too have reached that conclusion, but some rebuttal needs to be made to the causal inference groups who say it is not logic-respecting, etc., for it to be present with ORs. I am gradually moving towards the position that the RR is not logic-respecting, but for these LR reasons. The diagnostic test scenario is one of discrimination, but this is equal to association in a trial since there is no difference between the diagnostic odds ratio and the classical odds ratio.


Hi Suhail. Would you please describe what you mean by “logic respecting”, “association in a trial”, “diagnostic odds ratio” and “classical odds ratio” perhaps using the numerical examples in your recent post.


Hi again Sander. Would you please illustrate this explanation using the numerical examples in your discussion with Suhail @s_doi.

Hi Huw, ‘logic respecting’ means RR{z+,z−} ∈ [RRz−, RRz+] when for example there are three groups z+, z−, and all-comers {z+, z−} though I personally fail to see the ‘logic’.
The classical odds ratio is the one we are discussing and use in epidemiological studies, while the diagnostic odds ratio is (tp×tn)/(fp×fn).
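
A sketch of this interval condition, checked against the RRs and ORs of Sander’s example (marginal RR ≈ 1.22, marginal OR ≈ 1.49):

```python
def logic_respecting(m_zneg, m_zpos, m_marginal):
    """The marginal measure lies between the stratum-specific measures."""
    lo, hi = sorted((m_zneg, m_zpos))
    return lo <= m_marginal <= hi

# Sander's example: stratum RRs 2 and 1.125, marginal RR = 11/9 ~ 1.22
rr_ok = logic_respecting(2.0, 1.125, 11 / 9)     # True: marginal RR is inside the interval
# Stratum ORs 2.25 and 2.25, marginal OR = 121/81 ~ 1.49
or_ok = logic_respecting(2.25, 2.25, 121 / 81)   # False: marginal OR falls outside
```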

Huw: This has been illustrated in many articles over the past 40 years, including Greenland Robins Pearl Stat Sci 1999 and most recently here: Noncollapsibility, confounding, and sparse-data bias. Part 2: What should researchers make of persistent controversies about the odds ratio? - ScienceDirect
Please read them and their primary source citations.
See also Ch. 4 and 15 of Modern Epidemiology 2nd ed 1998 or 3rd ed 2008 for definitions of standardized measures.


Huw: I also strongly recommend that you (and Harrell and Doi) read thoroughly and in detail the book Causal Inference: What If by Hernan & Robins, the newest (Mar. 2023) edition is available as a free download here:

The text is searchable, so you can look up the book’s discussions of potential outcomes, counterfactuals, noncollapsibility, randomization, standardization etc.


Sander, this was a really good choice as a reference as the book is clearly written with all terminology explained at first use and the first part defines all key concepts in a very clear and easy to understand style and clarifies what the ‘accepted’ definitions and positions are (for the causal inference community).

In relation to the odds ratio (page 44), however, they toe the same line and use personal opinion to say “We do not consider effect modification on the odds ratio scale because the odds ratio is rarely, if ever, the parameter of interest for causal inference.” There is no further information that can shed light on why this position is taken, and therefore I assume it’s the noncollapsibility angle here too, but they also do not tell us why it’s bad or why it’s ‘logic not-respecting’.

I personally think that perhaps ignoring the fact that the RR is a likelihood ratio has created this situation, because LRs are collapsible but only if we ignore the pair (pLR and nLR). Had they been considered as a pair, then we are back to the OR. If we take your example data above, the outcome prevalence in the three groups is 15%, 50% and 85%, and the difference between true positives and false positives (aka RD) is 0.1 because the prevalence is being ignored. If, however, we compute the expected r1 and r0 for the same prevalence, using the pLR for r1 and the nLR for r0, then the RD in the same example is noncollapsible, but this is being ignored. Why is it reasonable to ignore the variation dependence on prevalence?
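
This claim can be checked numerically; the sketch below uses the pLR/nLR values listed earlier in the thread and an illustrative common prevalence of 0.5 (my choice): the posterior-based RD is ~0.196 in each stratum but 0.10 marginally.

```python
def posterior(prev, lr):
    # Bayes' theorem in odds form
    odds = prev / (1 - prev) * lr
    return odds / (1 + odds)

# (pLR, nLR) per group, from Sander's example as listed earlier in the thread
lrs = {"group 1": (0.2 / 0.1, 0.8 / 0.9),
       "group 2": (0.9 / 0.8, 0.1 / 0.2),
       "group 3": (0.55 / 0.45, 0.45 / 0.55)}

prev = 0.5   # illustrative common prevalence
rds = {g: posterior(prev, plr) - posterior(prev, nlr)
       for g, (plr, nlr) in lrs.items()}
# Strata: ~0.196 each; marginal: 0.10 -- the posterior-based RD is noncollapsible
```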

Suhail: We’re approaching 500 comments just in this thread, and there are scores and scores of papers and book chapters to be read on the topics. Noncollapsibility spans some 70 years if we go back to Simpson, 120 if we go back to Yule, while causal modeling with potential outcomes goes back a century, and their confluence is around 40 years old.

On this thread I’ve reviewed and cited some fundamentals. I regret that has not worked for you, and am afraid you will have to do more in-depth reading on your own if you actually want to understand what you are missing about causal models in general and noncollapsibility in particular (same for Frank and Huw).

The oddest thing for me is this: Basic causal models are simple tools, involving only high-school algebra - in my experience I found that entering students had much less trouble with them than with probability. Yet established professors who did not encounter causal models in their statistical training (i.e., most of those in my and Frank’s cohort) often seem to never quite get what the models are about or what they encode that ordinary probability (“statistical”) models miss.

It’s as if becoming immersed in probability models (which are full of brain twisters) can sometimes obstruct comprehension of causal models, while the probability-naive often find causal models far more intuitive than probability. I speculate the obstruction happens when a person becomes so habituated to thinking in terms of probability and randomized experiments that they can no longer think through a problem of aggregates without invoking probability or randomization, even when doing so only hinders solving the problem. This is one reason I advocate teaching causality as a foundation for probability and statistics (e.g., see Greenland S (2022). The causal foundations of applied probability and statistics. Ch. 31 in Probabilistic and Causal Inference: The Works of Judea Pearl. ACM Books, no. 36, 605-624, [2011.02677] The causal foundations of applied probability and statistics). In any case, it is not the only example of blindness induced by statistical training; significance testing provides a far more common and damaging example (e.g., see McShane, B. B., and Gal, D. (2016). Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence. Management Science, 62, 1707–1718).

Of course Pearl has remarked about the cognitive obstruction of causality by statistics, and rather undiplomatically at times; but harsh exposition should not blind us to the reality of the phenomenon. Our anecdotal observations are I think supported by some work in experimental psychology; @AndersHuitfeldt has noted as much, I believe.

When properly combined with probability, causal models place what are often very severe restrictions on longitudinal probability distributions. Those restrictions enforce basic intuitions about how causality operates, starting with time order. Once that machinery is mastered, the noncollapsibility issue can be framed in terms of loss functions for comparing action choices, e.g., noncollapsible measures do not adequately track losses proportional to average risks (which arise in all public-health problems I’ve looked at).
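
A sketch of this loss framing using Sander’s example, assuming a loss proportional to the risk of Y=1 and equal-sized strata: the collapsible marginal RD gives the per-person change in expected loss directly, which the marginal OR does not.

```python
# Sander's example: risk of Y=1 by policy and stratum, equal-sized strata
risk = {("treat", 0): 0.20, ("treat", 1): 0.90,
        ("control", 0): 0.10, ("control", 1): 0.80}

def avg_risk(action):
    # Expected loss per person (in units of one adverse outcome) for a policy
    return 0.5 * risk[(action, 0)] + 0.5 * risk[(action, 1)]

loss_gap = avg_risk("treat") - avg_risk("control")   # 0.10
# The marginal RD (0.10) tracks this gap exactly; the stratum ORs (2.25 each)
# and the marginal OR (~1.49) correspond to no such average of losses.
```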

With the decision and loss framing in mind you might find it easier to approach the topics using the kind of decision-theoretic causal models used by Dawid and Didelez, which reproduce the randomization-identification results derived from PO models without the joint PO distribution that seem to trip up some statisticians. Some of the other causal models competing in recent decades are reviewed in detail in papers by Robins & Richardson cited in the Hernan & Robins book, notably their 2010 book chapter that focuses on graphical formulations, and their subsequent 2013 SWIG fusion of graphs and potential outcomes.


Hi again Sander. Thank you for your advice on reading material, which I gratefully accept and have started reading. The diagnostic process is of course causal inference in the medical context and goes back thousands of years. Modern causal inference in medicine is largely based on concepts of positive and negative feedback mechanisms (the latter described as homeostasis and self-repair) that can be represented by quantitative cyclical graphs as well as DAGs. The result of causal inference during the diagnostic process is a ‘diagnosis’. A diagnosis is therefore the title to a series of facts, assumptions and reasoning that form a hypothesis to be tested with a medical test or treatment in an individual, and with RCTs or other experiments on groups.

Randomisation can thus be regarded as an attempt to emulate inaccessible counterfactual situations (in the current absence of a time machine that allows someone to go back in time and ‘do’ a counterfactual act) and to test hypotheses created by causal inference. In order to diagnose and treat ethically, I have to listen carefully to patients, scientists, statisticians, those studying causal inference in general and everyone else who can contribute to my aims for a patient. I cannot thank @f2harrell enough for setting up this site to encourage this multi-disciplinary process by allowing us to synchronise concepts from different disciplines through questions and answers.

I have done searches on this very well written book ‘Causal Inference: What If’ by Hernán and Robins but could not find any reference to the special conditions that allow collapsibility to be present for OR, RR etc (some of which were described by me in an earlier post 478 [ Should one derive risk difference from the odds ratio? - #478 by HuwLlewelyn ]). Can you @Sander or other readers point me to the place in ‘Causal Inference: What If’ (or in another source about causal inference) where such special conditions for collapsibility are specified or discussed?


I have referred many times in this thread and others to this article of ours (section 2.1 and the appendix) that popularizes collapsibility for clinicians drawing on decades of work by @Sander and others. In there, we cite key papers (also previously linked to in this thread and forum) that answer your question such as this, this, and this.
