Should one derive risk difference from the odds ratio?

I don’t object to any of the above, but I still have 2 questions:

  1. What is the problem with computing collapsible effect measures from logistic regression? Do you disagree with Frank that this model is useful in a wide variety of situations? Sander seems to agree (with certain exceptions – e.g., log-risk models with sparse data). Can’t causal reasoning be incorporated into an analysis that uses logistic regression?

  2. What data analysis plan do you propose instead of logistic regression, if you do not agree with this model as a default?

While I have no reason to doubt Frank’s intuition, I can see why it might be objectionable.

Sander stated above that other models should be considered:

How should an analyst do this in a principled way? If we are trying to do the best we can with a particular data set, it seems you need to collect data and fit models before making the choice; and if I’ve learned anything here, it is that this analysis uncertainty needs to be accounted for.

In an effort to synthesize the recommendations of Frank and Sander, I conclude model averaging and/or selection is the most principled way to do this.

But I may be missing something.


A regression model is best understood as a set of homogeneity assumptions on a parametric scale. The model is correct if the homogeneity assumptions are correct, and incorrect if they are not. I do not believe the homogeneity assumptions of a logistic model. I am not interested in what parameters you “compute”, but in what parametric scale you use to define the model.

Certainly, some people utilize logistic models in causal inference, for example for estimating IPTW weights. I think the specific choice of model is mostly out of convenience, not because they have demonstrated that it is the best approach. The causal inference community is not very interested in parametric models (even if they use them in practice); many of the thought leaders are moving to a semiparametric/machine learning mindset. I do not speak for the causal inference community; please do not interpret me as making this argument on behalf of other causal inference researchers. There is certainly no consensus against logistic regression among them.

My preference is for using the switch relative risk in place of the odds ratio, at least for the primary intervention of interest. This preference also applies to regression models. Weinberg (1986) (Applicability of the simple independent action model to epidemiologic studies involving two factors and a dichotomous outcome - PubMed) suggested implementing this in the GLM framework by using a log link when the main exposure reduces incidence, and a complementary log link when the main exposure increases incidence.
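Weinberg's suggestion amounts to letting the direction of the effect pick the scale: risks for an exposure that reduces incidence, survival probabilities for one that increases it. A minimal sketch of that selection rule (my own illustration; the exact parameterization of the switch relative risk in the forthcoming preprint may differ):

```python
def directional_ratio(p0, p1):
    """Illustrative sketch: summarize the effect on the scale suggested
    by its direction - risks (log link) when the exposure reduces
    incidence, survival probabilities (complementary log link) when it
    increases incidence.

    p0: risk without exposure; p1: risk with exposure.
    """
    if p1 <= p0:
        return p1 / p0              # risk ratio, "counting the dead"
    return (1 - p1) / (1 - p0)      # survival ratio, "counting the living"

# A protective exposure (0.2 -> 0.1) is summarized on the risk scale;
# a harmful one (0.1 -> 0.2) on the survival scale.
protective = directional_ratio(0.2, 0.1)
harmful = directional_ratio(0.1, 0.2)
```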

If you are uneasy about specifying the model after you know whether exposure increases or decreases risk, we will soon release a preprint that introduces a new class of regression models that handles this natively.

I think we can move on to part 2 based on the discussions thus far. We all agree with Sander that it makes sense to use a model form that is linear in the natural parameter of the data distribution (i.e., logistic for risks); that such a model obeys the logical range restrictions of the outcome; that it makes the usual large-sample approximations (e.g., Wald P-values and CIs) most accurate in finite samples; and that it therefore makes ORs the easiest summary measures to calculate. The logistic model also results in fewer misspecifications, as Frank has repeatedly said, produces more meaningful predictions, has no boundary problems, and is variation independent, as we all agree.

The main argument that remains is that it does not preserve collapsibility – i.e., the ‘logical problem’ of the marginal odds ratio not lying in the convex hull of stratum-specific odds ratios – which apparently ‘cannot be right under any circumstances’, and therefore, despite all of the above justifications, the OR cannot be a parameter of interest for causal inference or inference in general. There also seems to be little agreement on deriving collapsible effect measures from logistic modeling.

Preservation of collapsibility is posited to give the following benefits:

a) Align with some imaginary data generating mechanisms

b) Ease interpretation of effects in real life

c) Enable causal inference to proceed, for which collapsibility is said to be mandatory

d) Collapsible effect measures need new models to overcome their inherent modeling problems because they cannot be derived from odds ratio modeling

I now move the figures I posted above into a Table below:

The data (in the previous figures and the table above) are from Table 1 in Greenland, Robins, and Pearl (Statistical Science, 1999). From the diagnostic test perspective we now understand what the issue is - the RRs that have been considered collapsible are not really the associational RRs of interest - they are the likelihood ratios derived from our application of Bayes’ theorem (columns 2 & 3 above). Only after we derive posterior probabilities using these likelihood ratios (assuming baseline probability = 0.5) can we define the ‘true’ risk difference or risk ratio of posterior probabilities - and as expected they are also ‘noncollapsible’. Note that I am using the outcome as a test of the treatment status for convenience only, because the LRs then align with the RR. The same would also apply vice versa, given that ORs are symmetrical.
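The posterior-probability step described above is just Bayes' theorem on the odds scale; a quick sketch:

```python
def posterior(prior, lr):
    """Update a prior probability with a likelihood ratio
    (Bayes' theorem on the odds scale)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# With a baseline probability of 0.5 the prior odds are 1, so the
# posterior odds equal the likelihood ratio itself; e.g. LR = 2
# gives a posterior probability of 2/3.
p = posterior(0.5, 2)
```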

In summary, it seems that collapsibility only exists because the LRs are collapsible (as expected), but these LRs were (unfortunately) considered to be effect measures of primary interest.


I just found an interesting arXiv pre-print that goes into a detailed discussion of the issues of causal inference and effect measures discussed at length in this thread, with elaboration on the switch risk ratio discussed by @AndersHuitfeldt

Pay particular attention to section 4.2.5 Why Not a Logistic Model?

… it seems that such logistic models are rather forcing covariates and treatment to act through the link function, preventing any simple interpretation of the causal measure (except for the conditional OR).

Considering that collapsible measures can be computed from logistic models, I don’t consider the complaints in that section sufficiently persuasive to undermine the intuition that a logistic model is worth considering, even if one wishes to use tools from causal inference.

I’ve come to the conclusion that a class of models should be specified and chosen based on model selection or averaging techniques (i.e., letting the data choose the model).

I think it only emphasizes what Frank has said all along: collapsible metrics can be derived from conditional ORs computed from logistic models. But I also think the paper gives a persuasive presentation on why collapsible effect measures (like the switch risk ratio) are worth serious consideration.

There was another paper I read a while ago discussing the computation of the switch risk ratio from the logistic model, but I do not recall the citation at the moment. I’ll post it when I find it.

Related Reading

Greenland, S. (2021). Noncollapsibility, confounding, and sparse-data bias. Part 1: The oddities of odds. Journal of clinical epidemiology, 138, 178-181 https://www.jclinepi.com/article/S0895-4356(21)00185-2/fulltext

Greenland, S. (2021). Noncollapsibility, confounding, and sparse-data bias. Part 2: What should researchers make of persistent controversies about the odds ratio?. Journal of clinical epidemiology, 139, 264-268. https://www.jclinepi.com/article/S0895-4356(21)00182-7/fulltext

Key quote from Sander’s Part 2:

Thus, my preferred solution is to stay with the logistic model but then use the fitted risks from the model to construct collapsible measures, such as covariate-specific and weighted-average (standardized) risk differences and risk ratios.
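That recommendation can be sketched numerically. A saturated logistic model reproduces the observed stratum-specific risks exactly, so standardization reduces to averaging the fitted risks over the covariate distribution (the stratum risks and equal weights below are illustrative, not from any particular study):

```python
def standardized_measures(fitted_risks, weights):
    """Collapsible summaries built from model-fitted risks.

    fitted_risks: {stratum: (risk_treated, risk_untreated)}
    weights: {stratum: population share}, summing to 1.
    Returns the standardized risk difference and risk ratio.
    """
    p1 = sum(weights[z] * fitted_risks[z][0] for z in fitted_risks)
    p0 = sum(weights[z] * fitted_risks[z][1] for z in fitted_risks)
    return p1 - p0, p1 / p0

# Illustrative two-stratum example with equal stratum sizes:
rd, rr = standardized_measures({0: (0.2, 0.1), 1: (0.9, 0.8)},
                               {0: 0.5, 1: 0.5})
```

The fitted risks could come from any logistic model; the standardization step is what restores collapsible measures.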

Greenland, S. (2004). Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. American journal of epidemiology, 160(4), 301-305.

Greenland, S. (2004). Interval estimation by simulation as an alternative to and extension of confidence intervals. International journal of epidemiology, 33(6), 1389-1397.

I haven’t read this but I have had the feeling that causal inference got off on the wrong foot by using solely the risk difference scale. I’m not convinced that is necessary.

Following up on citations to that arXiv paper led me to a valuable series of relatively recent papers that discuss this effect measure issue from a public health and program evaluation perspective, and that will clarify why collapsible effect measures are important in this application.

From the author reply, they make the following claim:

  1. The odds ratio is not a parameter of interest in epidemiology and public health research.[5] Instead, the relative risk and risk difference are two often-used effect measures that are of interest. Both of them are collapsible. Directly modeling these two measures[6,7] eliminates the noncollapsibility matter entirely.

They cite the following paper by Sander:

Greenland, S. (1987). Interpretation and choice of effect measures in epidemiologic analyses. American journal of epidemiology, 125(5), 761-768. (PDF)

I will argue only incidence differences and ratios possess direct interpretations as measures of impact on average risk or hazard … logistic and log-linear models are useful only insofar as they provide improved (smoothed) incidence differences or ratios

Would you mind if I started a new thread, similar to the wiki-style Myths thread, that summarizes the areas of agreement in this one, along with references to the relevant literature? I think it is time to put the relevant issues – the higher-level clinical questions and the statistical modelling and reporting – into a decision-theoretic framework.

After following up on a number of citations, I get the feeling that researchers are still hampered by traditions that were somewhat reasonable given the space and computational constraints of the past, but are not aware that we can do much better with modern computing power.

Relevant Reading:

Greenland, S., & Pearce, N. (2015). Statistical foundations for model-based adjustments. Annual review of public health, 36, 89-108. (PDF)

Here is a recently updated summary (March 2023) of the @AndersHuitfeldt paper on the switch risk ratio.

For a statistical justification of the switch (causal) relative risk, the following is worth study:

Van Der Laan, M. J., Hubbard, A., & Jewell, N. P. (2007). Estimation of treatment effects in randomized trials with non‐compliance and a dichotomous outcome. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3), 463-482. https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9868.2007.00598.x

We will show that the switch causal relative risk can be directly modelled in terms of (e.g.) a logistic model, so that the concerns expressed above for modelling the causal relative risk [ needing to correctly specify nuisance parameters] do not apply to this parameter.


That is not my main paper on the switch relative risk, just a short summary that was published recently in Epidemiology, in order to bring attention to the history of this idea, and in particular to the contributions of Mindel C. Sheps, who died 50 years ago this year.

My original paper on the switch relative risk (which did not use that term, but contained most of the relevant ideas) was too long for epidemiology journals and was therefore split into three separate papers: The Choice of Effect Measure for Binary Outcomes: Introducing Counterfactual Outcome State Transition Parameters (which contains the methodological considerations for choosing between effect measures); Effect heterogeneity and variable selection for standardizing causal effects to a target population | SpringerLink (which contains my argument for why effect measures are still necessary – something some people in the causal inference community need to be convinced about, though most people in this thread are probably on the same page as me on that particular question); and On the collapsibility of measures of effect in the counterfactual causal framework | Emerging Themes in Epidemiology | Full Text (which introduces some collapsibility weights that were used in the other papers).

After those papers were published, I became more familiar with some of the relevant older literature, and started using more standard terminology. I subsequently wrote [2106.06316] Shall we count the living or the dead? , which probably contains my most complete and most convincing attempt to make this argument. This manuscript does not really introduce much new theory that wasn’t already contained in the three papers above (other than a demonstration that the same argument can be made with causal pie models, and a new impossibility proof for why it is impossible for certain types of data generating mechanisms to result in stability of a non-collapsible effect measure) but if someone wants to understand my point of view, it is probably where they should start.


Anders: I think if we look back at Doi’s last post you and I may agree that it doesn’t correctly grasp the targets in the potential-outcome (counterfactual) causal models underlying our recent discussions and those in older papers such as Greenland-Robins-Pearl Stat Sci 1999. To tie the switch-RR (SRR) to the type of two-stratum OR-noncollapsibility examples in which the OR and RD are both constant, I would ask if you would display the SRR for an example in which there are two strata of 100 patients each, and in stratum 1, 90 die if treated but 80 die if untreated, while in stratum 2, 20 die if treated but 10 die if untreated. If I have not made an error, the causal RD is 0.1 in both strata as well as marginally (the causal RD is strictly collapsible), while the causal OR is 2.25 in both strata but the marginal causal OR is about 1.5 (the causal OR is not collapsible). The causal RR is collapsible but not constant, a description which could be said to depend on the causally relevant weighting for the RR. It may be worth noting that these results depend only on the stratum-specific marginal distributions of the potential outcomes, thus avoiding the objections to positing a joint distribution for them, and so they carry over when using instead certain decision-theoretic causal models.
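Sander's arithmetic checks out; a quick verification of the strict collapsibility of the causal RD alongside the noncollapsibility of the causal OR in his example:

```python
def odds(p):
    return p / (1 - p)

# Potential-outcome risks (treated, untreated) in each 100-patient stratum.
strata = [(0.9, 0.8), (0.2, 0.1)]

# Stratum-specific measures: RD = 0.1 and OR = 2.25 in both strata.
rds = [p1 - p0 for p1, p0 in strata]
ors = [odds(p1) / odds(p0) for p1, p0 in strata]

# Marginal measures over the two equal-sized strata:
p1_marg = (0.9 + 0.2) / 2   # 0.55
p0_marg = (0.8 + 0.1) / 2   # 0.45
rd_marg = p1_marg - p0_marg              # 0.1: the RD is strictly collapsible
or_marg = odds(p1_marg) / odds(p0_marg)  # ~1.49: the OR is not
```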

My question for you is how you would describe the behavior and weighting of the SRR in this example and more generally, given the causal model.


I assume you mean two strata of patients with 100 in each of the treated and untreated groups…
In that case we have the following results:

As can clearly be seen, the LR is being mistaken for the RR, and this to me seems to be the basis of all the controversy around noncollapsibility. Obviously, in stratum 1, if we apply a baseline probability of 0.5 (keep in mind that a baseline probability is a conditional probability) then the RR-derived predicted probability is 1. This does not make sense to me and does not align with Bayes’ theorem. The mistake all along has been not recognizing that the RR is actually an LR, and treating it as an effect measure. Of course, I am happy to be corrected based on any detected mathematical issues, but not based on imaginary mechanisms of generation of data.
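The stratum-1 arithmetic behind this point (risks 0.2 treated vs 0.1 untreated, so RR = 2): scaling the baseline probability of 0.5 directly by the RR hits the boundary at 1, whereas treating the RR as a likelihood ratio and applying Bayes' theorem gives a posterior probability of 2/3. A sketch:

```python
p_treated, p_untreated, baseline = 0.2, 0.1, 0.5
rr = p_treated / p_untreated             # 2.0

naive = baseline * rr                    # 1.0 - multiplying by the RR hits the boundary

prior_odds = baseline / (1 - baseline)   # 1.0
post_odds = prior_odds * rr              # treating the RR as an LR
bayes = post_odds / (1 + post_odds)      # 2/3 - the Bayes-consistent update
```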

NB: I renamed ‘baseline probability’ to ‘unconditional prevalence’ in the tables to avoid misunderstanding. In the top panel the prevalence is of treatment ‘X’, while in the bottom it is of outcome ‘Y’.

Great idea. But beware - I have the suspicion that some of those papers make the mistake of implying you should directly model RRs and ARRs instead of getting them from a proper probability model.

I have been trying with some difficulty to follow your discussion @sander, @s_doi and @AndersHuitfeldt about collapsibility of the proportions in @s_doi’s tables.

It is my understanding that the RR is collapsible when r is constant under the following circumstances: p(Y=1|Z=1,X=0)/p(Y=1|Z=1,X=1) = r, p(Y=1|Z=0,X=0)/p(Y=1|Z=0,X=1) = r, and the marginal p(Y=1|X=0)/p(Y=1|X=1) = r. This situation will arise in the special case when p(Z=1|Y=1,X=1) = p(Z=1|Y=1,X=0), and also if p(Z=1|X=1) = p(Z=1|X=0). It follows that the ratios of ‘survival from Y=0’ (i.e. Y=1) will also be collapsible. These special cases will rarely be met exactly for observed proportions, which are subject to stochastic variation, and can only be postulated for ‘true’ parametric values.

It is also my understanding that the OR will be collapsible when r’ is constant under the following circumstances: odds(Y=1|Z=1,X=0)/odds(Y=1|Z=1,X=1) = r’, odds(Y=1|Z=0,X=0)/odds(Y=1|Z=0,X=1) = r’, and the marginal odds(Y=1|X=0)/odds(Y=1|X=1) = r’. It follows that the odds ratios for Y=0 will also be collapsible. This collapsibility of odds will arise in the special case when p(Z=1|Y=1,X=1) = p(Z=1|Y=1,X=0) and p(Z=1|Y=0,X=1) = p(Z=1|Y=0,X=0).

In the special conditions for OR collapsibility, p(Z=1|X=1) ≠ p(Z=1|X=0), so that when the OR is collapsible it does not model exchangeability at baseline between {X=0}, the set of control subjects, and {X=1}, the set of treated subjects. Therefore p(Z=1|X=1) ≠ p(Z=1|X=0) implies that p(Z=1|X=0) is a pre-treatment baseline probability of the covariate Z=1 while p(Z=1|X=1) is a post-treatment probability. The RR and SRR models, however, do model exchangeability between the sets {X=0} and {X=1}. Again, these special cases will rarely be met exactly for observed proportions subject to stochastic variation and can only be postulated for ‘true’ parametric values.

In my limited experience of analysing data for diagnostic and prognostic tests, the observed proportions differ from the above special cases, but often not by much, so that it is possible to assume the data are compatible with parameters that satisfy either of the conditions for collapsibility of the OR and the RR.

So in conclusion I can follow your reasoning that the RD is collapsible (0.2 − 0.1 = 0.1, 0.9 − 0.8 = 0.1, and marginal 0.55 − 0.45 = 0.1). I follow that the ORs are not collapsible (2.25, 2.25, and marginal ≈1.5), especially as the data do not satisfy the special case of paragraph 3 above. Regarding the RRs, I find them to be 90/80 = 1.125, 20/10 = 2, and marginal 110/90 ≈ 1.22. In this context, what do you mean @sander by “the causal RR is collapsible but not constant”?


Hi Huw, I think it’s best you look at this from the diagnostic test perspective. As Sander said, and as you rightly pointed out, the RD in each stratum is 0.1, and the table of Y probabilities is as follows (I use X for treatment, Y for outcome and Z for the stratum variable):

However, the RD is TPR − FPR and the RR is TPR/FPR in diagnostic test language, and therefore the RR is also the likelihood ratio. The table above therefore gives us six likelihood ratios (three pLR and three nLR values), and we can then compute conditional probabilities for any baseline unconditional prevalence of the outcome (or of the treatment, depending on how the test assignment is made). The ratios or differences of conditional probabilities are also noncollapsible, meaning that the marginal is to one side (lower) of the stratum-specific effects. This only happens when the stratum variable is prognostic for the outcome, and one would expect this with diagnostic tests. Only likelihood ratios are collapsible, as they should be, but they are not effect measures.
The above then explains the tables I posted previously. The concept of the RR as used today is therefore (as I have said previously) not consistent with Bayes’ theorem. From the diagnostic test perspective the Z strata (0, 1) and the total (marginal) represent two different spectra of ‘disease’ and therefore have likelihood ratios that indicate different test performance.

Huw: A general definition of collapsibility is in col. 1 p. 38 of Greenland Robins Pearl Stat Sci 1999: “Now suppose that a measure is not constant across the strata, but that a particular summary of the conditional measures does equal the marginal measure. This summary is then said to be collapsible across Z.” Today I’d amend that slightly to “is not necessarily constant”. The particular summary here is the RR standardized to the total cohort.


Thank you @Sander and @s_doi for your answers. I am still finding it difficult to relate the terms that you use in your explanations to my own terms and prior understanding, as summarised in my previous post. For example, I think of the problem as looking at two sets – one representing control and one treatment – created by randomisation and initially exchangeable. They change subsequently depending on the effects of treatment and control. The subsequent difference between them can be expressed in a number of ways – through the RD, RR, SRR, OR or some other measure. It would also help me to understand how the concept of collapsibility is put to practical use. Perhaps it might be easier for you to help me if you refer to a new topic that I started a few days ago, which describes my way of thinking: Risk based treatment and the validity of scales of effect


Hi Huw, I think it’s best we continue with Sander’s example, as the data are accessible to all. I will also try to make it very clear in simple non-mathematical language so that we can benefit from your insights as an unbiased observer.

First, let’s assume we have three different groups based on Sander’s example:

Stratum 1 (Z=0, group 1)

Stratum 2 (Z=1, group 2)

Marginal (Mixed Z, group 3)

Since Z is prognostic for Y, these three groups represent different diagnostic test scenarios: in groups 1 and 2 the test discrimination is not influenced by Z, but it is influenced by Z in group 3 (since Z is ignored).

Now consider that you would like to predict, for an individual in each group, the probability of their having been treated based on the outcome Y=0 (or Y=1). The best way to do this is to compute a likelihood ratio for each group – this is what I did, so taking the three groups for example:

0.2/0.1 = 2 = pLR in group 1 and 0.8/0.9 = 0.89 = nLR in group 1

0.9/0.8 = 1.13 = pLR in group 2 and 0.1/0.2 = 0.5 = nLR in group 2

0.55/0.45 = 1.22 = pLR in group 3 and 0.45/0.55 = 0.82 = nLR in group 3
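The three pairs of likelihood ratios above can be reproduced directly from the stratum risks:

```python
# (risk given treated, risk given untreated) for the two strata and the margin
groups = {1: (0.2, 0.1), 2: (0.9, 0.8), 3: (0.55, 0.45)}

# pLR = P(Y=1|treated)/P(Y=1|untreated); nLR = P(Y=0|treated)/P(Y=0|untreated)
lrs = {g: (p1 / p0, (1 - p1) / (1 - p0)) for g, (p1, p0) in groups.items()}
# group 1: pLR = 2.00, nLR = 0.89
# group 2: pLR = 1.13, nLR = 0.50
# group 3: pLR = 1.22, nLR = 0.82
```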

Note that all the pLRs are also the RRs and now the difference emerges between what I am arguing and what causal inference groups are arguing:

My point: the RR is actually the pLR, and therefore TPR/FPR should be thought of as (posterior odds/prevalence odds) and interpreted only as such; under this constraint the ratio of posterior (conditional) risks is noncollapsible, just as the odds ratio is noncollapsible.

Causal inference community: it is okay to consider the TPR as the treated conditional risk and the FPR as the untreated conditional risk (thereby violating Bayes’ theorem) because the benefit of doing so is collapsibility, which means the RR is now logic-respecting and can be used in causal inference.

I cannot see how this can work, given that groups 1 and 2 above differ only on threshold, not on discrimination, and can therefore have different LRs but not different measures of effect. When we mix up X with Z (marginal) we have a different test scenario from groups 1 and 2, and therefore we expect that not only the likelihood ratios but also the effect measure will differ, and differ in a consistent direction (weaker) – aka noncollapsibility – though there is nothing bad about this anymore. This is also what Kuha and Mills tried to say in their paper, which resonates with me. I look forward to your thoughts on this and to being corrected if there is any flaw of logic.


Hi Suhail, thank you. I think that I understand. Instead of trying to predict an outcome such as death (1) conditional on treatment with S1 (e.g. with some symptom present) and control with S1 (i.e. also with the same symptom present), (2) conditional on treatment with S2 (e.g. with the symptom absent) and control with S2 (i.e. also with the same symptom absent), and (3) conditional on treatment and control only (not knowing about S), you are trying to do something else. You are inverting the whole process by wishing to predict whether treatment was given conditional on the outcome (e.g. death, as in a post mortem examination). In this situation, I agree with you that the risk ratio becomes the likelihood ratio, since the outcome is now whether treatment or control was given and the previous outcome (e.g. death or survival) becomes part of the evidence.

My next question is: Why should you and perhaps @sander wish to do this by predicting what has already passed? Is the ultimate purpose to try to estimate what would have happened in a counterfactual situation? Or is the ultimate purpose to try to predict the result of a RCT without having to actively randomise but instead conducting a ‘natural experiment’? I would also like to understand why ‘collapsibility’ is important in this.


Those are the perfect questions to raise. Backwards-information-flow conditioning is seldom useful and depends on more context than forwards-time conditioning. And I’m struggling to understand why collapsibility is interesting in any context.


Hi Huw, I am not actually doing this inverted process by choice – I am presenting how the RR can be visualized in diagnostic terms. I too was surprised initially, but this is where we stand: the RR is the LR of this inverted world-view. In other words, the RR is the LR for the ‘outcome’ as a test of the ‘treatment’ – all RRs are such LRs. In terms of the diagnostic test scenario, noncollapsibility means that an ignored prognostic covariate does not allow optimal assessment of ‘test’ performance (the ‘test’ here actually being the outcome in a trial, for example) until it is adjusted for – which I believe is what Frank has been saying all along.


Frank, I agree with you; I too have reached that conclusion, but some rebuttal needs to be made to the causal inference groups who say that the OR is not logic-respecting, etc., because noncollapsibility is present with ORs. I am gradually moving towards the position that the RR is not logic-respecting, but for these LR reasons. The diagnostic test scenario is one of discrimination, but this is equal to association in a trial, since there is no difference between the diagnostic odds ratio and the classical odds ratio.


Hi Suhail. Would you please describe what you mean by “logic respecting”, “association in a trial”, “diagnostic odds ratio” and “classical odds ratio” perhaps using the numerical examples in your recent post.
