Should one derive risk difference from the odds ratio?

Huw, I think we are using terminology differently due to being trained in different academic traditions, and that is making it very hard for me to parse your arguments. I don’t want to insist on using my terminology, but what you wrote is genuinely hard for me to follow. I suspect (but do not claim to know for sure) that parts of your argument rely on a conflation of concepts that would be avoided by using different terminology with explicit counterfactual variables.

I think it is important to be clear about the distinction between confounding and collapsibility, and that is really only possible when collapsibility is defined in terms of counterfactuals, as in our paper “On the collapsibility of measures of effect in the counterfactual causal framework” (Emerging Themes in Epidemiology).

The way I define collapsibility, the risk ratio is always collapsible as a simple mathematical property of the effect measure.

If you have concise text supporting risk ratios from a biologic or physics standpoint I’d be glad to see it.

3 Likes

Anders

I agree that terminology is a problem. To try to overcome this I have used a familiar disease (Covid-19) and interventions (e.g. self-isolation and antiviral agents) and given example data and probabilities. I have represented the medical concepts with P maps, which provide far more detail than 2x2 tables. It would help greatly if discussions such as yours on the relationship between confounding, collapsibility, counterfactual causal frameworks etc. were illustrated with clinical examples in the same way, so that doctors can follow them more easily.

You say that, the way you define collapsibility, the risk ratio is always collapsible as a simple mathematical property of the effect measure. My understanding is that this only applies to marginal risk ratios (e.g. {p(ViralSpread∩PCRpos∩Intervention) / p(ViralSpread∩PCRpos∩Control)}, {p(ViralSpread∩PCRneg∩Intervention) / p(ViralSpread∩PCRneg∩Control)} and {p(ViralSpread∩Intervention) / p(ViralSpread∩Control)}). However, it does not apply to conditional risk ratios (e.g. {p(ViralSpread|PCRpos∩Intervention) / p(ViralSpread|PCRpos∩Control)}, {p(ViralSpread|PCRneg∩Intervention) / p(ViralSpread|PCRneg∩Control)} and {p(ViralSpread|Intervention) / p(ViralSpread|Control)}). Collapsibility of the latter conditional risk ratios only applies when p(PCRpos|Intervention) = p(PCRpos|Control).
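To make the point concrete, here is a minimal numeric sketch (entirely made-up probabilities, not real Covid-19 data): the two conditional risk ratios can be equal and yet differ from the marginal risk ratio whenever p(PCRpos|Intervention) ≠ p(PCRpos|Control).

```python
# Hypothetical probabilities for illustration only (not real Covid-19 data).
p_pos_given_I, p_pos_given_C = 0.30, 0.50            # p(PCRpos | arm) differs by arm
risk_I = {"pos": 0.20, "neg": 0.05}                   # p(ViralSpread | PCR stratum, Intervention)
risk_C = {"pos": 0.40, "neg": 0.10}                   # p(ViralSpread | PCR stratum, Control)

# Conditional risk ratios within each PCR stratum (both equal 0.5 here)
rr_pos = risk_I["pos"] / risk_C["pos"]
rr_neg = risk_I["neg"] / risk_C["neg"]

# Marginal risks average the stratum-specific risks with arm-specific weights
marg_I = p_pos_given_I * risk_I["pos"] + (1 - p_pos_given_I) * risk_I["neg"]
marg_C = p_pos_given_C * risk_C["pos"] + (1 - p_pos_given_C) * risk_C["neg"]

print(rr_pos, rr_neg, marg_I / marg_C)
# 0.5 0.5 0.38 -- the conditional ratios agree, but the marginal ratio differs
# because p(PCRpos | Intervention) != p(PCRpos | Control).
```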

1 Like

Frank, what is your opinion about the very pragmatic approach of fitting spline functions to the distributions of test results in those with and without the outcome, in the control and intervention groups, and using these to construct curves displaying the probabilities of outcomes conditional on individual test results on intervention and on control?

What is your definition of collapsibility?

The causal risk ratio, whether it is marginal or conditional on some covariates, is always collapsible:

  • If RR(a) = RR(b) then these are also equal to RR(a ∪ b)
  • There always exist weights w_i such that RR(marginal) = Σ_i w_i RR(i)

These properties are only guaranteed to hold (for any data set) when the risk ratio is defined in terms of counterfactuals, so if you try to verify them using only observable quantities, you will run into trouble
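If it helps, here is a small Python check of the second bullet (hypothetical numbers of my own, not from any paper): the weights w_i can be taken proportional to each stratum's expected outcome under the control condition, which only makes sense when both counterfactual risks refer to the same population.

```python
# A minimal sketch of the weighted-average property of the causal risk ratio:
# RR(marginal) = sum_i w_i * RR(i), with w_i proportional to
# (stratum prevalence) x (counterfactual risk under control).
pi = [0.5, 0.3, 0.2]            # stratum prevalences in the whole population
p0 = [0.10, 0.20, 0.40]         # counterfactual risk under control, per stratum
p1 = [0.05, 0.08, 0.30]         # counterfactual risk under treatment, per stratum

rr_strata = [a / b for a, b in zip(p1, p0)]
rr_marginal = sum(w * a for w, a in zip(pi, p1)) / sum(w * b for w, b in zip(pi, p0))

# Weights: each stratum's share of the expected outcomes under control
denom = sum(w * b for w, b in zip(pi, p0))
weights = [w * b / denom for w, b in zip(pi, p0)]
rr_weighted = sum(w * rr for w, rr in zip(weights, rr_strata))

print(round(rr_marginal, 6), round(rr_weighted, 6))  # identical by construction
```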

That seems to be indirect. I’d just fit a regular forward-looking predictive model such as a binary or ordinal logistic model, to predict disease status from pre-test and test variables.
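A minimal sketch of what that could look like in Python, using a binary logistic model with a regression spline for the continuous test result; the data file and column names (outcome, test_value, age, arm) are hypothetical placeholders, not anything prescribed in this thread.

```python
# Forward-looking predictive model: outcome ~ spline(test) + pre-test variables.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_data.csv")   # hypothetical data set with a 0/1 outcome

model = smf.logit("outcome ~ bs(test_value, df=4) + age + arm", data=df).fit()
print(model.summary())

# Predicted outcome probabilities across the observed range of the test,
# separately for control (arm=0) and intervention (arm=1)
grid = pd.DataFrame({"test_value": sorted(df["test_value"].unique())})
for arm in (0, 1):
    grid[f"p_outcome_arm{arm}"] = model.predict(
        grid.assign(age=df["age"].mean(), arm=arm)
    )
print(grid.head())
```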

Thank you. Clumsy but not wrong in principle then?

As far as I can see, the importance only applies to those who use the change-in-estimate criterion for confounder identification. For those who use associational criteria, there is no importance in making this distinction.

Agree, I too would like to see some evidence supporting this claim as that is the only thing left that can possibly resurrect this ratio, having lost on other grounds in this thread…

According to who?

Can I ask people who haven’t understood my argument to please refrain from making unsupported public claims that I have “lost”? The only thing that has happened here is that some senior academics failed to understand my argument, in part because they are not up to date on recent advances in causal modelling and therefore lack training in basic counterfactual reasoning.

You guys speak from the security of senior academic positions, your words are supported by significant academic credentials that give weight to your claims. You are creating the public impression that my arguments have been considered and found wanting. This, in turn, makes it harder to convince other academics to consider the merits of my claims. My career is literally on the line here. Please, for the love of science, just stay out of this conversation until you have understood my argument.

In my pre-print, where C is the control group, I is the intervention group, T+ is a positive test result, T- is a negative test result, O is the outcome and r is the risk ratio, I had assumed from the outset that
p(O∩T+∩I)/p(O∩T+∩C) = p(O∩T-∩I)/p(O∩T-∩C) = p(O∩I)/p(O∩C) = r.
This is what I mean by collapsibility of the two marginal risk ratios p(O∩T+∩I)/p(O∩T+∩C) and p(O∩T-∩I)/p(O∩T-∩C). Interestingly, this seems to hold for the data from the Nephropathy/AER/ARB trial in Table 1 of the pre-print.
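For what it is worth, if every participant has either a positive or a negative test, the third equality follows arithmetically from the first two, since p(O∩I) = p(O∩T+∩I) + p(O∩T-∩I) and likewise for C. A tiny check with made-up numbers (not the Table 1 data):

```python
# Made-up joint probabilities: both stratum-specific ratios are built to equal
# r = 0.6, and the overall ratio then equals r too, because
# p(O ∩ arm) = p(O ∩ T+ ∩ arm) + p(O ∩ T- ∩ arm).
r = 0.6
p_O_Tpos_C, p_O_Tneg_C = 0.08, 0.02
p_O_Tpos_I, p_O_Tneg_I = r * p_O_Tpos_C, r * p_O_Tneg_C

overall = (p_O_Tpos_I + p_O_Tneg_I) / (p_O_Tpos_C + p_O_Tneg_C)
print(overall)  # 0.6
```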

I agree, and make the point clearly in my discussion, that it is not possible to verify the conditions for collapsibility and that they can only be assumed (based on what you term counterfactuals). We know well in medicine that the causal mechanisms of disease and of treatment effects are complex, with many positive and negative feedback loops; they are rarely if ever simple cause and effect, and they are unreliable for predicting the results of RCTs.

My concern as a clinician is that any estimated probabilities of the outcome on control and on intervention should be ‘reliable’ or ‘well calibrated’. If curves displaying the probability of an outcome conditional on numerical test results and on control or intervention are created based on ORs, RRs, switch RRs, splines, or binary or ordinal logistic models, they have to be validated in some agreed way.
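One common way of making ‘well calibrated’ concrete is a calibration curve comparing predicted probabilities with observed outcome frequencies. A minimal sketch using scikit-learn, with simulated stand-in arrays (in practice these would be held-out outcomes and the model’s predictions from the curves above):

```python
# Calibration check: compare predicted outcome probabilities with observed
# frequencies in bins. y_prob and y_true below are simulated placeholders.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(0.05, 0.95, size=2000)   # pretend model predictions
y_true = rng.binomial(1, y_prob)              # pretend observed outcomes

obs_freq, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for p, o in zip(mean_pred, obs_freq):
    print(f"predicted {p:.2f}  observed {o:.2f}")  # close to the 45-degree line
```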

When this discussion started, I wouldn’t have appreciated this paper, but it has actually gone on so long that I have been getting up to speed on causal models in the intervening time. Thanks for sharing. The arguments make sense to me, but I’m not an expert, merely a practitioner looking for the right tools. Given that this thread is labeled as Bayes, maybe it should be called only “half Bayes” or “Pearlian”. :slight_smile:

FWIW I think @AndersHuitfeldt puts forward a pretty compelling argument for those cases where an intervention can be thought of as being active in some people but not others. It’s surprising it hasn’t gained more traction in biostatistics the many times it has surfaced, given many trialists’ obsession with a much weaker definition of the types of people who respond. It’s particularly interesting because I think he has shown that, in the case of events like AEs, the choice of effect measure can make a real difference to clinical decisions.

My only real struggle (and Anders and I have spoken about this) is that it gets difficult to navigate when thinking of active-vs-active comparisons and the implications for things like matching-adjusted indirect comparisons, simulated treatment comparisons, and NMAs. Would you have a mix of ratios of survival and ratios of events? How would you determine which is appropriate in active-vs-active comparisons (e.g. phase 4 trials)? Would it be possible to say you need the ratio of event probabilities to compare B to C, but both B and C would use the ratio of survival probabilities if compared in placebo-controlled trials? Couldn’t this lead to inconsistent recommendations, if the head-to-head comparison uses the ratio of event probabilities and recommends C, but the ratio of survival probabilities vs placebo would tell you treatment B is better? These are honest questions.

1 Like

Since 6-period crossover studies are seldom done, it is seldom the case that evidence can be brought for the treatment tending to work in some patients and not in others. Evidence for differential treatment effects is few and far between. @stephen has written extensively about this. To within the limits of available covariates, in the vast majority of RCTs there is no evidence for heterogeneity of treatment effect, and there would be nothing you could do about it were it to hold, because you haven’t measured the source of the heterogeneity.

Yes, I absolutely agree with the issues of estimation/identification, but the switch risk ratio has a nice biological causal model that provides a neat package for the justification that responders exist (at least for things like anaphylaxis), and nice summaries, e.g. that xxx% of the population has some sort of “switch” that makes them respond to therapy with e.g. anaphylaxis. This is different from claiming you have actually identified who is or isn’t a responder.

1 Like

I don’t find that as compelling as you. I need to know that something exists before I base any theory on it. In most cases the data are compatible with pure chance being the explanation.

1 Like

The term heterogeneity of treatment effect (HTE) seems to be used in different ways: between subgroups, predictive HTE analysis, etc. (e.g. see https://www.acpjournals.org/doi/full/10.7326/M18-3667) and between individuals, which needs multiple crossover trials to detect, as @stephen and you pointed out. I think you also used the term differential treatment effect for the latter. There is also variation (HTE again?) in absolute risk reduction from applying a treatment effect measure (e.g. OR, RR or RD) to various baseline risks. I wonder if my discussion with @AndersHuitfeldt was hampered by crossed wires due to our different use of such concepts and terminology. I also wonder if it may even have hampered the current Twitter discussion with Judea Pearl about RCTs, etc. What do you think?
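On that last point, here is a small sketch of what applying a single, constant odds ratio (0.5 here, an arbitrary illustrative value) to a range of baseline risks does to the implied absolute risk reduction:

```python
# A constant odds ratio applied to different baseline risks gives an absolute
# risk reduction that varies with the baseline risk.
OR = 0.5
for p0 in (0.05, 0.10, 0.25, 0.50, 0.75):
    odds1 = OR * p0 / (1 - p0)
    p1 = odds1 / (1 + odds1)
    print(f"baseline {p0:.2f} -> treated {p1:.3f}  ARR {p0 - p1:.3f}")
```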

Nothing in my model requires the “switches” to have a realistic interpretation. These constructs can be representations of random biological processes that humans do not fully understand, yet it is still possible to reason meaningfully about whether they have implications for the choice of effect measure.

What do you mean by “pure chance”? Can you describe a model of biological reality where treatment effects are determined by “pure chance” and this leads to stability of the odds ratio? I am willing to make a significant bet that you can’t

Let’s try to cash out what “pure chance” means:

  • It might mean that the outcome under the control condition is a random event, and that the outcome under the intervention is a random event, with no correlation or structural relationship between the two. This is not a realistic model, and I don’t think it leads to stability of any effect measure.
  • It might mean that the outcome under the control condition is a random event, then there is a separate random event, such that if the separate event occurs, the outcome is modified if the individual is treated.
  • Or it might mean something completely different that I haven’t thought of, in which case you would have to specify what you have in mind

The type of randomness described in the second bullet point is entirely consistent with my model. This just means that what I call a “switch” is a random event purely due to chance. It still leads to the same conclusions, if there is any reason to expect that these chance events make the drug a sufficient or necessary cause of either the outcome or its complement.
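As an illustrative simulation (my own construction for this thread, not code from any of the referenced papers): if the “switch” in the second bullet is a purely random protective event with the same probability in every stratum, the risk ratio comes out stable across strata with very different baseline risks, while the odds ratio does not.

```python
# Switch model with a purely random protective switch: Y1 = Y0 * (1 - S),
# where S ~ Bernoulli(s) is independent of baseline risk and everything else.
import numpy as np

rng = np.random.default_rng(1)
n, s = 1_000_000, 0.4                      # s = probability the switch fires

for baseline_risk in (0.05, 0.20, 0.60):   # strata with very different baseline risks
    y0 = rng.binomial(1, baseline_risk, n)
    switch = rng.binomial(1, s, n)         # pure chance, unrelated to baseline risk
    y1 = y0 * (1 - switch)
    rr = y1.mean() / y0.mean()
    or_ = (y1.mean() / (1 - y1.mean())) / (y0.mean() / (1 - y0.mean()))
    print(f"baseline {baseline_risk:.2f}: RR ~ {rr:.3f}  OR ~ {or_:.3f}")
    # RR is ~ 1 - s = 0.6 in every stratum; the OR drifts with the baseline risk.
```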

Now in reality, we live in a deterministic physical universe, and what we call “pure chance” is almost certainly about individual-level variation in real physical attributes, it is just impossible for humans to make any credible claims about what those physical attributes are. It is probably wise to reason about what things are correlated with this “pure chance” event, so that they can be controlled for as effect modifiers, and I don’t see how that is possible if we aren’t at least allowed to speculate about what the “pure chance” events really are.

My model is not a complete description of reality, and it will rarely be a perfect match to biology. But among models that lead to stability of any effect measure, I believe it to be the least problematic one. One of its key advantages is that it enables us to understand when it holds and when it does not hold, so that we can evaluate how close we are to the ideal situation where it holds.

Either way, your argument can be summarized something like this: “Your model is justified by metaphysical constructs whose existence we have no evidence for. I do not want to assume such metaphysical constructs, and I therefore reject your model and instead choose to rely on an effect measure that has no biological justification.”

This is insane. If you were to instead conclude “and therefore, we can never trust any statistics which assumes stability of any effect measure”, I would at least respect you for intellectual consistency (though I might try to convince you that sometimes, approximate models of reality are useful, even if they rely on constructs that are abstract representations of things humans cannot fully understand)… But when you choose to go with the odds ratio instead, it seems you are just throwing arguments at the wall to see what sticks, in order to protect your cherished odds ratio.

1 Like

Absence of evidence is not evidence of absence.

A conscientious scientist should always assume whatever is least convenient to the claim they want to make.

  • If you are looking for subgroup effects in order to justify licensing a drug, then it is of course appropriate to assume no heterogeneity and only reject that hypothesis if there is strong convincing evidence to the contrary.
  • But if you are trying to individualize treatment to a patient, assuming homogeneity is just wishful thinking.

Statisticians often justify the homogeneity assumption in the second case by referring to a consensus that arose in the setting of the first case. This is just incredibly disingenuous. The only reason we were all able to agree that subgroup effects should be approached with extreme caution is that we want to hold scientists to the standard of assuming whatever is least convenient for making their claims.

Nature is incredibly complicated, and it would be incredibly surprising if there were a law of nature that always guaranteed effect homogeneity on any scale. The best we can do is reason about how close any particular situation is to a model where homogeneity holds.

It is you who is assuming a metaphysical construct for which you show no evidence. Pure randomness is easy to model and to describe. But there are two competing models (to oversimplify a bit): pure randomness to within subject characteristics that can actually be measured, and non-pure-randomness in which there are tendencies for some subjects to respond and others not to. This has been studied in labor force participation, where there are truly “movers” and “stayers” with regard to job changes.

If pure randomness does not apply and one does not know the factors that create the non-randomness, you are left with an ill-defined situation but a situation for which your argument may apply. If one knows the factors that make the outcomes have different tendencies for different subjects, then you should condition on those factors, and the odds ratio works as advertised and collapsibility is irrelevant.

4 Likes