Should one derive risk difference from the odds ratio?

Frank, you wrote that variation independence of the odds ratio from baseline risk “is much more of an advantage for ORs.” I am shocked that you would make such a sweeping generalization, especially when no such generalization can be correct. Whether the numeric properties being discussed are advantageous or not depends crucially on contextual properties like loss functions and what other information is available.

If the baseline risks are available and in hand, none of the measures has a compelling advantage if what we want to work with is the full risk function. If for some reason (often none other than that someone forgot to extract or present baseline risks from research reports) I only get to have one of either the variation-independent OR or the very variation-dependent RD, I know I’d often want that to be the RD rather than the OR. For example, an OR of 9 for a very unpleasant side effect arises regardless of whether the treatment pushes the risk from 1 per million to 9 per million or from 10% to 50%, whereas the RD would be 0.0008% in the first case and 40% in the second. Depending on the treatment benefit, in the first case I may well accept treatment and the increased chance of the side effect, but in the second case I may well reject treatment because of the increase. This kind of issue arises all the time in real clinical and public-health practice.
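To make the arithmetic explicit, here is a minimal sketch (Python, using the risk values from the example above) of how the same OR hides radically different RDs:

```python
# Two scenarios with (essentially) the same odds ratio but very different
# risk differences: 1 vs 9 per million, and 10% vs 50%.
def odds_ratio(p0, p1):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

for p0, p1 in [(1e-6, 9e-6), (0.10, 0.50)]:
    print(f"p0={p0:g}, p1={p1:g}: OR = {odds_ratio(p0, p1):.2f}, "
          f"RD = {p1 - p0:.6%}")
# p0=1e-06, p1=9e-06: OR = 9.00, RD = 0.000800%
# p0=0.1,   p1=0.5:   OR = 9.00, RD = 40.000000%
```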

That is why, to me, your unnuanced claim looks more like numerology than statistical science, as have other OR promotions such as Doi’s. The bottom line, as always, is that purely numerical properties alone (such as VI, symmetry, etc.) are vastly inadequate for judging the relative utility and adequacy of candidate numerical procedures in real-world problems, because those determinations require context. Again, I’m surprised that you of all people would fall back to promoting numerical properties as if they dominate contextual considerations - a promotional problem that has plagued academic statistics (and spin-offs like econometrics) for generations.


I really find that what Frank said makes a lot of sense, and I ask the same question. If an effect-measure model has one less source of variability unrelated to the actual association under consideration, then logically it is better, unless it can be shown to have other mathematical distortions that linear probability models do not have. That is not the case, since linear probability models:
a) are always misspecified,
b) can give meaningless predictions, and
c) have boundary problems,
none of which are seen with logistic models. Are we then suggesting that lack of robustness to misspecification of nuisance models is a reason to dump the logistic model for such linear probability models? That seems to me to be the argument of Richardson, Robins & Wang in proposing alternative “robust” models other than the logistic.
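A minimal simulated sketch of points (b) and (c), with invented data: an OLS linear probability model can return fitted “probabilities” outside [0, 1], while logistic predictions stay inside (0, 1).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p = 1 / (1 + np.exp(-(-2 + 1.5 * x)))   # true risks follow a logistic curve
y = rng.binomial(1, p)
X = sm.add_constant(x)

lpm = sm.OLS(y, X).fit()                # linear probability model
logit = sm.Logit(y, X).fit(disp=0)

x_new = sm.add_constant(np.array([-3.0, 3.0]))
print("LPM predictions:     ", lpm.predict(x_new))    # can fall below 0
print("Logistic predictions:", logit.predict(x_new))  # always inside (0, 1)
```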

I think the best path forward is indeed what Frank suggests - let’s stop posting theoretical considerations and give a practical data-analysis example. I can share the Titanic dataset if anyone is interested since, as I note above, I cannot even reproduce the analysis from those proposing such “robust” linear probability models.

As before I’m describing tendencies, not universal truths. I stand by the assertion that relative odds is an excellent modeling choice because when one adds terms to the model that allow for non-applicability of constant ORs (e.g., interactions with treatment), one can get away with larger penalties on these “departures from what we hope for” terms. Then one can focus on risk differences from this model, as sketched below.
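A rough sketch of that strategy (variable names, penalty value, and data are all invented; the per-coefficient penalty uses statsmodels’ elastic-net interface with the L1 weight set to 0, i.e., ridge-style shrinkage):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
age = rng.normal(60, 10, n)
tx = rng.binomial(1, 0.5, n)
lin = -4 + 0.05 * age + 0.8 * tx + 0.01 * tx * (age - 60)  # weak interaction
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# Columns: intercept, age, treatment, treatment x (age - 60)
X = np.column_stack([np.ones(n), age, tx, tx * (age - 60)])

# Penalize only the interaction ("departure") term; main effects unpenalized.
alpha = np.array([0.0, 0.0, 0.0, 5.0])
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit_regularized(
    method="elastic_net", alpha=alpha, L1_wt=0.0)

# Then read off covariate-specific risk differences on the probability scale.
def inv_logit(z):
    return 1 / (1 + np.exp(-z))

ages = np.array([45.0, 60.0, 75.0])
X1 = np.column_stack([np.ones(3), ages, np.ones(3), ages - 60])     # treated
X0 = np.column_stack([np.ones(3), ages, np.zeros(3), np.zeros(3)])  # control
rd = inv_logit(X1 @ fit.params) - inv_logit(X0 @ fit.params)
print(dict(zip(ages, np.round(rd, 3))))
```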

Here are the issues as I understand them:

  1. From a theoretical POV, the ultimate goal of any analysis is to compute covariate-adjusted effect-measure distributions translated to the probability scale for clinicians to use on an individual basis. So I think there is agreement on this point.

  2. From a data-analysis POV: there are many ways to get to probability outputs; I do not think the proposal of the switch risk effect measure by @AndersHuitfeldt has gotten an entirely fair hearing, and it is being confused with more common effect measures (i.e., the RR).

If I have correctly interpreted his formulas, posts, and papers, he argues that mechanistic considerations can be used to mathematically derive a preferred effect measure that is variation independent, which has all of the parsimony benefits of the OR but also preserves collapsibility.

These mechanistic considerations may also enable prediction and verification of treatment heterogeneity at sample sizes that logistic regression methods cannot detect.

From the POV of @f2harrell, he might be skeptical of the mechanistic assumptions and ask “How much of this result is reliant on the causal assumptions vs. the actual data?”

I proposed model averaging and model selection methods in a previous post to account for this uncertainty about the appropriate way to analyze the data.

I’d appreciate it if someone could point out any errors in my interpretation of the discussion between Frank, Anders, and Sander thus far.


I agree with it, except I don’t understand why preserving collapsibility should be part of the goal. I’d make the goal the estimation of covariate-specific absolute risk reductions, which will be most parsimonious using a base model that doesn’t try to provide collapsible primary parameters.

This paper from 2019 seems to assume the causal effect measure is being calculated from an observational study, where confounding is a concern.

We discuss two subtly different definitions of collapsibility, and show that by considering causal effect measures based on counterfactual variables (rather than measures of association based on observed variables) it is possible to separate out the component of non-collapsibility which is due to the mathematical properties of the effect measure, from the components that are due to structural bias such as confounding.

It is unclear to me if these considerations and reported benefits also apply to controlled trials.

I’m speculating that in a controlled-trial scenario, mechanistic assumptions (if they hold) might permit detecting heterogeneity of treatment effect at smaller sample sizes with collapsible measures than with logistic regression. This seems to be the point of contention.

I’ll leave further explanation to @AndersHuitfeldt or @Sander.

Frank, I agree with your modeling comment as you just stated it. What shocked me before is that you said you didn’t agree with my statement that variation independence of baseline risks can be seen as a disadvantage in some contexts. I thus gave an example of such a context. Based on your response to that (“…then one can focus on risk differences”) I think perhaps you overlooked “in some contexts” (which referred to application context, not models) and upon re-reading what I actually wrote you wouldn’t disagree with it. If so, we don’t disagree on this matter or much of anything in practice, and certainly not about the math.

A difference may be my stronger emphasis on application context as opposed to mathematical properties. As I explained above, I think for choosing procedures there has been an overemphasis on mathematical properties as opposed to highly variable contextual features (such as goals, costs, and background information). The imbalance toward the math has harmed statistical science as well as some applications. I have elaborated these concerns repeatedly in print, most recently in sections 1-3, 8, 9, 11 and 12 of https://arxiv.org/abs/2304.01392; in press, Scandinavian Journal of Statistics.


This is essentially the crux of the issue - “preserving collapsibility” - but I am unable to understand what the big deal is with this concept and why it should be preserved (for any reason). I think modellers (who are application focused) are coming round to accepting that when effect measures are compared between groups that have different distributions of other causes of the binary response, non-collapsibility is the expected behaviour of a good effect measure, and therefore such concerns are usually misplaced. If we disagree, then what is needed is a data-analysis example that demonstrates how and why this concern needs to be upheld in real life.
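As a minimal numeric sketch of that expected behaviour (invented risks, equal-sized strata, and treatment independent of stratum, so there is no confounding anywhere): the conditional OR is exactly 4 in both strata, yet the marginal OR is smaller.

```python
# Two equal-sized strata; treatment assigned independently of stratum.
def odds(p):
    return p / (1 - p)

p0 = [0.10, 0.50]                                   # control risks by stratum
p1 = [4 * odds(p) / (1 + 4 * odds(p)) for p in p0]  # treated risks, OR = 4 each
m0, m1 = sum(p0) / 2, sum(p1) / 2                   # marginal (collapsed) risks
print([round(p, 3) for p in p1])      # [0.308, 0.8]
print(round(odds(m1) / odds(m0), 2))  # marginal OR ~ 2.9, not 4
```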

Yes, I get it now. All that makes sense to me. Thanks.

I’d rather see causal inference always framed in a conditional way instead, e.g. P(Y=1|X,B) - P(Y=1|X,A).


Sincere question: what does the word “causal” mean to you? I don’t think we are speaking the same language here.

Causal non-collapsibility is a property of the effect measure; this consideration is just as relevant in randomised trials as in observational studies.

In randomised trials, non-collapsibility can be observed in its pure form, whereas in observational studies it becomes difficult to know whether observed changes in the effect measure upon conditioning are due to confounding or to non-collapsibility.

The purpose of this paper was in part to formalize a definition that disentangles the component of “associational non-collapsibility” that applies in randomised trials (“causal non-collapsibility”) from the component that primarily arises in observational studies due to confounding.

Thank you, I will have to think more carefully about why I think people may be misled by the term variation independence.

The term closure was specifically chosen because of its pre-existing technical usage in abstract algebra/group theory. My coauthors and I will soon release a preprint in which we show that there are advantages to framing regression models in terms of the abstract-algebraic properties of functions that take as input a distribution under one value of the predictor and output a distribution under another value of the predictor. I will get back to this question once the preprint is released; hopefully things will be clearer at that stage.

One of two things:

  • existence of a causal signal: the treatment is a cause of a nonzero expected benefit for a variety of patients (marginal view; the magnitude varies over patient types), or
  • the treatment is the cause of a specific expected benefit for a specific type of patient (conditional view)

How do you operationalize expected benefit without counterfactuals?

P(Y=1|X,B) - P(Y=1|X,A) can be nonzero even if there is no expected benefit of A vs B in patients with X, so inferring the value of this expression is not informative for the purposes of what you claim to be interested in?

I don’t understand your argument. In a controlled trial, the entire point is to create comparisons (either parallel-group or using the individual as his/her own control) that are exchangeable (either on a set of covariates via various matching algorithms, or in expectation via various randomization protocols) on all factors except for treatment.

Any sufficiently surprising difference between treated and control can be attributed to the treatment if this exchangeability is considered known (via design) or justified because all other plausible explanations can be ruled out (via a well designed observational study). So I think Frank’s definition makes sense (at least to me).

What might be helpful: can you clarify what advantage collapsible effect measures have during the data-analysis process vs. odds ratios? AFAICT, this is analogous to the choice of parametric vs. nonparametric models. If the parametric (i.e., causal) assumptions are correct, more information can be extracted from the data.

I don’t understand it either. Perhaps I left off a condition: P(Y=1|X, B, model) - P(Y=1|X, A, model).

Sure, in the ITT analysis of an ideal randomized controlled trial with no loss to follow-up, exchangeability holds by design, and the conditional quantities are trivially equal to the counterfactual quantities. In these simple settings, counterfactual notation is probably overkill. But once you begin to consider settings where exchangeability may not hold by design, you need a framework and a language for reasoning about what exchangeability conditions are justified by our subject-matter knowledge, and whether the conditions that are justified will be sufficient for the purposes of a data analysis that aims to determine the expected consequences of treatment. Causal inference is precisely the methodology that has been developed for this purpose. It simply cannot be done without counterfactuals (or something mathematically equivalent to counterfactuals).
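For concreteness, the identity at stake here is the standard identification result (a textbook statement, restated for readers following along): under consistency and conditional exchangeability, $Y^{a} \perp A \mid X$,

$$
P(Y^{a}=1 \mid X) = P(Y=1 \mid X, A=a),
$$

so Frank’s conditional contrast P(Y=1|X,B) - P(Y=1|X,A) coincides with the counterfactual contrast P(Y^{B}=1|X) - P(Y^{A}=1|X) exactly when those conditions hold.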

Note that Frank said that causal inference should be framed in a conditional way, not that causal inference for the ITT analysis of ideal randomized trials should be framed in a conditional way.

Collapsibility matters because, in most cases, it is impossible to imagine a data-generating mechanism that leads to conditional stability of a non-collapsible effect measure. In other words: it is not plausible that nature generates data using a process that can be approximated with a non-collapsible model.
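A small numeric illustration of the tension (invented baseline risks): stability on one scale precludes stability on the other, so if the data-generating process held the risk difference exactly constant across strata, the odds ratio would have to vary, and vice versa.

```python
def odds_ratio(p0, p1):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

rd = 0.20  # conditional risk difference held constant across strata
for p0 in [0.10, 0.40, 0.60]:
    print(f"p0={p0:.2f}: OR = {odds_ratio(p0, p0 + rd):.2f}")
# p0=0.10: OR = 3.86
# p0=0.40: OR = 2.25
# p0=0.60: OR = 2.67
```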


I’d say that data are generated by an unknown model such that patients have events independently of each other (usually) and every patient has a different event probability even if on the same treatment. Our goal in modeling is to approximate that to our best ability. It doesn’t matter if that model involves collapsible quantities or not as long as it approximates reality. But I like what you wrote.

One problem with causal inference is that I see its practitioners applying it needlessly to an ITT situation with no dropouts (the ideal setting you mentioned), which tells me there is some overselling going on in general. There are some elegant uses of causal inference in the more complex scenarios. But in a subset of those, good modeling can make it unnecessary. The subset involves situations such as when the “competing event” is withdrawal from taking a drug due to unintended bad consequences of the drug. An ordinal longitudinal analysis, or a careful utility analysis, can place withdrawal at the right level of “badness” and have it formally included as an endpoint, instead of having censored data and needing to entertain counterfactuals.
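A rough sketch of how that coding might look (a cross-sectional toy stand-in for the real longitudinal analysis; the outcome levels, data, and proportional-odds choice are all invented for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Withdrawal due to drug harm is an explicit (bad) outcome level,
# not a censoring event.
levels = ["well", "mild symptoms", "severe symptoms",
          "withdrew (drug harm)", "dead"]
rng = np.random.default_rng(2)
n = 300
tx = rng.binomial(1, 0.5, n)
codes = rng.choice(len(levels), n, p=[0.40, 0.25, 0.20, 0.10, 0.05])  # toy data

y = pd.Series(pd.Categorical.from_codes(codes, levels, ordered=True))
fit = OrderedModel(y, pd.DataFrame({"tx": tx}),
                   distr="logit").fit(method="bfgs", disp=False)
print(fit.summary())
```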


I don’t object to any of the above, but I still have 2 questions:

  1. What is the problem with computing collapsible effect measures from logistic regression? Do you disagree with Frank that this model is useful in a wide variety of situations? Sander seems to agree (with certain exceptions, e.g., log-risk with sparse data). Can’t causal reasoning be incorporated into an analysis that uses logistic regression?

  2. What data analysis plan do you propose instead of logistic regression, if you do not agree with this model as a default?

While I have no reason to doubt Frank’s intuition, I can see why it might be objectionable.

Sander stated above that other models should be considered.

How should an analyst do this in a principled way? If we are trying to do the best we can with a particular data set, it seems you need to collect data and fit models before making the choice; and if I’ve learned anything here, it is that this analysis uncertainty needs to be accounted for.

In an effort to synthesize the recommendations of Frank and Sander, I conclude that model averaging and/or model selection is the most principled way to do this; a rough sketch follows below.

But I may be missing something.
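Concretely, one minimal version of what I have in mind (a sketch only: the links, data, and Akaike-weight scheme are one choice among many, and stacking or Bayesian model averaging may be better justified) fits the same linear predictor under several links and averages the predicted risks:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
tx = rng.binomial(1, 0.5, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.7 * x + 0.5 * tx))))
X = sm.add_constant(np.column_stack([x, tx]))

links = {"logit": sm.families.links.Logit(),
         "probit": sm.families.links.Probit(),
         "cloglog": sm.families.links.CLogLog()}
fits = {name: sm.GLM(y, X, family=sm.families.Binomial(link)).fit()
        for name, link in links.items()}

aic = np.array([f.aic for f in fits.values()])
w = np.exp(-(aic - aic.min()) / 2)
w /= w.sum()  # Akaike weights
p_avg = sum(wi * f.predict(X) for wi, f in zip(w, fits.values()))
print(dict(zip(fits, np.round(w, 3))))
```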


A regression model is best understood as a set of homogeneity assumptions on a parametric scale. The model is correct if the homogeneity assumptions are correct, and incorrect if they are not. I do not believe the homogeneity assumptions of a logistic model. I am not interested in what parameters you “compute”, but in what parametric scale you use to define the model.
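To make “homogeneity assumptions on a parametric scale” concrete (standard logistic-model algebra, spelled out for illustration): the main-effects logistic model

$$
\operatorname{logit} P(Y=1 \mid A, X) = \alpha + \beta A + \gamma X
$$

assumes the conditional odds ratio for $A$ equals $e^{\beta}$ at every level of $X$. Disbelieving the model means disbelieving that homogeneity on the log-odds scale.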

Certainly, some people use logistic models in causal inference, for example for estimating IPTW weights. I think the specific choice of model is mostly a matter of convenience, not because anyone has demonstrated that it is the best approach. The causal inference community is not very interested in parametric models (even if it uses them in practice); many of the thought leaders are moving to a semiparametric/machine-learning mindset. I do not speak for the causal inference community, so please do not interpret me as making this argument on behalf of other causal inference researchers; there is certainly no consensus against logistic regression among them.

My preference is for using the switch relative risk in place of the odds ratio, at least for the primary intervention of interest. This preference also applies to regression models. Weinberg (1986) (Applicability of the simple independent action model to epidemiologic studies involving two factors and a dichotomous outcome - PubMed) suggested implementing this in the GLM framework by using a log link when the main exposure reduces incidence, and a complementary log link when the main exposure increases incidence.
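As a minimal sketch of that implementation (my rendering, not code from the paper; statsmodels has no built-in “complementary log” link, so the risk-increasing case is fit equivalently by putting a log link on the complement of the outcome):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
a = rng.binomial(1, 0.5, n)            # main exposure
p = 0.3 * np.exp(-0.5 * a + 0.1 * x)   # exposure reduces incidence here
y = rng.binomial(1, np.clip(p, 0, 1))
X = sm.add_constant(np.column_stack([a, x]))

# Exposure reduces incidence: log link on P(Y = 1).
protective = sm.GLM(
    y, X, family=sm.families.Binomial(sm.families.links.Log())).fit()

# Exposure increases incidence: "complementary log" link, i.e. a model for
# log P(Y = 0), fit here as a log-binomial model for the complement event.
harmful = sm.GLM(
    1 - y, X, family=sm.families.Binomial(sm.families.links.Log())).fit()

# Note: statsmodels warns that the log link is non-canonical for Binomial,
# and log-binomial fits can be numerically fragile (cf. the sparse-data
# caveat mentioned earlier in the thread).
print(protective.params)
```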

If you are uneasy about specifying the model after you know whether exposure increases or decreases risk, we will soon release a preprint that introduces a new class of regression models that handles this natively.