Should one derive risk difference from the odds ratio?

Hi Dr. Doi

Can I clarify that you’re emphasizing that, when we see that an RCT-derived odds ratio has “moved” with adjustment for different factors, we should not infer that this signals the presence of underlying “confounding bias” (or “random confounding” - a term that seems to be used by epidemiologists but shunned by statisticians…)? Rather, we should recognize that, in an experimental (as opposed to observational) context, movement of the OR with adjustment can simply reflect the impact of an important (but non-confounding) prognostic factor?

I read this paper a couple of years ago and had difficulty reconciling some of its assertions with things I’ve read on the Datamethods site:

For example (my emphasis in bold):

“In summary, prognostic imbalance does not on average jeopardize internal validity of findings from RCTs, but if neglected, may lead to chance confounding and biased estimate of treatment effect in a single RCT. To produce an accurate estimate of the treatment-outcome relationship conditional on patients’ baseline prognosis, balanced or unbalanced PFs with high predictive value should be adjusted for in the analysis. Covariate adjustment slightly reduces precision, but improves study efficiency, when PFs are largely balanced. Once chance imbalance in baseline prognosis is observed, covariate adjustment should be performed to remove chance confounding.”

I’d be interested (if you have the time), to hear your opinion on the use of the terms “bias” and “confounding” in this paper (which I believe was written by epidemiologists) and whether this usage comports with the way these terms are used by statisticians. It seems important that clinicians not make inferences that an RCT result is “biased” when bias is not actually present- this practice could lead to discounting of potentially compelling trial results…

Perhaps you are right if you read this literally, but most readers will interpret this to mean ‘However, the odds ratio is more reliable for generalizing…’. I take a strong view which may be unnecessary - Jazeel please update this.

Your own columns of sens/spec as ratios of probabilities vs your OR column should show you that your claim that “interpreting the RR as a ratio of probabilities is inconsistent with Bayes Theorem” is wrong, since the only difference between the 2 scenarios is a constant of proportionality. The discriminant information in this scenario is indeed 16 to 1, as the likelihood principle would indicate.
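To make the proportionality point concrete, here is a minimal sketch (the sens/spec values are purely hypothetical, not taken from the paper) showing that scaling both conditional probabilities by the same constant leaves a 16-to-1 likelihood ratio unchanged, while the corresponding odds ratio changes:

```python
# Hypothetical sens/FPR pairs differing only by a constant of
# proportionality in Pr(T+|D+) and Pr(T+|D-).
def likelihood_ratio(sens, fpr):
    """LR+ = Pr(T+|D+) / Pr(T+|D-): a ratio of probabilities."""
    return sens / fpr

def odds_ratio(sens, fpr):
    """Diagnostic OR = odds(T+|D+) / odds(T+|D-)."""
    return (sens / (1 - sens)) / (fpr / (1 - fpr))

a = (0.8, 0.05)    # scenario A
b = (0.4, 0.025)   # scenario B: both probabilities halved

print(likelihood_ratio(*a), likelihood_ratio(*b))  # both ≈ 16: LR unchanged
print(odds_ratio(*a), odds_ratio(*b))              # ≈ 76 vs ≈ 26: OR changed
```

The constant of proportionality cancels in the ratio of probabilities but not after the odds transformation, which is the crux of the disagreement.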

2 Likes

Hi Erin, yes, you are right - in the context of an RCT-derived OR, such movement reflects the impact of an important non-confounding prognostic factor.

Regarding covariate adjustment in an RCT: with finite randomization, residual imbalance in covariates after randomization may occur. Perhaps that is what the researcher is referring to by ‘chance confounding’, and I believe Ian Shrier has a paper on this in which he says confounding by chance occurs with the same frequency in RCTs and observational studies. Because such imbalance is a form of random error that is already reflected in the statistical uncertainty presented through the analysis, these covariates do not need to be adjusted for.

Prognostic covariates that have a large impact on the primary outcome are, however, a different issue - they need to be adjusted for even in an RCT if the closest empirical estimate of the individual treatment effect is the desired goal of the analysis of the RCT data. The latter adjustment has nothing to do with confounding but rather is a method of dealing with HTE called ‘risk magnification’ by Frank.

1 Like

Our paper does not agree with that assessment - what is the evidence to support your position?

Simply looking at your mathematical claims and comparing them to the definitions of:

Discriminant information (discrete case):
I(P \| Q) = -\sum_{x \in X} P(x)\log\left(\frac{Q(x)}{P(x)}\right)

The fraction is a log transformation of a ratio of probabilities at a particular x, while the information is a probability weighted sum of those log transformed probability ratios.
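As a small illustrative sketch (the two distributions are hypothetical), the discrete KL divergence can be computed directly from this definition as a probability-weighted sum of log probability ratios:

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence I(P||Q) = sum_x P(x) * log(P(x)/Q(x)),
    i.e. a probability-weighted sum of log-transformed probability ratios."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.8, 0.2]    # hypothetical distribution under one hypothesis
q = [0.05, 0.95]  # hypothetical distribution under the other

print(kl_divergence(p, q))  # ≈ 1.91 nats
print(kl_divergence(p, p))  # 0.0: no divergence from itself
```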

The Bayes factor as defined in the James Berger et al. paper I’ve linked to in a number of threads (including above):

The Bayes factor is virtually identical to the KL information divergence metric, in that it is the integrated likelihood; a uniform prior is a special case where the Bayesian posterior and the frequentist “confidence” distribution (the set of all intervals with \alpha \in [0, 1]) coincide in large samples.

Brockett, P. L. (1991). Information theoretic approach to actuarial science: A unification and extension of relevant theory and applications. Transactions of the Society of Actuaries, 43, 73-135. PDF

These (to me) do not mean anything in terms of supporting the notion that a single likelihood ratio has any utility as a measure of discrimination - I think a data example is needed to get your point across.

Your own example shows that ratios of probabilities, when framed in terms of power and Type I error, can be constant, while the corresponding ratios of odds cannot be.

The fundamental relationship between frequentist alpha levels and Bayesian update from prior to posterior is shown via simple algebra.

Starting from Berger et al.’s Rejection Ratio paper, the pre-posterior odds needed for an experiment are:

O_{pre} = \frac{\pi_{1}}{\pi_0} \times \frac{1 - \bar\beta }{\alpha}

The term with \pi is the prior odds of the alternative to the null, which can be thought of as a skeptical prior odds \lt 1, with \frac{1-\bar\beta}{\alpha} being the ratio of true to false rejections of the reference null hypothesis.

O_{pre} can be seen as the posterior odds \gt 1 needed to confidently reject the null hypothesis, based not only on the design of this single study but also on a prior assessment of the relative merit of the hypotheses. Rearranging for \alpha:

\alpha = \frac{\pi_1}{\pi_0} \times \frac{1 - \bar\beta}{O_{pre}}
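A minimal numeric sketch of the rejection-ratio relation (function names are mine, prior odds expressed as alternative to null, and all values purely illustrative):

```python
def pre_experimental_odds(prior_odds, power, alpha):
    """O_pre = prior odds of H1 vs H0 times the rejection ratio (1-beta)/alpha."""
    return prior_odds * power / alpha

def alpha_for_target_odds(prior_odds, power, target_post_odds):
    """Rearranged for alpha: alpha = prior_odds * power / O_pre."""
    return prior_odds * power / target_post_odds

# Hypothetical design: skeptical prior odds of 1:10 for H1, power 0.8.
print(pre_experimental_odds(0.1, 0.8, 0.05))   # odds a rejection is a true one
print(alpha_for_target_odds(0.1, 0.8, 10))     # alpha needed for 10:1 posterior odds
```

The second call shows the practical use: fixing the posterior odds you want a rejection to carry, and letting the prior odds and power dictate the significance threshold.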

In Berger’s GWAS example, the scientists did not need the posterior odds to be greater than 1 in order to claim a discovery; simply going from \frac{1}{100,000} prior odds on a gene-disease relation to \frac{1}{10} (leaving high posterior odds that there is no relationship even after the data were seen) was considered worthwhile from a scientific decision analysis.
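The arithmetic with the numbers quoted above is straightforward; the power value here is my own assumption, purely for illustration:

```python
# Prior odds of 1/100,000 moved to posterior odds of 1/10 (numbers from the
# GWAS example above). The implied rejection ratio (1-beta)/alpha is their quotient.
prior_odds = 1 / 100_000
posterior_odds = 1 / 10

rejection_ratio = posterior_odds / prior_odds
print(rejection_ratio)  # 10,000-fold updating required

# With an assumed power of 0.5 (illustrative assumption, not from the paper),
# the alpha this implies:
power = 0.5
alpha = power / rejection_ratio
print(alpha)  # on the order of 5e-05
```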

Are you disputing that the Bayes factor is a ratio of frequentist error probabilities? If so, why?

The expression you have typed is incorrect - please have a close look at our paper. Also using alpha and beta will be confusing for everyone so I recommend sticking to TPR and FPR. Our example shows no such thing - if you think it does please explain how so.

  1. Which expression is incorrect?
  2. The only one confused here is you. Your entire framing of this problem is wrong.
1 Like

[quote=“R_cubed, post:607, topic:4403”]
α=1−β/(post x prior)[/quote]

The above is clearly incorrect. I am not going to make a decision about who is confused - the expression speaks for itself…

It literally takes high-school-level algebra to re-arrange the equations in this paper, which I have posted at least twice in this exhausting thread. Before making a blanket claim that I am wrong, you need to read it.

The authors are all top notch statisticians. I’m confident they can do high school level algebra.

From the highlights:

Pre-experimentally, these odds are the power divided by the Type I error.

When the prior and posterior are in odds format, this is precisely how \alpha was calculated in a GWAS study where James Berger was a consultant statistician.

You need to read the Rejection Ratio paper (cited above) in order to understand the argument, and why your claim about risk ratios vs ratios of odds ratios is wrong.

He describes his reasoning in this video (from about 5:00 to 30:00)

(The relation among posterior odds, prior odds and power to \alpha described at 25:00 - 28:00 mark).

I used similar reasoning in a re-analysis of a “null” meta-analysis claiming “no association” between excess knee ROM and risk of ACL injury for amateur athletes.

There is nothing special about diagnostic testing that has not already been explored in the hypothesis testing literature, where likelihood ratios are derived from frequentist error probabilities.

Related References

You can derive adaptive p-values (where the significance threshold decreases as sample size increases) by minimizing a linear combination of the error probabilities.

Pericchi, L., & Pereira, C. (2016). Adaptative significance levels using optimal decision rules: Balancing by weighting the error probabilities. Brazilian Journal of Probability and Statistics, 30(1), 70-90.

1 Like

@R_cubed I understand your frustration but I’m not sure it’s even worth it to pursue this line of argument. As @AndersHuitfeldt has pointed out, the whole idea that the RR (or its interpretation) might be in violation of Bayes’ theorem is just a category error.

@s_doi In the section “Application of the OR versus the conventional RR to risk prediction”, you make frequent reference to “Method 2” (essentially the RR) being inconsistent with Bayes rule. I must ask again, what is your precise definition of being inconsistent with Bayes rule? It is certainly not defined in the paper. Furthermore, your references to “updating” probabilities have nothing to do with the standard idea of Bayesian updating. In the examples you consider, we are “updating” from a probability P(Y | X =1 , S = s) to another probability P(Y | X = 1, S = t) for some values s and t, where s and t stand for different populations of interest. Here we generalize to another population by changing the value of the covariate representing population that we are conditioning on. Bayesian updating, on the other hand, strictly deals with rearrangements of conditional probabilities keeping the conditioned on quantities the same: P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) /P(X = x).
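A minimal numeric check of the distinction (the joint distribution is hypothetical): Bayesian updating rearranges conditional probabilities within a single joint distribution, with the conditioning event held fixed, so the identity holds exactly:

```python
# Hypothetical joint distribution P(Y=y, X=x) over two binary variables.
p_joint = {
    (1, 1): 0.2, (1, 0): 0.1,
    (0, 1): 0.3, (0, 0): 0.4,
}

p_x1 = sum(v for (y, x), v in p_joint.items() if x == 1)  # P(X=1)
p_y1 = sum(v for (y, x), v in p_joint.items() if y == 1)  # P(Y=1)

p_y1_given_x1 = p_joint[(1, 1)] / p_x1        # P(Y=1|X=1) computed directly
p_x1_given_y1 = p_joint[(1, 1)] / p_y1        # P(X=1|Y=1)
bayes = p_x1_given_y1 * p_y1 / p_x1           # P(Y=1|X=1) via Bayes' theorem

print(p_y1_given_x1, bayes)  # identical: Bayes' theorem is an identity here
```

Moving from P(Y | X=1, S=s) to P(Y | X=1, S=t), by contrast, changes the joint distribution being conditioned on, so no such identity constrains it.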

2 Likes

This talk tomorrow at the Online Causal Inference Seminar may be of interest to participants and readers in this discussion:

Hello everyone,

The Online Causal Inference Seminar is excited to welcome two student speakers, Benedicte Colnet from INRIA, and Keegan Harris from CMU, to present at our student seminar. The titles and abstracts are copied below. The seminar is on Tuesday, May 30 at 8:30am PT / 11:30am ET / 4:30pm London / 6:30pm Tel Aviv / 11:30pm Beijing.

You can join the webinar on Zoom here (webinar ID: 996 2837 2037). The password is 386638.

As a reminder, you may suggest a speaker or propose to speak for future seminars here.

We look forward to seeing you on Tuesday!

Best wishes,
Organizers of Online Causal Inference Seminar

- Student speaker 1: Benedicte Colnet (INRIA)

  • Title: Risk ratio, odds ratio, risk difference… Which causal measure is easier to generalize?

  • Abstract: There are many measures to report so-called treatment or causal effect: absolute difference, ratio, odds ratio, number needed to treat, and so on. The choice of a measure, e.g. absolute versus relative, is often debated because it leads to different appreciations of the same phenomenon; but it also implies different heterogeneity of treatment effect. In addition some measures – but not all – have appealing properties such as collapsibility, matching the intuition of a population summary. We review common measures and their pros and cons typically brought forward. Doing so, we clarify notions of collapsibility and treatment effect heterogeneity, unifying different existing definitions. Our main contribution is to propose to reverse the thinking: rather than starting from the measure, we start from a non-parametric generative model of the outcome. Depending on the nature of the outcome, some causal measures disentangle treatment modulations from baseline risk. Therefore, our analysis outlines an understanding what heterogeneity and homogeneity of treatment effect mean, not through the lens of the measure, but through the lens of the covariates. Our goal is the generalization of causal measures. We show that different sets of covariates are needed to generalize an effect to a different target population depending on (i) the causal measure of interest, (ii) the nature of the outcome, and (iii) the generalization’s method itself (generalizing either conditional outcome or local effects).

2 Likes

Why should they have anything to do with what your standard idea is? Why should our notion of updating align with your understanding of the terminology? One of the good things about our papers is that we use data examples to indicate exactly what we mean, so there is no confusion. The confusion is created by vested interests when scientific jargon is pursued in an attempt to justify what cannot be justified, or perhaps to obfuscate. Also, there seems to be a lot of misunderstanding of what the expressions mean, as seen with the use of expressions by @R_cubed - 1-beta/alpha x prior = posterior - simple algebra indeed, but clouded by what we want to prove regardless of what has been shown.

Regarding the relationship between Bayes factors, rejection ratios, and frequentist error probabilities:

You are certainly correct. I did not entirely understand the claims made by Doi until I read that statement regarding likelihoods being ratios of ORs rather than ratios of conditional probabilities.

The only claim in contradiction to Bayes Theorem, and basic concepts in information theory, is Doi’s. This can be easily seen by careful examination of the equations in Berger et al.’s paper, where LRs are ratios of frequentist error probabilities.

This can also be seen in Doi’s own example.

In order for a likelihood ratio to behave as expected, constants of proportionality can be ignored. That cannot be done when converting a ratio of probabilities into a ratio of odds ratios, as his own chart shows.

If ratios of odds ratios appear anywhere in re-arrangements of Bayes theorem, it is on the left side of the equation, after converting prior and posterior target probabilities to odds, as in the relative belief representation. From that, it is easy to see how \alpha relates to Bayesian updating via power considerations.

I am grateful that @AndersHuitfeldt persisted in this discussion. I’ve learned much from considering the arguments in his papers and following up on some citations. I think there is merit in the proposed switch risk ratio, although I see no objection to using methods recommended in RMS to compute it.

1 Like

Hi Suhail

Thanks for responding. One comment about your paper: I wonder if the distinction between the Odds Ratio as used in the observational study context versus in the RCT context should be highlighted more clearly (?) Maybe my understanding of the term “clinical epidemiology” is too narrow, but when I hear the word “epidemiology,” I automatically think of observational studies rather than RCTs…

When I took introductory epidemiology courses many moons ago as a medical student and resident (and later as a job requirement), my instructors were either practicing epidemiologists or physicians with some additional epidemiology training. This is what I recall being taught:

  • Confounding was explained as a potential source of bias. A “confounder” was a factor that was associated with BOTH the patient’s likelihood of receiving a certain treatment AND the outcome. In observational studies, one way to test to see if a factor was acting as a confounder was to see if the point estimate moved by more than a certain amount if you adjusted for that factor. This led to a view that movement of a point estimate in response to an adjustment procedure signalled that “bias” was present in the study (and therefore it was a good thing you did the adjustment);
  • If you failed to adjust for confounders, then the point estimate from your study would have been “biased” (i.e., farther from the “truth”).

Maybe the above teaching wasn’t off base in the context of analyzing observational studies (?) But physicians primarily use RCTs to guide treatment decisions (when available). Students who are taught critical appraisal by epidemiologists will extrapolate epidemiologists’ definitions of “confounding” and “bias” to their appraisal of RCTs. Confusingly, statisticians seem to have their own understanding of “bias” and “confounding” and how these terms apply in an RCT context. And when outcomes of an RCT are binary and Odds Ratios come into the picture, all hope for understanding by novices is lost. I suspect these are the reasons why students get so confused:

  • In at least some epidemiology circles (or at least in the minds of students they teach), movement of a point estimate as a result of an adjustment procedure seems to have been conflated with the notion that “bias was present;”
  • Epidemiologists (but not statisticians?) use the term “random/chance confounding”;
  • Hapless medical students and residents trying to understand how to appraise RCTs wonder how it can be true that “confounding is addressed by randomization,” yet epidemiologists still discuss “random confounding…”

My understanding (?maybe incorrect):

  • Statisticians seem to focus on the “in expectation” part of the randomization concept as being most important when discussing the potential for confounding to be present (or not). In other words, statisticians implicitly tack on the phrase “in expectation” when defining a confounder- they view a confounder as a factor that is associated (“in expectation”) with both the likelihood of receiving a particular treatment and the likelihood of the outcome of interest (?) If the “expectation” is not present, EITHER because there’s a lack of prior evidence that the factor in question is associated with exposure (an important consideration when analyzing observational datasets) OR because the act of treatment allocation has been divorced from the likelihood of receiving a certain treatment (through the act of random allocation), then the factor will not exert confounding bias;

  • In contrast to statisticians, epidemiologists seem to focus not on the “in expectation” angle to randomization, but rather on the outcome of the randomization process (i.e., how the covariates ended up actually getting distributed between arms as a result of randomization i.e., how the randomization procedure actually “turned out”). I realize that there are probably decades of published studies on “random confounding,” but it seems like a very difficult concept for students to reconcile with the definition of a “confounder” that seems to be preferred by statisticians.

  • Statisticians consider that valid inference hinges most fundamentally on proper expression of uncertainty in a study result. A trial result must accurately reflect the things we know and the things we don’t know about the phenomenon under study. If we know that certain factors are prognostically important AND if we have measured these factors, then we must adjust for them in our analysis in order for our resulting inference to be valid. Conversely, if either we are unaware that certain factors are prognostically important OR if we are aware but unable to measure prognostically important factors (with both scenarios leading to a failure to “adjust” for these factors in our analysis), then our inference will remain valid. However, since our study will have been suboptimally “efficient” at identifying any underlying signal of treatment efficacy, we could be faulted for an unethical trial design (exposing more patients to an experimental treatment than is necessary to detect the efficacy signal). Before running an RCT, we should thoroughly understand the condition being treated, making every effort to measure known important prognostic factors (?)

Would this be a fair summary?:

  1. It’s important to understand the distinction between confounders and prognostic factors;

  2. When interpreting different types of effect measures, it’s important to recognize how each one can be affected by confounders and prognostic factors and to consider whether the study design is observational or experimental;

  3. Epidemiologists and statisticians don’t seem to agree that “random confounding” exists (?);

  4. Both confounders and prognostic factors can affect the OR in observational studies, whereas it will primarily be the latter (i.e., prognostic factors) that affect the OR in RCTs (?);

  5. When we adjust for important prognostic factors in an RCT that is using the OR as the effect measure, the OR tends to move farther from the null, while confidence intervals become wider. Overall, the adjustment for important prognostic factors tends to confer greater power/efficiency to the study, since the point estimate moves away from the null more rapidly than the rate at which the confidence interval widens (?);

1 Like

I think we’re getting a bit far from the original intent of this topic, and I for one do not enjoy hearing any more about Bayes’ factors, \alpha, \beta, likelihood ratios (personal opinion), and what constraints in one metric lead to on another metric.

Hi Erin, these are great questions that take us back to what is important – decision making in practice which is after all the real goal of all these methodological discussions which sometimes get railroaded by vested interests. In response:

  1. Agree, it is indeed critical to understand the distinction between confounders and prognostic factors; The former leads to bias if ignored and the latter to averages that apply to no one if ignored. Note that all confounders are also prognostic for the outcome so if we say ‘prognostic factors’ we mean non-confounding prognostic factors.

  2. Absolutely right, interpretation of different effect measures needs recognition of how each one can be affected by confounders and prognostic factors and consideration of RCT or not; However, it is also important to understand what ultimately an effect measure is intending to measure and whether it measures the intended association or something additional to it.

  3. I am not sure if it is that clear cut – what I meant previously is that covariate imbalance can occur by chance in a RCT but we do not need to adjust for these unless it is a strong prognostic factor. There is no ‘random confounding’ – confounding is due to differential distribution of covariates that influence both treatment and outcome (independent of treatment) in the compared groups and leads to real associations (or absence thereof) except that they are spurious. I doubt there is a divide on this between epidemiologists and statisticians.

  4. Absolutely right for properly conducted RCTs

  5. Yes, that is right. While there have been several calls for covariate adjustment in individually randomized trials, this has been raised in relation to the statistical perspective, which links prognostic covariate adjustment with an increase in statistical power to detect a treatment effect yet (paradoxically) decreased precision for the treatment effect estimator. This only happens with baseline covariate adjustment in RCTs analyzed using models that are noncollapsible, and it is not a paradox because the precision comparison is made between estimators of different estimands, while the conditional and marginal estimands share the same null.
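A minimal numeric sketch of noncollapsibility without confounding (all risks hypothetical): a binary prognostic covariate balanced 50/50 in both arms, identical conditional ORs in the two strata, yet a marginal OR closer to the null:

```python
# Noncollapsibility of the OR with no confounding at all:
# the covariate is perfectly balanced between arms.
def odds(p):
    return p / (1 - p)

# Outcome risks P(Y=1) by (stratum, arm); conditional OR = 9 in each stratum.
risk = {("low", "treat"): 0.5, ("low", "ctrl"): 0.1,
        ("high", "treat"): 0.9, ("high", "ctrl"): 0.5}

or_low = odds(risk[("low", "treat")]) / odds(risk[("low", "ctrl")])
or_high = odds(risk[("high", "treat")]) / odds(risk[("high", "ctrl")])

# Marginal risks when the covariate is split 50/50 in both arms.
m_treat = 0.5 * risk[("low", "treat")] + 0.5 * risk[("high", "treat")]
m_ctrl = 0.5 * risk[("low", "ctrl")] + 0.5 * risk[("high", "ctrl")]
or_marginal = odds(m_treat) / odds(m_ctrl)

print(or_low, or_high, or_marginal)  # 9, 9, ≈ 5.44: marginal OR nearer the null
```

Since the covariate cannot be a confounder here (it is independent of treatment by construction), the gap between 9 and roughly 5.4 is pure noncollapsibility, not bias.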

1 Like

True, we see that all the time. This was because the 10% rule got accepted quite rapidly. Then, when logistic regression did not really align with that (noncollapsibility), people began discussing how to distinguish a change in estimate due to noncollapsibility from one due to confounding. In my view, the change-in-estimate criterion for confounding should be completely dropped and we should move on to some DAG-based procedure, but not the sort suggested by Pearl in his ‘Book of Why’, for reasons outlined here

1 Like