Should one derive risk difference from the odds ratio?

I do not believe that my research program falls into this trap. I am very clear that effect heterogeneity is the default situation, and that homogeneity should only be invoked in special situations where this corresponds to reasonable beliefs about biological mechanisms.

It is not fully generic, in the sense that it is possible for two groups to have the same μ0 but different μ1. However, any kind of reasoning based on effect measure stability (for any parameter) can trivially be represented in this way, making it a very flexible and general representation of how findings from randomized trials are used in practice to inform individual-level decision making (approaches based on the risk difference, risk ratio, odds ratio, etc. are all special cases). Moreover, if an effect measure λ happens to be stable between groups, it is easy to prove that the approach based on its associated effect function gλ will lead to correct results when transporting results between those groups.
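
To make the idea of an effect function concrete, here is a minimal sketch in Python (the function names are my own, purely illustrative):

```python
# Each effect measure lambda has an associated "effect function" g_lambda
# that maps a group's baseline risk and the measure's value to a
# predicted risk under treatment.

def g_rd(p0, rd):    # risk difference: p1 = p0 + RD
    return p0 + rd

def g_rr(p0, rr):    # risk ratio: p1 = p0 * RR
    return p0 * rr

def g_or(p0, or_):   # odds ratio: scale the odds, then back-transform
    odds1 = p0 / (1 - p0) * or_
    return odds1 / (1 + odds1)

# If lambda really is stable between the trial group and the target group,
# applying g_lambda to the target group's baseline risk recovers the
# correct risk under treatment in the target group.
```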

I am unfamiliar with categorical formulations, and will have to read up on them. As always, there are multiple mathematically equivalent ways to formulate the same ideas. I am always willing to try new formulations in order to communicate better with the statistical community (with a preference for simplicity whenever possible). So far, I have tried three equivalent formulations (counterfactual outcome state transition parameters based on counterfactual variables, causal pie models, and modified causal DAGs), but I am always willing to try new formulations if this helps the reader. Ideally, I would need a coauthor who is familiar with such formulations to attempt this.

1 Like

I am not going to make a judgment about this because I haven’t studied the issues deeply enough. But I am going to call a halt to ALL posts that even HINT at personal judgments. Everyone: stick to the science and stick to clear non-esoteric examples. I will delete any post that includes any form of attack on another person from this point on.

8 Likes

My rapid response has now been posted at https://ebm.bmj.com/content/early/2022/08/10/bmjebm-2022-111979.responses

The editors asked me to shorten my response, which led to deletion of the paragraph about why I consider it “not even wrong”. Unfortunately, the title from the original version was retained, which may be confusing to readers who are not familiar with mathematical lore.

The letter was also edited to change my request to appoint a statistician to evaluate the paper for retraction, such that it now instead reads as a request for clarification. I want to be clear that I stand by my insistence that such evaluation is necessary.

The key claim in the paper is a purported proof that “the conventional interpretation of the risk ratio is in conflict with Bayes’ theorem”. If this were truly proven, it should be very easy to find a statistician who has read and understood the paper, and who is willing to publicly stake their own credibility on the claim that this conjecture is true and proven. That is the implicit standard workflow of mathematical publishing: when a claim of a proof is published, it is assumed that there exist others who are willing to publicly defend the correctness of the claim if necessary. Is there anyone who is willing to do that in this case?

5 Likes

Statisticians taking up your challenge:
[image: a shrubbery]

2 Likes

This whole situation is sociologically interesting, perhaps revealing cultural differences between mathematical and medical scientific traditions. In mathematics, the scientific advance is almost always a theorem and its proof, meaning that the manuscript contains absolutely everything relevant to evaluating its correctness. In contrast, in medicine, so much depends on trusting the authors’ claims about the data. This leads to very different cultures.

Doi et al have claimed a theorem and a proof, and got it published in a medical journal. I have questioned the soundness of their theorem, and publicly staked my reputation and my future in methods research on this. This is not an action I would take lightly. At the very least, I would have expected a number of statisticians to weigh in on the issue.

This manuscript is very clearly intended to be understood as a work of mathematics, in the sense that the contribution is fully deductive. It must be held to the standards of mathematics. When I cast doubt upon the published scientific record, one would think the community would give priority to resolving whether my accusation is correct. That cannot happen if everyone hides from the controversy.

1 Like

Have you considered that maybe everyone is still burned out from the last time something like this happened?

1 Like

Suhail Doi has now posted the following rapid response at the journal website. I reproduce it here in its entirety:

The problem in evidence-based medicine arises when we port relative risks derived from one study to settings with different baseline risks. For example, a baseline risk of 0.2 and treated risk of 0.4 for an event in a trial gives a RR of 2 (0.4/0.2) and the complementary cRR of 0.75 (0.6/0.8). Thus the ratio of LRs (RR/cRR) is 2/0.75 = 2.67. If applied to a baseline risk of 0.5 the predicted risk under treatment with the RR “interpretation” is 1.0 but with the ratio of LRs “interpretation” is 0.73. Here, the interpretation of the risk ratio as a likelihood ratio, using Bayes’ theorem, clearly gives different results, and solves the problem of impossible risks as clearly depicted in the manuscript and the example.
If, in our effort to highlight the need of this correct interpretation, we have used strong wording that annoyed the commentator, we feel the need to express regret. We hope that the commentator could also feel similarly for his scientifically unbecoming choice of wording that culminated with “Doi’s Conjecture”.

In this response, Suhail finally admits (at least implicitly) that he is trying to make claims about transportability/generalizability, not about “interpretation”. Moreover, he also states that the argument depends on the property which we have called “closure”. This admission is helpful, as I can trivially win any discussion against Suhail if we agree that the goal is transportability, and that his argument relies only on closure. I am not going to repeat these arguments here, as they have been stated repeatedly in this thread and in several papers and preprints.

The response still makes no attempt to clarify what he means by “interpretation”. Interestingly, this response appears to suggest that when “interpreting” the relative risk as a likelihood ratio, the investigator is constrained to using it for transportability purposes only in the form of the ratio of the standard RR to the complementary cRR, i.e. RR/cRR. This object is usually known as the odds ratio. No attempt is made to establish why the interpretation of the relative risk as a likelihood ratio forces the investigator to use the odds ratio for transportability purposes.
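
For readers who want to check the algebra, the identity is a single line (writing $p_0$ and $p_1$ for risk under control and under treatment):

$$\frac{\mathrm{RR}}{\mathrm{cRR}} = \frac{p_1/p_0}{(1-p_1)/(1-p_0)} = \frac{p_1(1-p_0)}{p_0(1-p_1)} = \mathrm{OR}.$$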

Suhail’s erroneous and unsubstantiated claim remains on record, as the key message of a published paper: that interpreting the relative risk as a relative risk (i.e. interpreting a spade as a spade) is “inconsistent with Bayes Theorem”. My position is unchanged: I consider retraction of this paper to be scientifically necessary.

Doi accuses me of scientifically unbecoming choice of wording, and expresses hope that I would express regret for this. I want to be clear that I regret nothing. I remind readers that Suhail Doi dragged me into this discussion, by tagging me publicly on this forum immediately after his paper was published in BMJ Evidence-Based Medicine, with a claim that this paper puts closure to our discussion about choice of effect measure. When I tried to exit the conversation, he dragged me back in with continued claims that his “math” proves unequivocally that no object derived from the complementary relative risk can ever have value as an effect measure (an implication of which is that my life’s work is worthless). What does he expect to happen - does he think I have some kind of ethical obligation to pussyfoot around the undeniable fact that his paper is incoherent and that its main message is an impossible and unjustified claim about Bayes’ theorem?

This incident has caused me to lose a lot of respect for the integrity of the scientific publishing model and for academics working in medical statistics. The fact that this paper sailed through peer review is in itself reminiscent of the Sokal Affair. The facts that the journal doesn’t seem to care a single bit when I point out that the paper is incoherent, and that the academic community doesn’t care enough for anyone to go on record with their opinions, beggar belief. What is the point of having a formal system of peer review if we aren’t making a good faith attempt to verify the correctness of the scientific record?

2 Likes

My latest manuscript, “Mindel C. Sheps - Counted, dead or alive”, will appear as a commentary in Epidemiology in 2023, half a century after Sheps’ death in 1973. This manuscript highlights Sheps’ important contributions to the discussion about choice between effect measures, and is also an attempt to simplify arguments I have made elsewhere in support of Sheps’ conclusions. The final author manuscript is now available as a preprint on arXiv, at [2211.10259] Mindel C. Sheps: Counted, dead or alive

5 Likes

This is my response to the commentary:

Response to Mindel C. Sheps: Counted, Dead or Alive

Dear Editor

It is true that when decision making in Medicine proceeds (e.g. drug A to prevent outcome Y), clinicians make use of research results that are reported in terms of estimated probabilities. Thus, Pr[Y(1) = 1] is the risk expected under the drug treatment while Pr[Y(0) = 1] is the estimated baseline risk under the control treatment. Thus, for the particular baseline risk in the study (Pr[Y(0) = 1]) there is a relative risk (RR) given by RR = Pr[Y(1) = 1]/Pr[Y(0) = 1]. The problem with the common practice advocated to clinicians, of combining the patient-specific baseline risk with the RR to estimate a patient’s risk under treatment, is that the RR varies with every baseline risk, and thus the estimated risk under treatment assuming a constant RR is not really that useful[1]. As stated by Huitfeldt[2], statisticians tend to appreciate this more because RR models may lead to predictions outside the range of valid probabilities, or to different predictions depending on whether the RR or its complement (cRR) is used (e.g. if RR = RR_dead then cRR = RR_alive). However, the latter are not the main limitations of the use of the RR in clinical practice; rather, the former is the critical issue, and these issues have led to heated discussions on Frank Harrell’s blog with Huitfeldt.
To illustrate this, say Pr[Y(1) = 1] = 0.4 and Pr[Y(0) = 1] = 0.2; then the RR = 0.4/0.2 = 2 and the cRR = 0.6/0.8 = 0.75. The common (and unquestioned) practice is that this RR can apply to any baseline risk other than 0.2, and thus, as stated by Huitfeldt, combining a patient-specific baseline risk with this RR is okay. We have highlighted a problem with this previously, first by noting that the RR is not independent of baseline risk[1], followed by a heated debate.[3-5] Given that the RR is a ratio of two probabilities and so too is the likelihood ratio (used in diagnostic studies), we have shown that the RR can also be interpreted as a likelihood ratio (LR)[6]. This interpretation holds when we consider a binary outcome to be a test of the treatment status in which case Pr[Y(1) = 1] = sensitivity while Pr[Y(0) = 1] = 1 − specificity and thus the RR = LR+. Similarly, the cRR = LR− and LR+/LR− is the odds ratio[6]. The difference emerges in use of the two interpretations of the same ratio at a different baseline risk. As a RR, for a baseline risk of 0.5, the estimated risk under treatment is now 1. As a ratio of LRs, the estimated risk under treatment is (0.5/0.5 × 2/0.75)/(1 + (0.5/0.5 × 2/0.75)) = 0.73. Of course, for the original baseline risk of 0.2 the estimated probability under treatment is the same with either interpretation. There is no difference in these interpretations whether we use the RR or cRR (aka survival ratio).
The simple solution proposed by Huitfeldt[2] (and attributed to Sheps) of using the survival ratio was also thought intuitive by us[7] at some point, before we realized that this does not work because neither the RR nor cRR are independent of baseline risk[1]. This is exactly what Sheps said in her 1959 paper[8] - “Unfortunately the value of the [RR] has no predictable relation to the value of [cRR] … and depends greatly on the magnitude of [baseline risk]”. Sheps’ approach therefore does nothing to resolve any of the theoretical problems with the RR or its complement. While Huitfeldt goes on to talk about 1−RR and 1−cRR in the light of Sheps’ paper, I will ignore this because what is clear is that if the RR and cRR are dependent on baseline risk then these interpretations do not hold when the RR or cRR are interpreted outside of the populations used to derive these measures. This follows logically from the fact that these measures, contrary to Huitfeldt, must be different in different groups defined by baseline risk irrespective of any assumptions about biological mechanisms.
Next, Huitfeldt suggests that unmeasured individual-level attributes that determine whether a person responds to treatment may be called “switches”. He suggests that one or more switches need to be present in a person for a treatment to have an effect, and that they combine to determine whether and how treatment affects the outcome. However, none of these attributes are subsequently mentioned, and instead he outlines four logically possible ways that such switch patterns can operate based on the attributes of the treatment rather than any individual-level patient attributes. It is unclear therefore a) why he concludes that the cRR will be stable when treatment is a sufficient cause of the outcome, b) what he means by switch pattern, or c) why it should be independent of baseline risk. The same applies to the rest of Table 1. My impression of what is being done here is that Huitfeldt would like us to ignore the mathematical properties of the ratio and instead believe that some esoteric biological mechanism must be considered to be at play that serves to make the ratio independent from baseline risk. The implication therefore is that non-independence (from baseline risk) must not be faulted on the ratio’s mathematical properties but on the user who does not understand how biology works. This, of course, is all contrary to what Sheps proposed[8].
I will conclude by saying that the RR and cRR are best interpreted as likelihood ratios and therefore need to be combined for their use as effect measures. The ratio RR/cRR = odds ratio and the latter itself is a likelihood ratio of a different type that connects risk under no-treatment to risk under treatment[6].

Reference List

1. Doi SA, Furuya-Kanamori L, Xu C, Lin L, Chivese T, Thalib L. Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 1: A call for change to practice. J Clin Epidemiol 2022; 142:271-79.
2. Huitfeldt A. Mindel C. Sheps: Counted, Dead or Alive. Epidemiology 2023.
3. Doi SA, Furuya-Kanamori L, Xu C, Chivese T, Lin L, Musa OAH, et al. The Odds Ratio is "portable" across baseline risk but not the Relative Risk: Time to do away with the log link in binomial regression. J Clin Epidemiol 2022; 142:288-293.
4. Xiao M, Chu H, Cole SR, Chen Y, MacLehose RF, Richardson DB, et al. Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 4: Odds Ratios are far from "portable" - A call to use realistic models for effect variation in meta-analysis. J Clin Epidemiol 2022; 142:294-304.
5. Xiao M, Chen Y, Cole SR, MacLehose RF, Richardson DB, Chu H. Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 2: Is the Odds Ratio "portable" in meta-analysis? Time to consider bivariate generalized linear mixed model. J Clin Epidemiol 2022; 142:280-287.
6. Doi SAR, Kostoulas P, Glasziou P. Likelihood ratio interpretation of the relative risk. BMJ Evid Based Med 2022.
7. Furuya-Kanamori L, Doi SA. The outcome with higher baseline risk should be selected for relative risk in clinical studies: a proposal for change to practice. J Clin Epidemiol 2014; 67:364-7.
8. Sheps MC. An examination of some methods of comparing several rates or proportions. Biometrics 1959; 15(1):87-97.
1 Like

It is true that when decision making in Medicine proceeds (e.g. drug A to prevent outcome Y), clinicians make use of research results that are reported in terms of estimated probabilities. Thus, Pr[Y(1) = 1] is the risk expected under the drug treatment while Pr[Y(0) = 1] is the estimated baseline risk under the control treatment. Thus, for the particular baseline risk in the study (Pr[Y(0) = 1]) there is a relative risk (RR) given by RR = Pr[Y(1) = 1]/Pr[Y(0) = 1]. The problem with the common practice advocated to clinicians, of combining the patient-specific baseline risk with the RR to estimate a patient’s risk under treatment, is that the RR varies with every baseline risk, and thus the estimated risk under treatment assuming a constant RR is not really that useful[1]. As stated by Huitfeldt[2], statisticians tend to appreciate this more because RR models may lead to predictions outside the range of valid probabilities, or to different predictions depending on whether the RR or its complement (cRR) is used (e.g. if RR = RR_dead then cRR = RR_alive). However, the latter are not the main limitations of the use of the RR in clinical practice; rather, the former is the critical issue, and these issues have led to heated discussions on Frank Harrell’s blog with Huitfeldt.

I interpret this paragraph to state that you consider baseline risk invariance to be the crucial consideration for choice of effect measure. I want to ask whether you agree or disagree with any of these statements:

  1. Regardless of which scale is used to measure the effect, it is always possible for the effect of treatment to vary between groups that have different baseline risk. This is true for OR, RR, cRR, RD, etc.
  2. Therefore, “invariance to baseline risk” is not a mathematical property of any effect measure. Trying to prove that an effect measure is mathematically guaranteed to be invariant to baseline risk (using mathematical logic alone) will always be futile. Given that mathematically guaranteed invariance to baseline risk is impossible, it is not a usable criterion when choosing between effect measures.
  3. It is true that for some effect measures, invariance to baseline risk is sometimes impossible. For example, if the RR among women is 2, and the baseline risk among men is 0.6, then it is not possible for the RR to take the same value in men and women. The phenomenon whereby invariance to baseline risk is sometimes impossible corresponds exactly to what we call “non-closure” (see the sketch after this list).
  4. The odds ratio and the switch relative risk are both closed.
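
To make statements 3 and 4 concrete, here is a toy sketch (my own illustration, not from any paper) of what closure means:

```python
# Non-closure of the RR: transport RR = 2 to a group with baseline risk 0.6.
p0, rr = 0.6, 2.0
p1_rr = p0 * rr                 # 1.2 -- not a valid probability

# Closure of the OR: treating the value 2 as an odds ratio instead,
# the predicted risk always lands inside (0, 1).
odds1 = p0 / (1 - p0) * rr
p1_or = odds1 / (1 + odds1)     # 0.75

# Closure of the switch relative risk: for a risk-increasing effect it
# scales the survival probability, which also stays inside [0, 1].
srr = 0.5                       # survival ratio (1 - p1)/(1 - p0) from some trial
p1_srr = 1 - (1 - p0) * srr     # 0.8
```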

Given that the RR is a ratio of two probabilities and so too is the likelihood ratio (used in diagnostic studies), we have shown that the RR can also be interpreted as a likelihood ratio (LR)[6].

This is uncontested, but I am genuinely puzzled about why you keep bringing up this trivial result, which is not relevant to the discussion.
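
For completeness, the trivial result in question (treating the outcome as a “test” of treatment status, so that sensitivity $= \Pr[Y=1\mid\text{treated}] = p_1$ and $1-\text{specificity} = \Pr[Y=1\mid\text{untreated}] = p_0$):

$$\mathrm{LR}^{+} = \frac{\text{sensitivity}}{1-\text{specificity}} = \frac{p_1}{p_0} = \mathrm{RR}, \qquad \mathrm{LR}^{-} = \frac{1-\text{sensitivity}}{\text{specificity}} = \frac{1-p_1}{1-p_0} = \mathrm{cRR}.$$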

This interpretation holds when we consider a binary outcome to be a test of the treatment status

Why do you think it is useful to consider a binary outcome as a test of treatment status? Do you think it is clinically useful to determine whether a patient has a disease, for the purpose of inferring probabilistically whether he is likely to have been treated? Or is there some other rationale behind this?

The simple solution proposed by Huitfeldt[2] (and attributed to Sheps) of using the survival ratio was also thought intuitive by us[7] at some point, before we realized that this does not work because neither the RR nor cRR are independent of baseline risk[1].

If you require that an effect measure can only be useful to the extent that there is a mathematical guarantee of invariance to baseline risk, then no effect measure will meet your requirements, not even your cherished OR. If this is the case, I would sincerely suggest you either go fully non-parametric or alternatively conclude that statistical inference is an impossibility.

There is no point in continuing this conversation if you keep on requiring that the switch relative risk satisfies an impossible requirement, one that is not even satisfied by your preferred effect measure.

This is exactly what Sheps said in her 1959 paper[8] - “Unfortunately the value of the [RR] has no predictable relation to the value of [cRR] … and depends greatly on the magnitude of [baseline risk]”. Sheps’ approach therefore does nothing to resolve any of the theoretical problems with the RR or its complement.

First of all, if you want to have a good faith discussion about this, please stop with the selective editing immediately. What this part of Sheps’s paper establishes is that there is no one-to-one relationship between RR and cRR. Further, that if you have a constant RR, then cRR is greatly dependent on baseline risk. Similarly, if you have a constant cRR, then RR depends on baseline risk. This is a relational statement about what we can infer about one effect measure if another effect measure is constant.

In fact, for any two non-equivalent effect measures, it will be true that if a (non-null) effect is stable on one scale, then the effect on the other scale will depend on the baseline risk: I can write RR as a function of OR, and it will have a term for baseline risk. But similarly, I can write OR as a function of RR, and it will also have a term for baseline risk. This relative baseline dependence is symmetric, and gives no reason to prefer one effect measure over the other. This line of reasoning is only interesting if you have some reason to assign priority to one of the scales, based on some kind of background knowledge that the effect is stable on that scale.
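
To make the symmetry explicit (a standard derivation, writing $p_0$ for baseline risk and substituting $p_1 = \mathrm{RR}\,p_0$ into the definition of the odds ratio):

$$\mathrm{OR} = \frac{p_1(1-p_0)}{p_0(1-p_1)} = \frac{\mathrm{RR}\,(1-p_0)}{1-\mathrm{RR}\,p_0}, \qquad \mathrm{RR} = \frac{\mathrm{OR}}{1-p_0+\mathrm{OR}\,p_0}.$$

The conversion carries a term for $p_0$ in both directions: hold either measure constant, and the other varies with baseline risk.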

This follows logically from the fact that these measures, contrary to Huitfeldt, must be different in different groups defined by baseline risk irrespective of any assumptions about biological mechanisms.

Consider a high-quality, large randomized trial which finds that the relative risk (for the effectiveness outcome) is equal between men and women, who have different baseline risks. Are you saying this is theoretically impossible? If I find a study in NEJM or Lancet where there is no such difference, does it disprove your theoretical argument? If not, what does your theoretical argument even mean? Does it rule out any possible future observations?

My impression of what is being done here is that Huitfeldt would like us to ignore the mathematical properties of the ratio and instead believe that some esoteric biological mechanism must be considered to be at play that serves to make the ratio independent from baseline risk. The implication therefore is that non-independence (from baseline risk) must not be faulted on the ratio’s mathematical properties but on the user who does not understand how biology works. This, of course, is all contrary to what Sheps proposed[8].

I am not asking you to ignore any “mathematical properties”. I am telling you that what you are asking for (“mathematically guaranteed baseline risk invariance”) is not a criterion that can be met by any effect measure, and is therefore a red herring. Moreover, I am telling you that given the best insights we have from toxicology about how to model mechanism of action, and given the best insights we have from psychology and philosophy about how to generalize causal effects, the most rational choice is often to start from a simplified biological (rather than mathematical) model that implies stability of the switch relative risk, and then think about all the possible reasons that this biological model could go wrong:

  • Is switch prevalence correlated with baseline risk?
  • Does switch prevalence differ between segments of the population?
  • Does the drug have non-monotonic effects?

Depending on our views about the plausibility of these threats to validity, we can then make informed choices about whether there is any point at all in going forward with the analysis, and if so, whether there is a need for interaction terms or subgroup analysis, whether we can get point identification or have to settle for partial identification/bounds, whether we need a sensitivity analysis, etc.
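
As a minimal sketch of what this looks like in code, assuming the analyst has judged the threats above to be negligible so that the switch relative risk can be treated as stable (my own toy example, not a complete workflow):

```python
def transport_switch_rr(p0_trial, p1_trial, p0_target):
    """Predict risk under treatment in a target group, under the working
    assumption that the switch relative risk is stable between groups."""
    if p1_trial <= p0_trial:
        # Risk-reducing effect: the ordinary risk ratio is the stable measure.
        return p0_target * (p1_trial / p0_trial)
    # Risk-increasing effect: the survival ratio (cRR) is the stable measure.
    return 1 - (1 - p0_target) * ((1 - p1_trial) / (1 - p0_trial))

# With the numbers used earlier in this thread (trial: 0.2 -> 0.4,
# target baseline risk 0.5): 1 - 0.5 * 0.75 = 0.625.
print(transport_switch_rr(0.2, 0.4, 0.5))
```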

I will conclude by saying that the RR and cRR are best interpreted as likelihood ratios and therefore need to be combined for their use as effect measures. The ratio RR/cRR = odds ratio and the latter itself is a likelihood ratio of a different type that connects risk under no-treatment to risk under treatment[6].

This is a complete non sequitur. You have still made absolutely no attempt to explain why “interpretation” matters when the actual math is invariant to interpretation, why the interpretation as a likelihood ratio prohibits its use as an effect measure, or why the odds ratio is required in order to “connect” risk under treatment to risk under no treatment.

3 Likes

When speaking of the choice of an effect measure, I believe we are talking about a default choice, i.e., a choice that has the greatest chance of being useful and the lowest chance of being misleading. Based on a wide variety of datasets, the OR would be my default choice as a relative effect measure, as it has the highest probability of being constant and thus the lowest a priori probability of being misleading. Your mileage may vary.

3 Likes

What does this even mean? What is your probability space?

Is this a frequentist statement about the prevalence of “between-group effect measure stability”, when considering the Cartesian product of all possible ways that we can partition populations into groups, and all exposure-outcome relationships? Or some other frequentist formalization? If so, where is your empirical evidence?

Or is this a Bayesian statement about Frank Harrell’s personal priors? If so, we are talking about a highly informative prior distribution, one that is completely unresponsive to argument. It is unclear to me why anyone would be interested in knowing where probability mass is assigned in this distribution. In order for this prior to be well-calibrated to reality, it would be very important for the Bayesian agent to engage with arguments and update the distribution accordingly. It is very obvious to the world that this is not attempted.

You missed my points. I was speaking only practically based on a career of analyzing data. Since I’ve found that ORs are the most constant quantities over patient types I usually just have to model exceptions, which is easily done using interactions in logistic models.

2 Likes

I think there is a gross misconception in this thread about the term “dependence on baseline risk” of an effect measure. In various threads I have called it either that or “non-portability across baseline risk”. I believe that this has been interpreted to imply that we mean the difference between the sample average treatment effect (SATE) and the target population average treatment effect (PATE). This is not the case at all.

We (I am assuming Frank as well) have no interest in sampling variability, internal validity bias, or external validity bias in this thread, yet this is what has been assumed in many of the arguments that have been put forward, and in attempts to interpret this from a causal inference framework. While there are certainly implications for the causal inference community, that is not at all the primary purpose of the paper that started this thread.

What is being argued here, and what exactly is meant by “portability” in the paper that started this thread, is that an effect measure is non-portable if the prevalence of the outcome in a study sample determines the magnitude of the SATE independently of the actual association between treatment and outcome. I note that I use prevalence because that is what the paper identifies as the important metric, but since prevalence and baseline risk are related, this applies to baseline risk too.

To illustrate the difference this makes, I take the same 140 thousand trials I scraped from Cochrane (those with both a binary treatment variable and a binary outcome - 140,620 trials to be exact). I then computed the discrimination of outcome by treatment (which would be the same for treatment by outcome) based on Stephen Walter’s equation. This serves as our discrimination standard against which logRR and logOR can be compared across baseline prevalence, and below is what we get:

There is little doubt that the magnitude of the RR depends on outcome prevalence while the OR does not. By “depends on”, what is meant here is that the degree of association (as judged by an index of discrimination) becomes secondary for the RR, because its mathematical properties demand that it also change value when outcome prevalence changes.

Perhaps you have little doubt about this claim. But that is the entire problem: the claim is not true, yet you take it as an unquestionable article of faith.

Again, practical experience disagrees. It’s not a matter of one metric being always wrong or always right. It’s a matter of what is a good choice for the majority of applications.

3 Likes

The discussion I am having with Doi is about theory. He is making false theoretical claims, I am rebutting them.

If you think the disagreement can be settled by “practical experience” I am going to insist on your receipts. Did you analyze all those datasets using both the odds ratio and the switch relative risk? What criterion did you use to determine what was the “best choice” for each application?

Criterion for best choice: the metric inducing the fewest exceptions (interactions).

This graph from the 140,620 trials demonstrates more clearly the concepts stated previously: LnOR is linear in logit(AUC), while LnRR is not. This also explains a discussion on Andrew Gelman’s blog where he says that the coefficient on the logistic scale corresponds to different shifts in probability, depending on where your baseline is - which is true. The converse is also true: for the same shift in probability, the OR may be different depending on where your baseline is. For example:
0.2 → 0.4: OR 2.67
0.4 → 0.6: OR 2.25
0.6 → 0.8: OR 2.67
This graph explains why - discrimination depends on where your baseline is for the same shift in probability.
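
These numbers are easy to verify:

```python
def odds_ratio(p0, p1):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

print(round(odds_ratio(0.2, 0.4), 2))  # 2.67
print(round(odds_ratio(0.4, 0.6), 2))  # 2.25
print(round(odds_ratio(0.6, 0.8), 2))  # 2.67
```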

1 Like

Very few people think about randomized trials in terms of “AUC”. If you are going to base your argument around this construct, could you at least please define the specific AUC that you are considering? I understand that you imagine using the outcome as a test for whether a patient was treated. But what specific ROC curve are you considering? How do you get more than one point on the curve? I don’t understand in what sense you can vary the positive test threshold value in your hypothetical diagnostic test, given that the outcome is binary.

Regardless of all this, as I have tried to tell you a couple of times: unless you can establish that AUC is stable between groups with different baseline risks, this is just a mildly noteworthy mathematical relationship, with no implications for whether any effect measure “depends on baseline risk”. For the argument to work, you first have to establish why you expect AUC to be stable across groups with different risks.

2 Likes