Should one derive risk difference from the odds ratio?

eagerbo · September 9, 2022, 10:04am

@AndersHuitfeldt, perhaps it is too late for a retraction, as you have already contacted the editor and the author seems unconvinced. Thus, you should submit a letter explaining your objections. This enables a future researcher to cite your letter when confronted with Doi et al., e.g. in a review.
In my field, there is a faulty statistical paper (it suggests a number of analyses based on circular reasoning) published in a high-ranking medical journal. It is really annoying, and it is difficult to argue against doing these analyses, as reviewers can be quite illiterate when it comes to statistics.
Good luck, and don’t take this too seriously (it is only work).

davidcnorrismd · September 9, 2022, 10:24am

Furthermore, a companion post on PubPeer would alert anyone (with the browser extension installed) that there is a comment on the paper.

AndersHuitfeldt · September 9, 2022, 11:02am

Thank you Esben. I have already sent a rapid response to BMJ Evidence-Based Medicine, they have not published it yet and have told me that they are analyzing the situation first.

I understand some of you think I am overreacting. I hope you can see my perspective, that this is about much more than “only work” to me. Just to make sure you understand why I am doing this, I am going to tell my story:

10 years ago, when I was a doctoral student at the Harvard School of Public Health, I had an idea that I thought was going to change the fundamentals of medical statistics and clinical decision making. It has since turned out that this idea has been rediscovered multiple times, as variations around the same theme, starting with Mindel C. Sheps in 1958. However, the idea is largely unknown among academics, and I contributed significantly to the theory by clarifying the underlying causal model and how this relates to work from other academic disciplines.

Very early after I completed my doctoral degree, I concluded that I have no interest in staying in academia unless I could work on theoretical methodological questions that arise from this idea. It seemed very clear to me that this touches on matters that are both important and of great interest to the academic community: If we had a clear answer for how to choose between effect measures, we would not only resolve the never-ending Twitter discussions between statisticians and epidemiologists/clinicians, we could also potentially resolve very significant theoretical disagreements between economists and statisticians about linear probability models. Moreover, this idea touches very directly on transportability/generalizability, which has become a central focus in methods research in the years since I started working on it (though with a dominant paradigm that in my view misses the point, by focusing on counterfactual distributions rather than effect measures).

With that background, I decided to very literally gamble my career on the correctness and importance of the idea. This gamble did not seem like a big deal at the time: I assumed that I would get a fair hearing, and that if there was a problem with the idea, the flaw would be spotted and I could leave academia with my head held high and no hard feelings.

That is, unfortunately, not what happened. I have been ignored by almost everyone, and met with countless rejections from journals and conferences, without any attempt to identify a flaw in the argument. From my perspective, I am simply being blocked by hostile reviewers from having a platform for making the argument. I therefore end up constantly disappointing my employers with lack of research output/productivity, and having to move to a new country where someone is willing to give me a chance. Meanwhile, I am constantly being advised that if I want to stay in academia, I need to start working on something else. Which is probably accurate practical advice, but the conditional clause about wanting to stay in academia simply does not apply. After almost a decade of working on this, I am at very real risk of becoming permanently unemployable due to my obsession, despite holding both a medical degree and a doctoral degree.

I will never be at peace with giving up unless someone identifies a critical flaw in my argument or points to an alternative approach that works significantly better. All I ask is that competent statisticians and epidemiologists evaluate my arguments. These scientists are, of course, in part acting rationally: They assign a low prior probability to my papers being significant, and they therefore have little incentive to put in the work that would be required to fully evaluate their merits.

This is the context I find myself in when people like Suhail Doi make explicit and public claims on scientific forums that my ideas are “wrong”, based on an incoherent paper that somehow sailed through peer review, making a mockery of the process that has held me back for so many years. Due to the credibility he holds by being a Professor, his public claims can be expected to even further lower the prior probability other statisticians assign to my work being worth engaging with.

I am therefore forced to find a way to make it common knowledge that Suhail’s arguments do not work. Ideally, we would have a functioning marketplace of ideas where high-status statisticians could back up my claims. For whatever reason, that is unfortunately not happening in this situation (probably because nobody reads papers anymore, nor do they care whether they are correct). My only recourse is to insist that BMJ Evidence-Based Medicine ask their statistics experts to evaluate it for retraction.

For people who do not know the full background, it may appear as if I am acting with unnecessary hostility. I want to be clear that this is not a reflection of my usual temperament, that I was put in a quite unusual and extreme psychological situation here, with much more at stake than “just work”.

f2harrell · September 9, 2022, 11:50am

My only advice is to not be tied to one thing. It’s fine to emphasize one thing but best to be attached to multiple ideas and to practice multiple methodologies.

AndersHuitfeldt · September 9, 2022, 11:54am

As I said above, I have received variations of this advice many times. Unfortunately, the utility function is not up for grabs: If changing the emphasis of my research to this extent is necessary, I would rather leave academia.

But before I give up, it is important for me that my work has been thoroughly evaluated. I can’t live with giving up unless I can pinpoint what was wrong with the idea.

davidcnorrismd · September 9, 2022, 12:12pm

In a NewsHour retrospective on Serena Williams, a sportswriter noted she “left it all on the court.” Thank you for your courage in doing the same here, Anders. I take it that the definitive expression of your hypothesis is here? https://arxiv.org/abs/2106.06316v5

I see I had marked up v1 extensively, but with intervening v3 & v4 both marked ‘Major update’ I am presumably far behind the current v5. I am glad to see James Scanlan dropped from your refs.

f2harrell · September 9, 2022, 2:18pm

That’s not what I’m suggesting. It’s fine to have an emphasis. But having multiple other areas that get a significant amount of your time, and learning a variety of tools is a good recipe.

s_doi · September 9, 2022, 8:15pm

This is going to be my last post on this thread, and I will leave it with some suggestions for @AndersHuitfeldt:

a) Being extremely rude and unprofessional in communication signals a lack of seriousness even if there was a point; in this case the point is also lacking, which makes it worse.
b) It does not help one’s reputation or career to take the lazy route of making a litany of defamatory comments about a paper, or to beg editors to retract a paper to fulfill vested interests or personal views. Neither does it help to write a letter that contains nothing more than personal opinion and post it to the editors.
c) It only pays off to do the hard work of writing a counterpoint based on scientific concepts that addresses the specific issues one thinks need to be raised, and a key requirement is being able to discuss disagreements logically and rationally. For examples, see the three responses in the journal after my paper that started this thread.
d) As expected, the behavior in a) above leaves no room for scientific progress, and the main responses over the last 34 posts confirm this, since they have just resulted in people taking advantage of this situation to engage in philosophical (or in some cases crude) swipes at each other.

AndersHuitfeldt · September 9, 2022, 11:01pm

I don’t think there will be much to add to this discussion until some third party weighs in to resolve the actual methodological disagreement. My response to your accusations will depend very significantly on the final consensus about whether I was right to call for retraction of your paper.

In a hypothetical world where it turns out that I was wrong, I can assure you that an apology will be forthcoming and that I will withdraw from methods research. I do however think that this hypothetical is highly unlikely, and I urge you to start thinking about your course of action if I am proven right.

davidcnorrismd · September 10, 2022, 9:00am

This thread raises the spectre of a tweet from Vineet Tiruvadi:

If you start with the wrong framework then the ability to do complex analyses may seem like it's giving insight, but what you're mostly doing is studying how wrong your framework is #academictwitter #scitwitter #medtwitter pic.twitter.com/2Y6ZgQDtFL
— Vineet Tiruvadi (@vineettiruvadi) February 21, 2021

Perhaps this entire research programme has devolved into studying the search for a mythical One Effect Measure, losing touch with the real underlying problems of clinical decision-making and risk communication?

Anders, as heavy as the notational burden of your paper (v5) is, I would like to see it support formally placing your §2 “formaliz[ation] of how such individualization is done in practice” into the ideal context where “direct evidence for personalized risk under intervention” (§1.1) is not absent. Is the particular heuristic you adopt (applying an effect function g_\lambda : \mu_0 \rightarrow \mu_1) totally generic? Does it emerge formally as somehow ‘natural’? Or is it merely one of a much larger class of heuristics for which examples could be provided? I’m struck by the usefulness of categorical concepts to Maclaren & Nicholson’s presentation in this paper. Have you considered a categorical formulation of your ideas?

AndersHuitfeldt · September 10, 2022, 11:28am

I do not believe that my research program falls into this trap. I am very clear that effect heterogeneity is the default situation, and that homogeneity should only be invoked in special situations where this corresponds to reasonable beliefs about biological mechanisms.

It is not fully generic, in the sense that it is possible for two groups to have the same μ0 but different μ1. However, any kind of reasoning based on effect measure stability (for any parameter) can trivially be represented in this way, making it a very flexible and general representation of how findings from randomized trials are used in practice to inform individual-level decision making (approaches based on risk difference, risk ratio, odds ratio, etc., are all special cases). Moreover, if an effect measure λ happens to be stable between groups, it is easy to prove that the approach based on its associated effect function gλ will lead to correct results when transporting results between those groups.
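A minimal Python sketch of this representation, assuming the conventional definitions of RD, RR, cRR and OR; the helper names (`estimate`, `EFFECT_FUNCTIONS`, `transport`) are hypothetical and not taken from the preprint. Each measure λ estimated in a trial induces a function gλ from a baseline risk μ0 to a predicted treated risk μ1. Applying gλ to the trial's own baseline risk recovers the trial's treated risk exactly, which is why a truly stable λ transports without error.

```python
"""Effect functions g_lambda: predict treated risk mu1 from baseline risk mu0."""


def odds(p):
    return p / (1 - p)


def risk_from_odds(o):
    return o / (1 + o)


def estimate(mu0, mu1):
    """Effect measures estimated from a trial's control (mu0) and treated (mu1) risks."""
    return {
        "RD": mu1 - mu0,
        "RR": mu1 / mu0,
        "cRR": (1 - mu1) / (1 - mu0),
        "OR": odds(mu1) / odds(mu0),
    }


# g_lambda for each measure: the treated risk implied by holding lambda fixed.
EFFECT_FUNCTIONS = {
    "RD": lambda lam, mu0: mu0 + lam,
    "RR": lambda lam, mu0: lam * mu0,
    "cRR": lambda lam, mu0: 1 - lam * (1 - mu0),
    "OR": lambda lam, mu0: risk_from_odds(lam * odds(mu0)),
}


def transport(trial_mu0, trial_mu1, target_mu0):
    """Predicted treated risk in a target group, one prediction per effect measure."""
    lam = estimate(trial_mu0, trial_mu1)
    return {m: g(lam[m], target_mu0) for m, g in EFFECT_FUNCTIONS.items()}


if __name__ == "__main__":
    # Illustrative trial (baseline 0.2, treated 0.4) and target baseline 0.1.
    for measure, mu1 in transport(0.2, 0.4, 0.1).items():
        print(f"{measure}: {mu1:.3f}")
```

With these illustrative numbers the four effect functions predict 0.300 (RD), 0.200 (RR), 0.325 (cRR) and about 0.229 (OR), so the choice of measure matters even when no prediction leaves [0, 1].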

I am unfamiliar with categorical formulations and will have to read up on them. As always, there are multiple mathematically equivalent ways to formulate the same ideas. I am always willing to try new formulations in order to communicate better with the statistical community (with a preference for simplicity whenever possible). So far, I have tried three equivalent formulations (counterfactual outcome state transition parameters based on counterfactual variables, causal pie models and modified causal DAGs), but I am open to others if this helps the reader. Ideally, I would need a coauthor who is familiar with such formulations to attempt this.

f2harrell · September 10, 2022, 5:01pm

I am not going to make a judgment about this because I haven’t studied the issues deeply enough. But I am going to call a halt to ALL posts that even HINT at personal judgements. Everyone: stick to the science and stick to clear non-esoteric examples. I will delete any post that includes any form of attack on another person from this point on.

AndersHuitfeldt · September 14, 2022, 7:38am

My rapid response has now been posted at https://ebm.bmj.com/content/early/2022/08/10/bmjebm-2022-111979.responses

The editors asked me to shorten my response, which led to deletion of the paragraph about why I consider it “not even wrong”. Unfortunately, the title from the original version was retained, which may be confusing to readers who are not familiar with mathematical lore.

The letter was also edited to change my request to appoint a statistician to evaluate the paper for retraction, such that it now instead reads as a request for clarification. I want to be clear that I stand by my insistence that such evaluation is necessary.

The key claim in the paper is a purported proof that “the conventional interpretation of the risk ratio is in conflict with Bayes’ theorem.” If this were truly proven, it should be very easy to find a statistician who has read and understood the paper and who is willing to publicly stake their own credibility on the claim that this conjecture is true and proven. That is the implicit standard workflow of mathematical publishing: When a claim of a proof is published, it is assumed that there exist others who are willing to publicly defend the correctness of the claim if necessary. Is there anyone who is willing to do that in this case?

davidcnorrismd · September 15, 2022, 7:11pm

Statisticians taking up your challenge:
shrubbery

AndersHuitfeldt · September 15, 2022, 7:39pm

This whole situation is sociologically interesting, perhaps revealing cultural differences between mathematical and medical scientific traditions. In mathematics, the scientific advance is almost always a theorem and its proof, meaning that the manuscript contains absolutely everything relevant to evaluating its correctness. In contrast, in medicine, so much is dependent on trusting the authors on their claims about the data. This leads to very different cultures.

Doi et al have claimed a theorem and a proof, and got it published in a medical journal. I have questioned the soundness of their theorem, and publicly staked my reputation and my future in methods research on this. This is not an action I would take lightly. At the very least, I would have expected a number of statisticians to weigh in on the issue.

This manuscript is very clearly intended to be understood as a work of mathematics, in the sense that the contribution is fully deductive. It must be held to the standards of mathematics. When I cast doubt upon the published scientific record, one would think the community would give priority to resolving whether my accusation is correct. That cannot happen if everyone hides from the controversy.

davidcnorrismd · September 16, 2022, 1:52am

Have you considered that maybe everyone is still burned out from the last time something like this happened?

AndersHuitfeldt · October 5, 2022, 11:12am

Suhail Doi has now posted the following rapid response at the journal website. I reproduce it here in its entirety:

The problem in evidence-based medicine arises when we port relative risks derived from one study to settings with different baseline risks. For example, a baseline risk of 0.2 and treated risk of 0.4 for an event in a trial gives a RR of 2 (0.4/0.2) and the complementary cRR of 0.75 (0.6/0.8). Thus the ratio of LRs (RR/cRR) is 2/0.75 = 2.67. If applied to a baseline risk of 0.5 the predicted risk under treatment with the RR “interpretation” is 1.0 but with the ratio of LRs “interpretation” is 0.73. Here, the interpretation of the risk ratio as a likelihood ratio, using Bayes’ theorem, clearly gives different results, and solves the problem of impossible risks as clearly depicted in the manuscript and the example.
If, in our effort to highlight the need of this correct interpretation, we have used strong wording that annoyed the commentator we feel the need to express regret. We hope that the commentator could also feel similarly for his scientifically unbecoming choice of wording that culminated with “Doi’s Conjecture”.
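The arithmetic in the first paragraph of this response can be checked in a few lines of Python. This is only a numerical check of the figures as stated, assuming the conventional definitions of RR, cRR and odds; it is not an argument for either transport rule, but it makes visible that RR/cRR is numerically identical to the odds ratio.

```python
def odds(p):
    return p / (1 - p)


def risk_from_odds(o):
    return o / (1 + o)


p0, p1 = 0.2, 0.4               # trial baseline and treated risks
rr = p1 / p0                    # relative risk
crr = (1 - p1) / (1 - p0)       # complementary (survival) relative risk
ratio_of_lrs = rr / crr         # the "ratio of LRs" from the response
odds_ratio = odds(p1) / odds(p0)

new_baseline = 0.5
pred_rr = rr * new_baseline                                     # constant-RR prediction
pred_ratio = risk_from_odds(odds(new_baseline) * ratio_of_lrs)  # constant-RR/cRR prediction

print(f"RR = {rr:.2f}, cRR = {crr:.2f}, RR/cRR = {ratio_of_lrs:.2f}, OR = {odds_ratio:.2f}")
print(f"at baseline {new_baseline}: RR gives {pred_rr:.2f}, RR/cRR gives {pred_ratio:.2f}")
```

It prints RR = 2.00, cRR = 0.75 and RR/cRR = OR = 2.67, and at a baseline of 0.5 the two rules give 1.00 and 0.73, matching the figures in the response. Holding RR/cRR fixed is therefore the same rule as holding the odds ratio fixed.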

In this response, Suhail finally admits (at least implicitly) that he is trying to make claims about transportability/generalizability, not about “interpretation”. Moreover, he also states that the argument depends on the property which we have called “closure”. This admission is helpful, as I can trivially win any discussion against Suhail if we agree that the goal is transportability, and that his argument relies only on closure. I am not going to repeat these arguments here, as they have been stated repeatedly in this thread and in several papers and preprints.

The response still makes no attempt to clarify what he means by “interpretation”. Interestingly, this response appears to suggest that when “interpreting” the relative risk as a likelihood ratio, the investigator is constrained to only using it for transportability purposes in the form of the ratio of the standard RR and the complementary cRR, as (RR/cRR). This object is usually known as the odds ratio. No attempt is made to establish why the interpretation of the relative risk as a likelihood ratio forces the investigator to use the odds ratio for transportability purposes.
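For readers who want the identity spelled out, here is a short derivation, writing μ0 = Pr[Y(0) = 1] and μ1 = Pr[Y(1) = 1] and using the standard definitions of the three ratios. It shows that the "ratio of likelihood ratios" is the odds ratio, and that holding it fixed at a new baseline risk is the same as transporting on the odds-ratio scale:

$$
\frac{\mathrm{RR}}{\mathrm{cRR}}
= \frac{\mu_1/\mu_0}{(1-\mu_1)/(1-\mu_0)}
= \frac{\mu_1/(1-\mu_1)}{\mu_0/(1-\mu_0)}
= \mathrm{OR},
\qquad
\mu_1 = \frac{\mathrm{OR}\,\mu_0}{1-\mu_0+\mathrm{OR}\,\mu_0}.
$$

With the figures from the quoted example (RR = 2, cRR = 0.75, so OR = 8/3) and μ0 = 0.5, this gives μ1 = (4/3)/(11/6) = 8/11 ≈ 0.73.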

Suhail’s erroneous and unsubstantiated claim remains on record, as the key message of a published paper: that interpreting the relative risk as a relative risk (i.e. interpreting a spade as a spade) is “inconsistent with Bayes Theorem”. My position is unchanged: I consider retraction of this paper to be scientifically necessary.

Doi accuses me of scientifically unbecoming choice of wording, and expresses hope that I would express regret for this. I want to be clear that I regret nothing. I remind readers that Suhail Doi dragged me into this discussion, by tagging me publicly on this forum immediately after his paper was published in BMJ Evidence-Based Medicine, with a claim that this paper puts closure to our discussion about choice of effect measure. When I tried to exit the conversation, he dragged me back in with continued claims that his “math” proves unequivocally that no object derived from the complementary relative risk can ever have value as an effect measure (an implication of which is that my life’s work is worthless). What does he expect to happen - does he think I have some kind of ethical obligation to pussyfoot around the undeniable fact that his paper is incoherent and that its main message is an impossible and unjustified claim about Bayes Theorem?

This incident has caused me to lose a lot of respect for the integrity of the scientific publishing model and for academics working in medical statistics. The fact that this paper sailed through peer review is in itself reminiscent of the Sokal Affair. The facts that the journal doesn’t seem to care a single bit when I point out that the paper is incoherent, and that the academic community doesn’t care enough for anyone to go on record with their opinions, beggars belief. What is the point of having a formal system of peer review if we aren’t making a good faith attempt to verify the correctness of the scientific record?

AndersHuitfeldt · November 23, 2022, 12:30pm

My latest manuscript, “Mindel C. Sheps - Counted, dead or alive” will appear as a commentary in Epidemiology in 2023, half a century after Sheps’ death in 1973. This manuscript highlights Sheps’ important contributions to the discussion about choice between effect measures, and is also an attempt to simplify arguments I have made elsewhere in support of Sheps’ conclusions. The final author manuscript is now available as a preprint on arXiv, at [2211.10259] Mindel C. Cheps: Counted, dead or alive

s_doi · December 27, 2022, 1:44pm

This is my response to the commentary:

Response to Mindel C. Sheps: Counted, Dead or Alive

Dear Editor

It is true that when decision making in Medicine proceeds (e.g. drug A to prevent outcome Y), clinicians make use of research results that are reported in terms of estimated probabilities. Thus, Pr [Y(1) = 1] is the risk expected under the drug treatment while Pr [Y(0) = 1] is the estimated baseline risk under the control treatment. Thus for the particular baseline risk in the study (Pr [Y(0) = 1]) there is a relative risk (RR) given by RR = (Pr [Y(1) = 1])/( Pr [Y(0) = 1]). The problem with the common practice advocated to clinicians to combine the patient-specific baseline risk with the RR to estimate a patient’s risk under treatment is that the RR varies with every baseline risk and thus the estimated risk under treatment assuming a constant RR is not really that useful[1]. As stated by Huitfeldt[2], statisticians tend to appreciate this more because RR models may lead to predictions outside the range of valid probabilities or different predictions depending on if the RR or its complement (cRR) are used (e.g. if RR = RRdead then cRR= RRalive). However the latter are not the main limitations of the use of the RR in clinical practice, but rather the former is the critical issue and these have led to heated discussions on Frank Harrell’s blog with Huitfeldt.
To illustrate this, say Pr [Y(1) = 1] = 0.4 and Pr [Y(0) = 1] = 0.2; then RR = 0.4/0.2 = 2 and cRR = 0.6/0.8 = 0.75. The common (and unquestioned) practice is to assume that this RR can apply to any baseline risk (other than 0.2), and thus, as stated by Huitfeldt, that combining a patient-specific baseline risk with this RR is okay. We have highlighted a problem with this previously, first by noting that the RR is not independent of baseline risk[1], followed by a heated debate[3-5]. Given that the RR is a ratio of two probabilities and so too is the likelihood ratio (used in diagnostic studies), we have shown that the RR can also be interpreted as a likelihood ratio (LR)[6]. This interpretation holds when we consider a binary outcome to be a test of the treatment status, in which case Pr [Y(1) = 1] = sensitivity while Pr [Y(0) = 1] = 1 – specificity, and thus RR = LR+. Similarly, cRR = LR− and LR+/LR− is the odds ratio[6]. The difference emerges when the two interpretations of the same ratio are used at a different baseline risk. As an RR, for a baseline risk of 0.5, the estimated risk under treatment is now 1. As a ratio of LRs, the estimated risk under treatment is (0.5/0.5 × 2/0.75)/(1 + (0.5/0.5 × 2/0.75)) = 0.73. Of course, for the original baseline risk of 0.2, the estimated probability under treatment is the same with either interpretation. There is no difference in these interpretations whether we use the RR or cRR (aka survival ratio).
The simple solution proposed by Huitfeldt[2] (and attributed to Sheps) of using the survival ratio was also thought intuitive by us[7] at some point before we realized that this does not work because neither the RR nor cRR are independent of baseline risk[1]. This is exactly what Sheps said in her 1959 paper[8] - “Unfortunately the value of the [RR] has no predictable relation to the value of [cRR] …… and depends greatly on the magnitude of [baseline risk]”. Sheps’ approach therefore does nothing to resolve any of the theoretical problems with the RR or its complement. While Huitfeldt goes on to talk about 1−RR and 1−cRR in the light of Sheps’ paper, I will ignore this, because what is clear is that if the RR and cRR are dependent on baseline risk, then these interpretations do not hold when the RR or cRR are interpreted outside of the populations used to derive these measures. This follows logically from the fact that these measures, contrary to Huitfeldt, must be different in different groups defined by baseline risk irrespective of any assumptions about biological mechanisms.
Next Huitfeldt suggests that unmeasured individual-level attributes that determine whether a person responds to treatment may be called “switches”. He suggests that one or more switches need to be present in a person for a treatment to have an effect and they combine to determine whether and how treatment affects the outcome. However none of these attributes are subsequently mentioned and instead he outlines four logically possible ways that such switch patterns can operate based on the attributes of the treatment rather than any individual-level patient attributes. It is unclear therefore a) why he concludes that cRR will be stable when treatment is a sufficient cause of the outcome, b) what he means by switch pattern or c) why it should be independent of baseline risk. The same applies to the rest of Table 1. My impression of what is being done here is that Huitfeldt would like us to ignore the mathematical properties of the ratio and instead believe that some esoteric biological mechanism must be considered to be at play that serves to make the ratio independent from baseline risk. The implication therefore is that non-independence (from baseline risk) must not be faulted on the ratio’s mathematical properties but on the user who does not understand how biology works. This, of course, is all contrary to what Sheps proposed[8].
I will conclude by saying that the RR and cRR are best interpreted as likelihood ratios and therefore need to be combined for their use as effect measures. The ratio RR/cRR = odds ratio and the latter itself is a likelihood ratio of a different type that connects risk under no-treatment to risk under treatment[6].

Reference List

1. Doi SA, Furuya-Kanamori L, Xu C, Lin L, Chivese T, Thalib L. Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 1: A call for change to practice. J Clin Epidemiol 2022;142:271-79.
2. Huitfeldt A. Mindel C. Sheps: Counted, Dead or Alive. Epidemiology 2023.
3. Doi SA, Furuya-Kanamori L, Xu C, Chivese T, Lin L, Musa OAH, et al. The Odds Ratio is "portable" across baseline risk but not the Relative Risk: Time to do away with the log link in binomial regression. J Clin Epidemiol 2022;142:288-293.
4. Xiao M, Chu H, Cole SR, Chen Y, MacLehose RF, Richardson DB, et al. Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 4: Odds Ratios are far from "portable" - A call to use realistic models for effect variation in meta-analysis. J Clin Epidemiol 2022;142:294-304.
5. Xiao M, Chen Y, Cole SR, MacLehose RF, Richardson DB, Chu H. Controversy and Debate: Questionable utility of the relative risk in clinical research: Paper 2: Is the Odds Ratio "portable" in meta-analysis? Time to consider bivariate generalized linear mixed model. J Clin Epidemiol 2022;142:280-287.
6. Doi SAR, Kostoulas P, Glasziou P. Likelihood ratio interpretation of the relative risk. BMJ Evid Based Med 2022.
7. Furuya-Kanamori L, Doi SA. The outcome with higher baseline risk should be selected for relative risk in clinical studies: a proposal for change to practice. J Clin Epidemiol 2014;67:364-7.
8. Sheps MC. An examination of some methods of comparing several rates or proportions. Biometrics 1959;15(1):87-97.

AndersHuitfeldt · December 27, 2022, 6:56pm

It is true that when decision making in Medicine proceeds (e.g. drug A to prevent outcome Y), clinicians make use of research results that are reported in terms of estimated probabilities. Thus, Pr [Y(1) = 1] is the risk expected under the drug treatment while Pr [Y(0) = 1] is the estimated baseline risk under the control treatment. Thus for the particular baseline risk in the study (Pr [Y(0) = 1]) there is a relative risk (RR) given by RR = (Pr [Y(1) = 1])/( Pr [Y(0) = 1]). The problem with the common practice advocated to clinicians to combine the patient-specific baseline risk with the RR to estimate a patient’s risk under treatment is that the RR varies with every baseline risk and thus the estimated risk under treatment assuming a constant RR is not really that useful[1]. As stated by Huitfeldt[2], statisticians tend to appreciate this more because RR models may lead to predictions outside the range of valid probabilities or different predictions depending on if the RR or its complement (cRR) are used (e.g. if RR = RRdead then cRR= RRalive). However the latter are not the main limitations of the use of the RR in clinical practice, but rather the former is the critical issue and these have led to heated discussions on Frank Harrell’s blog with Huitfeldt.

I interpret this paragraph to state that you consider baseline risk invariance to be the crucial consideration for choice of effect measure. I want to ask whether you agree or disagree with any of these statements:

Regardless of which scale is used to measure the effect, it is always possible for the effect of treatment to vary between groups that have different baseline risk. This is true for OR, RR, cRR, RD, etc.
Therefore, “invariance to baseline risk” is not a mathematical property of any effect measure. Trying to prove, using mathematical logic alone, that an effect measure is invariant to baseline risk will always be futile. Given that mathematically guaranteed invariance to baseline risk is impossible, it is not a usable criterion when choosing between effect measures.
It is true that for some effect measures, invariance to baseline risk is impossible. For example, if RR among women is 2, and the baseline risk among men is 0.6, then it is not possible for the RR to take the same value in men and women. The phenomenon where invariance to baseline risk is sometimes impossible, corresponds exactly to what we call “non-closure”.
The odds ratio and the switch relative risk are both closed.
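A small Python sketch of closure, assuming the usual definitions of RR and OR and, for the switch relative risk, a definition assumed here to follow Huitfeldt and colleagues' published work: a signed proportional change that acts on the complement when treatment increases risk and on the risk itself when treatment decreases it. The helper names are hypothetical, and the grid check is an approximation rather than a proof. It tests whether applying one fixed effect to every baseline risk on a grid always yields a valid probability.

```python
"""Closure check: does a fixed effect give a valid risk at every baseline risk?"""


def rr_apply(rr, mu0):
    return rr * mu0


def or_apply(odds_ratio, mu0):
    o = odds_ratio * mu0 / (1 - mu0)
    return o / (1 + o)


def switch_rr_apply(srr, mu0):
    # Assumed definition, srr in [-1, 1]:
    #   srr >= 0 (harmful):    srr = (mu1 - mu0) / (1 - mu0), a change in the complement
    #   srr <  0 (protective): srr = (mu1 - mu0) / mu0, a change in the risk itself
    if srr >= 0:
        return mu0 + srr * (1 - mu0)
    return mu0 * (1 + srr)


BASELINES = [i / 100 for i in range(1, 100)]  # baseline risks 0.01, ..., 0.99


def closed_on_grid(apply, effect):
    """True if applying `effect` to every baseline on the grid yields a risk in [0, 1]."""
    return all(0.0 <= apply(effect, mu0) <= 1.0 for mu0 in BASELINES)


if __name__ == "__main__":
    # RR = 2 among women cannot also hold among men whose baseline risk is 0.6.
    print("RR = 2 at baseline 0.6 gives", rr_apply(2.0, 0.6))
    # Nothing forbids an equal RR across two lower, different baselines.
    print("RR = 2 at 0.1 and 0.2 gives", rr_apply(2.0, 0.1), rr_apply(2.0, 0.2))
    print("RR = 2 closed:", closed_on_grid(rr_apply, 2.0))
    print("OR = 8/3 closed:", closed_on_grid(or_apply, 8 / 3))
    print("switch RR = 0.25 closed:", closed_on_grid(switch_rr_apply, 0.25))
    print("switch RR = -0.5 closed:", closed_on_grid(switch_rr_apply, -0.5))
```

The RR of 2 fails the check because it pushes every baseline above 0.5 past 1, while the odds ratio and the switch relative risk (under the assumed definition) stay inside [0, 1] at every grid point.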

Given that the RR is a ratio of two probabilities and so too is the likelihood ratio (used in diagnostic studies), we have shown that the RR can also be interpreted as a likelihood ratio (LR)[6].

This is uncontested, but I am genuinely puzzled about why you keep bringing up this trivial result, which is not relevant to the discussion.

This interpretation holds when we consider a binary outcome to be a test of the treatment status

Why do you think it is useful to consider a binary outcome as a test of treatment status? Do you think it is clinically useful to determine whether a patient has a disease, for the purpose of inferring probabilistically whether he is likely to have been treated? Or is there some other rationale behind this?

The simple solution proposed by Huitfeldt[2] (and attributed to Sheps) of using the survival ratio was also thought intuitive by us[7] at some point before we realized that this does not work because neither the RR nor cRR are independent of baseline risk[1].

If you require that an effect measure can only be useful to the extent that there is a mathematical guarantee of invariance to baseline risk, then no effect measure will meet your requirements, not even your cherished OR. If this is the case, I would sincerely suggest you either go fully non-parametric or alternatively conclude that statistical inference is an impossibility.

There is no point in continuing this conversation if you keep on requiring that the switch relative risk satisfies an impossible requirement, one that is not even satisfied by your preferred effect measure.

This is exactly what Sheps said in her 1959 paper[8] - “Unfortunately the value of the [RR] has no predictable relation to the value of [cRR] …… and depends greatly on the magnitude of [baseline risk]”. Sheps’ approach therefore does nothing to resolve any of the theoretical problems with the RR or its complement.

First of all, if you want to have a good faith discussion about this, please stop with the selective editing immediately. What this part of Sheps’s paper establishes is that there is no one-to-one relationship between RR and cRR, and further, that if you have a constant RR, then cRR is greatly dependent on baseline risk. Similarly, if you have a constant cRR, then RR depends on baseline risk. This is a relational statement about what we can infer about one effect measure if another effect measure is constant.

In fact, for any two non-equivalent effect measures, it will be true that if a (non-null) effect is stable on one scale, then the effect on the other scale will depend on the baseline risk: I can write cRR as a function of RR, and it will have a term for baseline risk. But similarly, I can write OR as a function of RR, and it will also have a term for baseline risk. This relative baseline dependence is symmetric, and gives no reason to prefer one effect measure over the other. This line of reasoning is only interesting if you have some reason to assign priority to one of the scales, based on background knowledge that the effect is stable on that scale.
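The symmetry can be checked numerically. With baseline risk p0, a fixed RR implies cRR = (1 - RR·p0)/(1 - p0) and OR = RR·(1 - p0)/(1 - RR·p0), while a fixed OR implies RR = OR/(1 - p0 + OR·p0). The Python sketch below holds either the RR or the OR at a hypothetical value of 1.5 across hypothetical baseline-risk strata and reports what the other measures do.

```python
def implied_measures(p0, p1):
    """Return (RR, cRR, OR) for risks p0 (untreated) and p1 (treated)."""
    risk_ratio = p1 / p0
    survival_ratio = (1 - p1) / (1 - p0)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return risk_ratio, survival_ratio, odds_ratio


def measures_with_fixed_rr(rr, baselines):
    # Effect stable on the RR scale: p1 = RR * p0 in every stratum.
    return [implied_measures(p0, rr * p0) for p0 in baselines]


def measures_with_fixed_or(odds_ratio, baselines):
    # Effect stable on the OR scale: scale the baseline odds in every stratum.
    results = []
    for p0 in baselines:
        odds1 = odds_ratio * p0 / (1 - p0)
        results.append(implied_measures(p0, odds1 / (1 + odds1)))
    return results


BASELINES = [0.05, 0.1, 0.2, 0.4]  # hypothetical strata

if __name__ == "__main__":
    for p0, (rr, crr, orr) in zip(BASELINES, measures_with_fixed_rr(1.5, BASELINES)):
        print(f"RR fixed: p0={p0:.2f}  RR={rr:.3f}  cRR={crr:.3f}  OR={orr:.3f}")
    for p0, (rr, crr, orr) in zip(BASELINES, measures_with_fixed_or(1.5, BASELINES)):
        print(f"OR fixed: p0={p0:.2f}  RR={rr:.3f}  cRR={crr:.3f}  OR={orr:.3f}")
```

With the RR fixed at 1.5, the OR drifts from about 1.54 at p0 = 0.05 to 2.25 at p0 = 0.4; with the OR fixed at 1.5, the RR drifts from about 1.46 to 1.25. Neither direction of dependence is privileged by the arithmetic alone.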

This follows logically from the fact that these measures, contrary to Huitfeldt, must be different in different groups defined by baseline risk irrespective of any assumptions about biological mechanisms.

Consider a high-quality, large randomized trial which finds that the relative risk (for the effectiveness outcome) is equal between men and women, who have different baseline risks. Are you saying this is theoretically impossible? If I find a study in NEJM or Lancet where there is no such difference, does it disprove your theoretical argument? If not, what does your theoretical argument even mean? Does it rule out any possible future observations?

My impression of what is being done here is that Huitfeldt would like us to ignore the mathematical properties of the ratio and instead believe that some esoteric biological mechanism must be considered to be at play that serves to make the ratio independent from baseline risk. The implication therefore is that non-independence (from baseline risk) must not be faulted on the ratios mathematical properties but on the user who does not understand how biology works. This, of course is all contrary to what Sheps proposed[8].

I am not asking you to ignore any “mathematical properties”. I am telling you that what you are asking for (“mathematically guaranteed baseline risk invariance”) is not a criterion that can be met by any effect measure, and is therefore a red herring. Moreover, I am telling you that given the best insights we have from toxicology about how to model mechanism of action, and given the best insights we have from psychology and philosophy about how to generalize causal effects, the most rational choice is often to start from a simplified biological (rather than mathematical) model that implies stability of the switch relative risk, and then think about all the possible reasons that this biological model could go wrong:

Is switch prevalence correlated with baseline risk?
Does switch prevalence differ between segments of the population?
Does the drug have non-monotonic effects?

Depending on our views about the plausibility of these threats to validity, we can then make informed choices about whether there is any point at all in going forward with the analysis, and if so, whether there is a need for interaction terms or subgroup analysis, whether we can get point identification or have to settle for partial identification/bounds, whether we need a sensitivity analysis, etc.
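A minimal simulation can illustrate both the simplified model and the first threat listed above. The specification below is an assumption chosen for illustration, not a canonical definition: the drug is harmful, each person independently carries a "switch" that makes the drug cause the outcome, and the drug never prevents the outcome. Under that model the survival ratio should sit near 1 minus the switch prevalence in every stratum, whatever the baseline risk; if switch prevalence tracks baseline risk, that stability disappears. All names and parameter values are hypothetical.

```python
import random


def simulate_stratum(p0, switch_prob, n, rng):
    """Simulate potential outcomes under a simplified 'switch' model of a harmful drug.

    Illustrative assumptions:
      * Y0 ~ Bernoulli(p0) is the outcome without treatment.
      * Each person independently carries a switch with probability
        switch_prob, independent of Y0; in carriers the drug causes the outcome.
      * The drug never prevents the outcome (monotonic effect), so Y1 = Y0 or S.
    Returns (risk untreated, risk treated, RR, survival ratio cRR).
    """
    events0 = events1 = 0
    for _ in range(n):
        y0 = rng.random() < p0
        switch = rng.random() < switch_prob
        events0 += y0
        events1 += y0 or switch
    r0, r1 = events0 / n, events1 / n
    return r0, r1, r1 / r0, (1 - r1) / (1 - r0)


if __name__ == "__main__":
    rng = random.Random(2022)
    # Switch prevalence constant across strata: cRR stays near 1 - 0.2 = 0.8,
    # while the RR changes with baseline risk.
    for p0 in (0.1, 0.3, 0.5):
        _, _, rr, crr = simulate_stratum(p0, 0.2, 50_000, rng)
        print(f"constant switch:   p0={p0:.1f}  RR={rr:.2f}  cRR={crr:.3f}")
    # First threat: switch prevalence correlated with baseline risk.
    for p0 in (0.1, 0.3, 0.5):
        _, _, rr, crr = simulate_stratum(p0, p0, 50_000, rng)
        print(f"correlated switch: p0={p0:.1f}  RR={rr:.2f}  cRR={crr:.3f}")
```

In the correlated case the expected survival ratio is 1 - p0 in each stratum, so it falls from about 0.9 to about 0.5 across the strata shown.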

I will conclude by saying that the RR and cRR are best interpreted as likelihood ratios and therefore need to be combined for their use as effect measures. The ratio RR/cRR = odds ratio and the latter itself is a likelihood ratio of a different type that connects risk under no-treatment to risk under treatment[6].

This is a complete non sequitur. You have still made absolutely no attempt to explain why “interpretation” matters when the actual math is invariant to interpretation, nor why the interpretation as a likelihood ratio prohibits its use as an effect measure, or why the odds ratio is required in order to “connect” risk under treatment to risk under no treatment.
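The quoted identity is pure algebra, RR/cRR = (p1/p0)·((1 - p0)/(1 - p1)) = OR, and holds for any pair of risks strictly between 0 and 1 regardless of what interpretation is attached to the ratios. A minimal Python check with exact fractions (function names and risk pairs are illustrative):

```python
from fractions import Fraction


def risk_ratio(p0, p1):
    return p1 / p0


def survival_ratio(p0, p1):
    return (1 - p1) / (1 - p0)


def odds_ratio(p0, p1):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


if __name__ == "__main__":
    # Exact rational arithmetic, so equality is checked without rounding error.
    pairs = [(Fraction(1, 10), Fraction(1, 5)),
             (Fraction(3, 5), Fraction(7, 10)),
             (Fraction(1, 2), Fraction(1, 4))]
    for p0, p1 in pairs:
        ratio = risk_ratio(p0, p1) / survival_ratio(p0, p1)
        print(p0, p1, ratio, ratio == odds_ratio(p0, p1))
```

Nothing in the calculation depends on reading any of the three quantities as a likelihood ratio.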