Should one derive risk difference from the odds ratio?

I will now come to the final point I would like to raise from the rebuttal (there are others, but I don’t think they are as important). This was a critique of our statement that “non-collapsibility is only an issue when one gets the outcome model wrong by refusing to account for easily accounted for outcome heterogeneity”. Our arguments were deemed incorrect.
Let’s take the simple example from Miettinen & Cook:

Z = 0

| | Y = 1 | Y = 0 |
| --- | --- | --- |
| X = 1 | 5 | 95 |
| X = 0 | 1 | 99 |

Z = 1

| | Y = 1 | Y = 0 |
| --- | --- | --- |
| X = 1 | 99 | 1 |
| X = 0 | 95 | 5 |

Here there is no confounding because X and Z are unassociated, yet OR(XY|Z) = 5.2 in each stratum while OR(XY) = 1.2. There is no reason why the stratum-specific ORs (5.2) should be expected to collapse onto the marginal OR when Z is a far stronger influence on Y than X is. For a true measure of effect we therefore expect the marginal effect to deviate from the conditional one, depending on the distribution of Z across treatment groups. When an effect measure is collapsible in the face of such data (e.g. the RR), what other conclusion can be drawn?
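A minimal numeric check of those figures, as a sketch in plain Python/NumPy using the counts from the tables above:

```python
import numpy as np

# One 2x2 table per stratum of Z: rows are X = 1 / X = 0, columns are Y = 1 / Y = 0.
strata = {
    "Z=0": np.array([[5, 95], [1, 99]]),
    "Z=1": np.array([[99, 1], [95, 5]]),
}

def measures(t):
    r1 = t[0, 0] / t[0].sum()   # risk of Y = 1 given X = 1
    r0 = t[1, 0] / t[1].sum()   # risk of Y = 1 given X = 0
    odds = lambda r: r / (1 - r)
    return {"RD": r1 - r0, "RR": r1 / r0, "OR": odds(r1) / odds(r0)}

for name, table in strata.items():
    print(name, measures(table))
print("marginal", measures(sum(strata.values())))
```

Both stratum-specific ORs come out near 5.2 while the marginal OR is about 1.17, outside the stratum range; the RD is 0.04 in each stratum and 0.04 marginally, and the marginal RR (about 1.08) lies between the stratum RRs (about 1.04 and 5).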

I’m not sure what point you are trying to make here; again, it seems you are missing our points about why noncollapsibility is a severe objection to ORs as effect measures when the outcome is common in some groups (objections which go back 40 years). So, to summarize once more:

  1. marginal causal ORs (mcORs) need not be any kind of average of the covariate-specific causal ORs (ccORs), and with common outcomes will often fall entirely outside the range of the ccORs. This extreme noncollapsibility problem was first explained mathematically by the late Myra Samuels (Matching and design efficiency in epidemiological studies. Biometrika 1981;68:577–588), who recognized it as a consequence of Jensen’s inequality (see p. 580 ibid.).
  2. Claims that this favors the ccORs based on generalizability or portability miss the fact that there are always unmeasured strong outcome-predictive covariates and these will vary across studies and settings (not only in distribution but also in which ones remain uncontrolled), making the ccORs vary across settings as well - often more so than the mcORs.
  3. Maximum-likelihood estimates of ccORs are also more subject to small-sample and sparse-data bias than are mcORs (e.g., see Decreased susceptibility of marginal odds ratios to finite-s... : Epidemiology), both of which are more common problems than many realize.
  4. Marginal causal RRs and RDs do not suffer from this noncollapsibility problem, which means that (despite the inevitable presence of unmeasured outcome-predictive covariates, and unlike ORs) they will represent averages of the stratum-specific causal RRs and RDs. Additionally, their estimates are less impacted by small-sample/sparse-data bias.
  5. Causal hazard and person-time incidence-rate ratios are also noncollapsible in the same unfortunate qualitative way ORs are (as in 1 above), but quantitatively to a much smaller degree. The reason rate ratios are less impacted than ORs may become clear by noting that the ratio of treated vs. control person-time will usually fall between the ratio of number of patients starting and the ratio of number of patients surviving the trial.

Because of ongoing misunderstandings of noncollapsibility (as evidenced by your articles) the JCE invited me to write a two-part primer on the topic, which is now in press. It does not cite our exchanges as it is intended to lay out the facts of odds and OR noncollapsibility in simple numeric form. The intention is to enable those who read disputes to better comprehend the purely mathematical parts, about which there should be no dispute. Anyone interested in advance viewing can write me for copies.

The conflict is all about meaning and relevance for practice. As I have tried to explain in this thread, all measures have their particular problems and none should be promoted as universally superior; choices (if needed) should depend on context and goals. In particular, no measure should be trusted or promoted as “portable” or generalizable, and (as I have argued since the 1980s) every effort should be made to analyze and account for variation in risks and measures across studies and settings. If you can agree to at least that much perhaps we can achieve some sort of closure.

2 Likes

It is indeed all about meaning and relevance for practice, and that has been the sole focus of our papers – if you can bring better meaning and relevance to the table, I am all for it, but so far I remain unconvinced. After 40 years of debate on this topic, if there is yet another debate it seems to me that something radically different is called for. Blaming clinicians for having cognitive failures or abusing methods will not help change the status quo, and neither will it help advance better decision making through better understanding of the subject. Granted, we have spent the better part of our lives on medical rounds, in the clinic or at the bedside, but I should remind methodologists that the main job of a medical practitioner is inferential decision making, which requires interpretation of the literature that these methods are used to create.
I think JCE has taken the right step in getting these primers from you – perhaps they can shed some more light, and I will email you to request them.

4 Likes
  1. My take: You remain unconvinced because you are now committed in print to an untenable position. History indicates that’s an unrecoverable error and cognitive block for most anyone (at least if others point out the error); your responses indicate you are no exception. Case in point: In an earlier response to you, I did call for something radically different from promoting odds ratios or any single measure, and (as with most of what I’ve written) you glossed over it: Summarize study information with flexible estimators of the survival distribution as a function of both treatment and covariates, without forcing strong parametric constraints, to better reflect the actual causal processes leading to the outcome and the uncertainties left by the study. Which leads to this point…
  2. I have endorsed at length the radical proposal to rebuild statistical theory around causal and information concepts rather than probability theory, especially because causality is closer to both the goals and intuitions of researchers and clinicians, particularly for decisions; after all, decisions are made to have effects on patient outcomes. See for example the talks linked below, and this forthcoming book chapter: [2011.02677] The causal foundations of applied probability and statistics
    This position is nothing new, having been emergent in health & medical research methodology for going on 50 years (recognition of the defects in the odds ratio was an early example). In this century it has gained ground and received endorsement and elaboration from leading cognitive and AI researchers like Judea Pearl.
  3. I don’t and never have blamed “clinicians for having cognitive failures or abusing methods” as I believe that’s blaming the victims of poor teaching, advising, and reviewing (much like blaming patients for poor prescribing). Instead I blame all of us in authoritative positions for not facing up to the need to study, teach and address the unavoidable cognitive failures we all suffer as humans, and which warp our methodology as well as our research reporting and synthesis. I especially condemn the failures of those who provide ill-considered (especially overly mathematical) statistical teaching and advice based on delusions of understanding theory, methods and practice in a particular specialty well enough to give sound recommendations.
    As with many other problems in statistics, these problems parallel well-documented problems in medical practice (those of physicians advising and practicing beyond their knowledge base, training specialization, and experience). Watch my 2-part talk on the problems here:
    part 1 (U Bristol): IEU Seminar: Sander Greenland, 24 May 2021: Advancing Statistics Reform, part I - YouTube
    part 2 (U Geneva): PartII_SGreenland_27052021 - Prof. Sander Greenland on Statistics Reform
2 Likes

Sander, your post is informative although a bit harsh. But I want to comment on just one small piece:

Models that have odds ratios inside them (when Y is binary or ordinal) tend to be excellent bases for just what you are advocating, because odds ratios, in the many datasets I’ve analyzed, tend to be more constant than other parameterizations. Thus fewer nonlinear and especially fewer interaction terms are needed in the model. So the log odds scale is a convenient basis for doing exactly the kind of flexible modeling needed.

6 Likes

You know Frank I agree completely with your response and have said the same thing to colleagues who have misguidedly promoted use of log-linear risk or (worse) linear risk models. In fact I’ve been advocating our shared view on that since the 1970s (although, as I cited earlier, I have encountered exceptions in pair-matched cohorts in which log-linear risk models outperformed logistic models for a common outcome).

So perhaps you can imagine why after over 40 years I get exasperated when someone comes along and ignores all nuances and most literature on the topic to make vastly oversimplified and harmful claims, e.g., that we should be using ORs to summarize studies, ignoring all OR problems with a completely botched understanding of heterogeneity and noncollapsibility problems. This kind of toy one-method-fits-all approach to stats is the bane of science. It is as if someone doing a shift on a medical helpline advised every caller complaining of a bad headache to take aspirin (which would help some and kill others).

Allow me to explain the origin of my thinking: In the mid-1970s my dissertation data involved a very common outcome (c-section, >50% in some groups but with huge variation) and many discrete clinical covariates in a large and very complete cohort database, which I analyzed using the then-new ML loglinear-count model software based on the then-state-of-the-art methods in Bishop, Fienberg & Holland (1975) (as you know, loglinear count models include logistic models as a special case, and log-odds/log-ORs are their natural parameters). The overall summary OR for the treatment of central interest, electronic fetal monitoring, was about 1 (as null as can be).

But that concealed completely the most striking data feature: The titanic (and stable) OR variation across covariates, as expected from clinical input that some effects could reverse direction across other covariates, even when the “background risk” wasn’t varying much. Obstetricians could expect this because they were the mechanism of variation and could explain how they chose the outcome (c-section or vaginal delivery) based on the covariate values. Furthermore, one could see the extreme variation in effects across studies, as expected from the varying policy, practice and case mix across hospitals. [What a contrast to cancer epidemiology, where you can’t ask the body why it chose to develop a tumor or not based on its exposures.]

So the lesson I learned from the start is that the model better be as rich as sustainable to allow for OR variation: Just because ORs vary less than RRs or RDs is no excuse for pretending they are anything like constant or transportable across studies or settings. And I learned that using p > 0.05 from a homogeneity test to ignore heterogeneity was inviting disaster, since the power of those tests for important variation is pathetic (in the same data one could also see that using p > 0.05 to ignore a confounder was biasing, especially since by definition of a confounder the target parameter is an adjusted one). From that experience I further learned the importance of reading the contextual literature, talking to the clinicians, and using that information to properly graph out the causal orderings of the variables under analysis.
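As an aside on that power problem, here is a small simulation sketch (Python with statsmodels; the baseline risk, the stratum-specific ORs of roughly 1.5 and 3, and the cell sizes are made-up illustrative numbers, not values from the data described above):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)

def homogeneity_test_power(n_per_cell=100, n_sims=1000, alpha=0.05):
    """Estimate power of the 1-df likelihood-ratio test for an exposure-by-covariate
    interaction when the true exposure ORs differ two-fold across strata."""
    rejections = 0
    for _ in range(n_sims):
        x = np.tile([0, 1, 0, 1], n_per_cell)            # exposure
        z = np.tile([0, 0, 1, 1], n_per_cell)            # stratifying covariate
        logodds = (np.log(0.25) + np.log(1.5) * x + np.log(2.0) * z
                   + np.log(3.0 / 1.5) * x * z)          # true OR 1.5 if z=0, 3.0 if z=1
        y = rng.binomial(1, 1 / (1 + np.exp(-logodds)))
        df = pd.DataFrame({"y": y, "x": x, "z": z})
        full = smf.glm("y ~ x * z", data=df, family=sm.families.Binomial()).fit()
        reduced = smf.glm("y ~ x + z", data=df, family=sm.families.Binomial()).fit()
        lr = reduced.deviance - full.deviance            # 1-df LR statistic
        rejections += stats.chi2.sf(lr, df=1) < alpha
    return rejections / n_sims
```

With 400 subjects and a two-fold ratio of stratum ORs, the estimated power in this configuration comes out around a third, far below conventional targets, which is the point about relying on p > 0.05 from such tests.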

I summarized those lessons in early articles such as
Greenland S. Limitations of the logistic analysis of epidemiologic data. Am J Epidemiol 1979;110:693-698
Greenland S, Neutra RR. Control of confounding in the assessment of medical technology. Int J Epidemiol 1980;9:361–367
Greenland S. Tests for interaction in epidemiologic studies: a review and a study of power. Stat Med 1983;2:243–251
Those are dated, but still they promoted basic ideas of statistical modeling as smoothing and causal modeling as a crucial input for statistical model selection.

Of course, in light of much data experience afterward along with extensive refinements of causal modeling in the 1980s (notably by Robins, Pearl, Rosenbaum, etc.), my conceptual thinking evolved quite a bit. And subsequent computing advances allowed practical implementation of hierarchical/multilevel models (empirical-Bayes, semi-Bayes, penalized regression) to expand models while maintaining estimation stability, as I reviewed in later articles including
Greenland S. Multilevel modeling and model averaging. Scand J Work Environ Health 1999;25 (suppl 4):43–48
Greenland S. When should epidemiologic regressions use random coefficients? Biometrics 2000;56:915–921
Greenland S. Principles of multilevel modelling. Int J Epidemiol 2000;29:158–167
Greenland S. Smoothing observational data: a philosophy and implementation for the health sciences. Int Stat Rev 2006;74:31–46

Back in the 20th century computing was costly (every analysis for my dissertation required tedious card punching and overnight mainframe runs using data on tape), while journals were expensive, laboriously typeset physical items that had to impose strict word limits. These economic problems led to a focus on simple summarization and often to a distortive compression of results. Computing advances since then make it easy for reports to go beyond oversimplified data summaries (like ORs from exponentiated model coefficients), and computer typesetting has slashed article production costs; yet narrow presentation limits are still imposed. Nonetheless, online supplements allow thorough presentation not only of the data but also of causal narratives about its generation and how those were accounted for in the statistical models. I suggest that will usually be a better approach for advising practice than compressing variation into one number with some interval around it that captures nothing but some hypothetical “random error”.

5 Likes

I love the history Sander. Bishop, Fienberg & Holland really takes me back, and reminds me also of SAS PROC CATMOD days. Another paper of yours that is relevant, and is one of my all-time favorite papers, is your 2000 paper that you mentioned, where you hit one out of the park in debunking the way most nutritional epidemiologic analyses are done – with residual confounding and multiplicity problems that arise mainly from choosing an oversimplified model and using a multi-step modeling procedure.

3 Likes

This looks like a discussion going back and forth over the same issues, and perhaps suggests we have a problem with meaning here. To be clear, by portability or transportability of an effect measure we do not mean the situation where the study sample is not a subset of the target population and the effect measure somehow still applies to the target population. We understand that the goal of causal inference is to gain understanding of a particular target population based on study findings, and perhaps to extend causal inferences beyond a study sample. To be clear, we have never spoken in our papers about transportability in terms of causal inference or what may happen to the effect size had the study been conducted in another, external population. That this is not what we are discussing nor what we try to demonstrate is clear to anyone who reads the paper.

We are talking solely about the mathematical portability (aka homogeneity) of an effect parameter under changes in r0, and that from an unambiguous and clear mathematical perspective. It has nothing to do with generalization or transportability of causal effects, which perhaps is what you have been discussing here. What we say and show is what happens when one fails to prioritize the effect measure with mathematical portability; while this has implications for what you discuss above, the observation is consistent with what you have said before:
The assumption of constancy (homogeneity) of an effect parameter is statistically convenient but biologically stringent, and it is good practice to critically examine the assumption before applying a technique based on it. There are no purely logical or general biologic reasons for believing such an assumption, but in certain situations there are purely logical reasons for disbelieving constancy of the difference or ratio of proportions. These reasons arise from the inherent range limitations of these measures. In 2 × 2 table notation, the difference cannot exceed A/N1 or fall below −B/N0, and the ratio cannot exceed N0/B. For example, if the incidence proportion among the unexposed was known to range as high as 0.5 in some strata, the incidence-proportion difference could not exceed 0.5 and the incidence-proportion ratio could not exceed 2.0 in those strata (since incidence proportions cannot exceed 1.0). If the incidence-proportion difference observed in other strata clearly exceeds 0.5, one would have to rule out constancy of the difference; similarly, if the incidence-proportion ratio observed in other strata clearly exceeds 2.0, one would have to rule out constancy of the proportion ratio. The odds ratio suffers from no such a priori range limitations, and so for common diseases the constant incidence-odds-ratio assumption is logically less vulnerable to objection than are the other constancy assumptions. If, however, the disease is rare, the limits on the size of the incidence-proportion difference and ratio will be so high as to cause no problems.
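To make the quoted range limitations concrete in symbols (a sketch using the same 2 × 2 notation, writing r1 = A/N1 and r0 = B/N0 for the exposed and unexposed incidence proportions):

$$
-\,r_0 \;\le\; \mathrm{RD} = r_1 - r_0 \;\le\; 1 - r_0,
\qquad
\mathrm{RR} = \frac{r_1}{r_0} \;\le\; \frac{1}{r_0} = \frac{N_0}{B},
$$

so if r0 can be as high as 0.5 in some strata, then RD ≤ 0.5 and RR ≤ 2 there, whereas the odds ratio $\dfrac{r_1/(1-r_1)}{r_0/(1-r_0)}$ can take any value in $(0,\infty)$ no matter how large r0 is.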

Frank, I agree.

Sander, this is a debate and controversy series in JCE, and nothing in such a series is unrecoverable – it just needs a more applied and less theoretical approach to allow readers to judge what they take from it. No one is trying to build their careers off such papers – if that were the case we would stick to conventional research and avoid controversy and debate. For the record, I asked the corresponding author of the group rebutting us to write the rebuttals, and he agreed (and for this reason the first rebuttal was already on medRxiv before our paper came out). Finally, the position taken, at least for some of our group, is one that has existed for more than a decade and was not formulated because of the paper, but rather because it was felt that something radically different was needed to change the status quo. Ultimately, to change or not is in the hands of the end-user of research, and these papers only serve to facilitate such change, if they are written in a style that appeals to them.

Great, thanks Frank! BFH’s Discrete Multivariate Analysis was one of the main textbooks from my student days that I continued to use for decades after (along with Cox & Hinkley’s Theoretical Statistics and Leamer’s Specification Searches), and I got to work under each of the authors:

The monitoring project arose from a single Harvard teaching hospital with obstetrics headed by one tough researcher, Emanuel Friedman, who made sure the records were uniform and complete (the residents were reputedly quite obedient to him). Raymond Neutra was the epidemiologist and the late Steve Fienberg was the senior statistician. I was a predoc junior statistician on a gynecologic study using the new BMDP loglinear package when I joined the project and caught a mistake in Steve’s initial analysis of the mortality component of the study, that of dropping the strongest confounder (age) because it had p = 0.07 for its mortality coefficient (as shown in Fig. 2 of Greenland & Neutra 1980). He accepted the correction, which says a lot about him (especially for the time, when stepwise regression was still king), and just in time to fix the 1978 NEJM article on that component. It helped that Steve had put causal diagrams in his 1977 book, so he could see my point.

After that I finished up my dissertation and went to Harvard biostat where my superior heading the stat consulting group was the late Yvonne Bishop (briefly - as an LA native I loathed Boston weather and resigned after one term to return to UCLA forever after). Later still Paul Holland and I collaborated on a note about (get this) converting odds ratios to risk differences:
Greenland S, Holland PW. Estimating standardized risk differences from odds ratios. Biometrics 1991;47:319–322
which came about from my pointing out the bias in the conversion formula he had published a few years before (a bias analogous to the bias in the OR to RR conversion formula of Zhang & Yu JAMA 1998). As with Steve, he graciously accepted the correction and we went on to collaborate on the repair.
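Since the thread title asks about exactly this kind of conversion, here is the elementary identity behind it, as a hedged aside (this is only the algebra relating an OR to risks when the OR and the baseline risk $r_0$ refer to the same stratum; it is not the standardization method of Greenland & Holland 1991):

$$
r_1 = \frac{\mathrm{OR}\, r_0}{1 - r_0 + \mathrm{OR}\, r_0},
\qquad
\mathrm{RD} = r_1 - r_0,
\qquad
\mathrm{RR} = \frac{r_1}{r_0} = \frac{\mathrm{OR}}{1 - r_0 + \mathrm{OR}\, r_0}.
$$

The usual account of the bias mentioned above is that it arises when a covariate-adjusted (conditional) OR is plugged into such a formula together with a marginal baseline risk: by noncollapsibility the two quantities do not refer to the same stratum, so the resulting RR or RD is distorted.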

2 Likes

You know that, like Frank, I only came on as an author at the rebuttal stage, where your rebuttal is titled
The OR is “portable” but not the RR: Time to do away with the log link in binomial regression.
If only you had not repeated the claim that “the OR is ‘portable’ but not the RR” there would have been much less reason for concern. But you chose to use the trigger word “portability”, which to most everyone means transportability.

To be sure, statisticians have a long and sorry record of misusing ordinary words as deceptive jargon for things that aren’t what the words imply (like “significance”, “confidence”, “severity”), as if they were shady appliance salesmen. So if you say transportability is not what you meant then own up to the mistake of using “portability” as you did. But from my reading of what you wrote in both articles, transportability is indeed what you meant, and I suspect others will see it that way too - so your claim needs to be rebuked in no uncertain terms.

But that’s not the only problem. As I have pointed out twice now in this thread, your discussion of noncollapsibility misses a central criticism of ORs detailed in 40 years of articles. And while I agree that log-risk regression is usually not a good idea, as I cited earlier there are exceptions.

2 Likes

We were just following the title of the rebuttal, “Is OR ‘portable’ in meta-analysis? Time to consider bivariate generalized linear mixed model”, and it certainly did not seem like the rebuttal was about any such thing, but perhaps that explains the focus on conditioning on “topic”. It did not occur to me until your last post that this was what was being rebutted, although I did hint earlier in the discussion that we perhaps mean different things. I went through both of our papers just now and there is no reason why such a conclusion could be inferred from either of them. We do define our usage of this term in the introduction and we reference your paper, which I quoted above (so there is no way anyone can misinterpret our intent).

I have mentioned what we think about non-collapsibility of the OR in one of the posts above but will bring that up again after I go through your primers in detail.

1 Like

I am really enjoying this discussion. I hope that someone will draw a mind map to clarify the concepts and to help the rest of us wade through these issues. Take a look at mindmup.com. Or perhaps someone can create an outline of the discussion.

4 Likes

This is the example raised by Sander in the JCE primer (II) he kindly shared with me:

Male

| | Dead | Alive | risk | odds | OR | RR |
| --- | --- | --- | --- | --- | --- | --- |
| Treated | 45 | 5 | 0.9 | 9 | 6 | 1.5 |
| Untreated | 30 | 20 | 0.6 | 1.5 | | |

Female

| | Dead | Alive | risk | odds | OR | RR |
| --- | --- | --- | --- | --- | --- | --- |
| Treated | 30 | 70 | 0.3 | 0.43 | 3.86 | 3 |
| Untreated | 10 | 90 | 0.1 | 0.11 | | |

Collapsed

| | Dead | Alive | risk | odds | OR | RR |
| --- | --- | --- | --- | --- | --- | --- |
| Treated | 45M + 30F | 5M + 70F | 0.5 | 1 | 2.75 | 1.875 |
| Untreated | 30M + 10F | 20M + 90F | 0.27 | 0.36 | | |
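A quick check of these numbers, as a sketch in plain Python using the counts from the tables:

```python
def summarize(dead_t, alive_t, dead_u, alive_u):
    """Risks, RD, RR and OR for one treated/untreated 2x2 table."""
    r1 = dead_t / (dead_t + alive_t)
    r0 = dead_u / (dead_u + alive_u)
    odds = lambda r: r / (1 - r)
    return dict(risk1=r1, risk0=r0, RD=r1 - r0, RR=r1 / r0, OR=odds(r1) / odds(r0))

male = summarize(45, 5, 30, 20)                          # OR 6, RR 1.5, RD 0.3
female = summarize(30, 70, 10, 90)                       # OR 3.86, RR 3, RD 0.2
collapsed = summarize(45 + 30, 5 + 70, 30 + 10, 20 + 90)
# collapsed: OR 2.75 (below both stratum ORs), RR 1.875 and RD 0.233
# (both lie between the stratum-specific values; the RD equals the
# person-weighted average of the stratum RDs, (100*0.3 + 200*0.2)/300).
```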

Sander’s comments in italics and mine in normal text:

  1. *The collapsed odds ratio is closer to 1 than both the male and female odds ratios, and so cannot be any kind of average of the latter two.*
  • This is correct: OR 6 in males, 3.86 in females and 2.75 overall.
  2. *It thus cannot represent confounding by sex because there is none, even though sex is a strong risk factor for death among these patients and is not affected by the treatment.*
  • Agreed, there is no confounding by sex, as sex and treatment are unassociated.
  3. *This qualitative difference between what is seen conditional on sex (within the subgroups) and marginally in the total cannot be ascribed to a sex imbalance between the treated and untreated, because there is none.*
  • In the 100 untreated and 100 treated females, risk went from 10% (0.11) to 30% (0.43); OR 3.86.
  • In the 50 untreated and 50 treated males, risk went from 60% (1.5) to 90% (9); OR 6.
  • In the 150 untreated and 150 treated of mixed sex, risk went from 27% (0.36) to 50% (1) on treatment (see the female:male ratios in the table); OR 2.75.
  • All values in parentheses are odds.
  • Keep in mind that male sex is a very strong influence on death, and adding this to both treated and untreated brings the odds together – there is nothing odd about this.
  4. *This leads to the apparent paradox that the treatment more than triples the odds of death in both the male and female subgroups, yet falls short of tripling the odds of death when considering the group as a whole.*
  • There is no paradox – this is what we expect based on the data in 3 above. Keep in mind that the OR is a likelihood ratio connecting baseline odds to posterior odds, and because of the mix of a very strong risk factor for death with treatment, the two odds must come closer together and thus the marginal OR drops. As we said in the paper, “non-collapsibility is only an issue when one gets the outcome model wrong by refusing to account for easily accounted for outcome heterogeneity”. The RR of 1.5 is due to the math issue (I had better not mention the word “non-portable”, although you know what I mean).
  5. *This oddity of non-collapsibility without confounding has been the subject of some 40 years of discussion.*
  • In our view this is an expected property of a good effect measure. Note there is effect modification by sex in this case. The product term is >1 (as intuitively expected) on the OR scale and <1 on the RR scale, and as Frank said earlier, the latter is just a “keep the proportions between 0 and 1” product term.
2 Likes

Noncollapsibility of the OR can occur without modification of the OR or without modification of the RR (though at least one of them will be modified). Thus, as far as I’m concerned, your final comment about modification is utterly irrelevant. And you have now repeated one of your central mistakes in claiming “non-collapsibility is only an issue when one gets the outcome model wrong by refusing to account for easily accounted for outcome heterogeneity”. In reality our models are always wrong and always missing strong outcome predictors, so if the outcome is common then noncollapsibility will be a problem for predicting effects whether we know about the problematic unmeasured covariates or not. Your statement is thus as nonsensical as saying confounding or effect modification “is only a problem when one gets the outcome model wrong by refusing to account for easily accounted for outcome heterogeneity”. No, that’s absurd: They are all problems whether we recognize them or not, because they are states of nature, uninfluenced by our wishful thinking or claims based on inadequate models.

Like your failure to stratify on topic, your latest comments continue to look to me like what I noted before: cognitive blockage - failing to see errors obvious to many others (I suspect because of the embarrassment that seeing the errors would cause). This sort of psychological phenomenon is nothing new - Fisher’s insistent defense of his “fiducial argument” is a classic example in statistics; in that case it took decades for others to offer detailed explanations of what he missed and to provide plausible repairs to what he was getting at. A more recent classic is the controversy over posterior predictive P-values, an invalid model-checking concept for which repairs again took decades to appear and, decades after their appearance, are still not in common use.

In the present case we needn’t wait that long: The inspiration for and only valid point behind everything you’ve written in JCE and here is what Frank summarized above: “Models that have odds ratios inside them when Y is binary or ordinal tend to be excellent bases for [flexible modeling], because odds ratios…tend to be more constant than other parameterizations. Thus fewer nonlinear and especially fewer interaction terms are needed in the model” (which implies other modeling scales are worth avoiding, apart from exceptions based on further considerations). Even then, I gave a real example from my own work where a loglinear odds model needed every 3-way interaction to capture what was going on causally.

A simple repair to your claims would be to stop with Frank’s quote. As I’ve explained in detail, your recommendations beyond that are misleading or wrong because they are completely unmoored from causal and contextual realities; at best they only serve to illustrate how statistical reasoning can go off the rails when it is not derived from realistic contextual considerations and causal reasoning. Fortunately, after decades of hammering this point, its recognition seems to be growing; for example see the latest article detailing defects of OR:
https://onlinelibrary.wiley.com/doi/full/10.1002/bimj.202000202

2 Likes

I’d like to see a more pragmatic and less philosophical approach tried. We need a measure of non-additivity. If overfitting were not a problem such a measure might be the likelihood ratio \chi^2 statistic for an additive model divided by the \chi^2 for a model that includes all interactions with the exposure variable. Over a variety of datasets, outcomes, and exposures, and using different link functions, which basis results in the most parsimony, i.e., the highest ratio of \chi^2? This is similar to the graphs I made in the following blog article that shows the variation in OR, RR, and RD in a 40,000 patient dataset: https://www.fharrell.com/post/varyor. You can also use just one link function but model the effects in a very flexible way to estimate variation of OR, RR, RD as I did there.
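A minimal sketch of that ratio for one dataset and one link function, in Python with statsmodels (the variable names y, x for the exposure and z for a covariate are placeholders, and the helper nonadditivity_ratio is hypothetical; a fuller comparison would loop over datasets and include all covariate-by-exposure interactions):

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

def nonadditivity_ratio(df, link):
    """LR chi-square of the additive model divided by that of the model
    adding the exposure (x) interaction, under the given binomial link."""
    fam = sm.families.Binomial(link=link)
    additive = smf.glm("y ~ x + z", data=df, family=fam).fit()
    interact = smf.glm("y ~ x * z", data=df, family=fam).fit()
    chi2_add = additive.null_deviance - additive.deviance
    chi2_int = interact.null_deviance - interact.deviance
    return chi2_add / chi2_int   # nearer 1 = less non-additivity on this scale

# e.g., compare links on the same data frame df (an identity/linear-risk link could
# be tried similarly but may hit boundary problems with binomial outcomes):
# nonadditivity_ratio(df, sm.families.links.Logit())
# nonadditivity_ratio(df, sm.families.links.Log())
```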

3 Likes

> I hope that someone will draw a mind map to clarify the concepts and to help the rest of us wade through these issues.

There is a wealth of information in this thread on how to approach complicated data analysis issues that I’m still trying to organize in my head.

I’d appreciate the correction if this supposition is wrong, but I see the key dispute between @Sander and @s_doi is that the Doi paper seemed to suggest (without being explicit) that the OR was (in a formal decision theoretic sense) a dominating effect size measure. This would imply that no other measure should be considered.

The rest of the exchange was @Sander attempting to demonstrate that this isn’t the case, with the most constructive point (in the sense of mathematical logic) being the reference to log-linear risk models.

My question: What quantitative criteria could be used to compare the available effect size measures (and corresponding regression models), so the tools of decision theory can be applied?

Addendum: A link to some decent Wikipedia articles on important concepts in decision theory, which supplies the mathematical tools to evaluate answers to very broad questions like the original post.

1 Like

My attempt at a unitless measure of non-additivity (on a logit or log link scale, or the original scale) was aimed at this.

1 Like

I had another question that might clarify some issues. It seems there is broad agreement (among you, Senn, and Sander) that the OR is useful both as a summary statistic (for RCT-derived data) and as an intermediate transformation for calculating a risk statistic for observational data.

Sander did give a specific example of observational, epidemiological data (cohort studies with sparse data and common outcome) where log-risk models were better than logistic models.

> Sander: Yes, for the same reasons I was never a fan of log-risk regression either, except in some very special cases where (due to sparse data and resulting nonidentification of baseline log-odds) it could provide more efficient and useful risk assessments than logistic regression, and without the boundary and convergence problems it hits in ordinary ML fitting.

Do you have any intuition on whether logistic models could be modified to make them ‘admissible’ in this specific case, or on the criteria (which could be specified before data collection) that would guide a researcher as to which method is preferable?

I take it that epidemiological studies have challenges that can’t simply be addressed as “increase your sample size”, so while it might not be the first choice in the large majority of cases, the log-risk model should be considered in a definable subset of them.

I agree - I am all for a more practical, applied demonstration of the issues raised; otherwise our agreement or disagreement will not have much real-world impact.