Thanks for your prompt response. I would, of course, agree with you had we been discussing transportability (i.e., generalizability) of causal effects; however, that is not what I am talking about in my post.
Mathematical portability (the influence of baseline prevalence on the ability of a computed effect measure to capture the magnitude of association correctly) does not require any consideration of random or systematic error. It refers only to whether the effect measure’s computed value can be shown to change for reasons other than the XY association (in this case, the reason being investigated is baseline prevalence).
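As a minimal numerical illustration (the risks below are made up purely for arithmetic convenience): hold the OR fixed and change only the baseline prevalence, and the RR and RD computed from the same 2x2 structure change even though the association encoded by the OR has not:

$$
\begin{aligned}
p_0 = 0.10,\ p_1 = 0.50 :&\quad \mathrm{OR} = \frac{0.5/0.5}{0.1/0.9} = 9,\quad \mathrm{RR} = 5,\quad \mathrm{RD} = 0.40,\\
p_0 = 0.25,\ p_1 = 0.75 :&\quad \mathrm{OR} = \frac{0.75/0.25}{0.25/0.75} = 9,\quad \mathrm{RR} = 3,\quad \mathrm{RD} = 0.50.
\end{aligned}
$$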
This concept does not need to be evaluated against real-world data, since I am not arguing for greater generalizability of ORs in my post. The distinction you make between “by some criteria, the OR tends to be more portable than the RR” and “…the OR is portable across baseline prevalence of an outcome…” suggests that you are referring to generalizability here, and given the numerical similarity between the OR and RR, especially at smaller baseline prevalence, I am quite sure you are right about their equivalent transportability in the real world. I therefore fully agree with you that if we were to “put random positive numbers that sum to 1 into a series of 2x2 tables to create a hypothetical set of expected counts” we would certainly see “variation across the tables in both baseline prevalence and every measure, including the OR”, but this is a different issue from the measure’s mathematical ability to measure what it is intended to measure.
There are many other areas (other than generalizability) where these mathematical properties would become important, but perhaps that can be left for a later discussion. Right now, I am seeking agreement on the mathematics outlined above; if there is no flaw to be pointed out and there is broad agreement, then we can move on to these other issues.
As far as I can tell, all you’ve done with your math is reframe the long-known mathematical facts that the OR is variation independent of the designated baseline (reference) risk, while the RR isn’t (nor is the RD). You seem to call the OR result “portability”; I object to that usage because in ordinary English portability is a synonym for transportability, and there is already an unambiguous and established math term for what you describe: variation independence (VI), which is also called mathematical or logical independence (as opposed to statistical independence, which can occur with or without VI).
In another misuse of terms, you seem to confuse outcome incidence probability (risk) with outcome prevalence; with rare exceptions, trials examine risk, not prevalence.
I explained in my last post what VI does and doesn’t mean for practice. In particular, the fact that the OR is variation independent of baseline risk can be seen as a disadvantage in some contexts. Even when this VI is seen as an advantage because it eliminates a source of effect-measure heterogeneity, it does not address causally important sources such as variation in patient response.
I repeat from a much earlier post that the RR is variation independent of a parameter different from baseline risk, the log of the odds product or, equivalently, the sum of the logits. This fact leads to a modeling and estimation approach which has some advantages over maximum-likelihood logistic regression; see Richardson, Robins & Wang, ‘On Modeling and Estimation for the Relative Risk and Risk Difference’, JASA 2017.
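For readers who want the algebra spelled out, the (log) odds product referred to here is, writing $p_0$ and $p_1$ for the risks under control and treatment,

$$
\log \mathrm{OP} = \log\!\left(\frac{p_0}{1-p_0}\cdot\frac{p_1}{1-p_1}\right) = \operatorname{logit}(p_0) + \operatorname{logit}(p_1),
$$

and the result cited is that $(\log \mathrm{RR}, \log \mathrm{OP})$ is a variation-independent parameterization of $(p_0, p_1)$, which is what makes the log odds product a convenient nuisance model in their approach.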
I will try to avoid posting back further on this issue as I have written out all the facts as I understand them, and have unrelated items to attend to. I hope others will take over responding to you, whether or not they share your or my view.
Thanks for your further thoughts on this matter. Regarding the specific points you raise: I used portability instead of variation independence (VI) because I thought it would be easier for clinicians to comprehend, and a series of my papers have used this term (even in the titles, not just the text), so I probably need to be consistent. However, point taken; for epidemiology readers I will add ‘mathematical portability is also known as VI and does not imply generalizability or transportability as used in epidemiology’ in parentheses.
Regarding incidence or risk instead of prevalence: I use the term prevalence to refer to a dataset, and this can be from a trial or from other designs. At the end of follow-up we have a certain prevalence of the outcome in the entire dataset. The reason I have tried to avoid ‘risk’ is that it then gets confused with baseline risk (in the control group, for example), whereas the variation or mathematical independence is actually linked to the prevalence of the outcome in the dataset being analysed (or its subset), not to baseline risk. Of course baseline risk is associated with such prevalence, so sometimes we loosely say independence from baseline risk, but when the actual math is being discussed I think it is more accurate to say ‘prevalence in a dataset’; point taken, though, and I will add ‘prevalence in a dataset’ to future discussions. I agree fully that VI removes an unnecessary source of heterogeneity and confusion from the effect measure, but of course it has nothing to do with real-world sources of confusion and heterogeneity such as variations in levels of care or heterogeneity in patient response.
I note that you have mentioned the generalized log odds product model previously, and the log(odds product) is just logit(TPR)+logit(FPR) in my diagram above. It was made popular in the earlier days for summarizing ROC curves (now superseded by HSROC or SCS models). I really do not see why that should work better than predicted probabilities from logistic models – except when one is determined to only model the RR and nothing else. The authors of the generalized odds product model have reported a comparison of predicted probabilities using the Titanic data. Their dataset consisted of 1309 passengers from three passenger classes, of whom 809 lost their lives during the event. They removed the 263 (20.1%) passengers for whom age was missing, resulting in a sample size of 1046, including 284 (27.1%) passengers in the first class, 261 (25.0%) in the second class, and 501 (47.9%) in the third class. Predicted probability of death for the first passenger class (solid line), the second class (dotted line), and the third class (dashed line) with respect to the different models is shown in the figure below (red represents female, and blue represents male).
I do not find the differences in predicted probability from the GOP model more sensible from first principles, in terms of the logical representation of the events that unfolded, which suggests that there could be bias in inference through this model. This dataset is freely available and others can do the analysis (the brm package in R) and see for themselves what they think.
No worries, it will be interesting to see what others think as well so will leave this without asking further questions for 1-2 weeks.
Addendum: I could not replicate the Yin et al results and my analysis is below. Circles are class 1 F, diamonds are class 2 F, triangles are class 1 M, squares class 3 F. The other two on top are class 2 & 3 M.
(1) The switch relative risk is variation independent of the baseline risk, in the sense defined by Richardson, Robins and Wang (2017): The range of possible values of (SRR(v), P0(v)) is equal to the Cartesian product of the ranges of SRR(v) and P0(v).
(2) “Variation independence” is very closely related to what I have called “closure” of an effect measure. To recap, closure means that for any (conditional) baseline risk in [0,1] and for any possible value of the (conditional) effect measure, the implied (conditional) risk under treatment is also in [0,1]. It seems likely to me that closure is equivalent to variation independence, or at least that the definitions coincide for all standard effect measures.
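A toy example to make this concrete (my own illustrative numbers): the risk difference is not closed, since a baseline risk of $p_0 = 0.8$ together with $\mathrm{RD} = 0.5$ implies $p_1 = 1.3 \notin [0,1]$, whereas the odds ratio is closed, since for any $p_0 \in (0,1)$ and any $\mathrm{OR} > 0$ the implied risk under treatment

$$
p_1 = \frac{\mathrm{OR}\,p_0}{1 - p_0 + \mathrm{OR}\,p_0}
$$

always lies in $(0,1)$.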
(3) I will apologise in advance for making a point about semantics: while the term “variation independent” points to a mathematically reasonable referent, the term itself is, in my view, a misnomer, and will give rise to misunderstandings, as is readily apparent in Suhail’s response.
To recap the recent history of this discussion: I wrote a commentary on the contributions of Mindel Sheps, in which I noted “closure” as a relevant consideration. Suhail responded with a letter to the editor where he argued that closure is not the main consideration, but that what actually matters is “baseline risk independence”. If we now agree that the concept Suhail is pointing towards is “variation independence”, that variation independence is essentially the same thing as closure, and that closure is a property of the switch relative risk: Can we then agree, to borrow a phrasing from earlier in this thread, that this brings closure to the issue?
Thanks Anders, I had you most in mind in asking others to comment…
Re your 2: Yes your “closure” is equivalent to VI for risk and measures of association of treatment with risk. VI is not a causal concept and it applies whether those measures are of effects or not; the math goes through without specifying an underlying causal model.
Re 3: You did not explain how “variation independent” is a misnomer. As far as I can see the term is unambiguous and I do not see how the term could be mistaken for something else. So how is VI a misnomer? I could see the term as sounding too technical, and another equivalent term like “range independence” might be clearer to most readers. Still, VI is precise and has an established usage over generations of math stat.
In contrast, “closure” already has established uses for other, unrelated concepts such as the combination of a region and its boundary, and a specific statement formed by binding all free variables in an expression. Thus, as with “consistency”, “closure” risks confusion should the different concepts with the same label arise in the same discussion. Also notable is how your comment ends by invoking yet another meaning of “closure”. So if you use “closure” in teaching I would advise cautioning that it has multiple technical as well as ordinary English meanings.
It seems like there is general agreement that no modelling strategy dominates the others in all cases. So while there may be reasonably strong prior (or expert) information that suggests the logistic model is a good default (i.e., the opinion of @f2harrell), it might not reflect all the uncertainty that a skeptical audience might have (@AndersHuitfeldt).
Wouldn’t a principled way to decide this issue be based on model averaging or model selection techniques whether Bayesian or Frequentist? How would someone specify the data analysis plan for this methodology?
The following discusses the issue from a frequentist perspective, and includes other techniques (penalization). It cites some of the older Bayesian Model Averaging papers mentioned in the link to the Data Methods thread above.
Sylvain Arlot and Alain Celisse (2010). “A survey of cross-validation procedures for model selection,” Statistics Surveys, 4, 40-79.
Here is another informative paper from a frequentist perspective.
Hansen, B. E., & Racine, J. S. (2012). Jackknife model averaging. Journal of Econometrics, 167(1), 38-46.
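As a rough sketch of how such an AIC-weight model-averaging step might be written into a data analysis plan (the data and the two candidate models below are entirely hypothetical; this is only meant to make the question concrete):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical trial data: binary treatment, one covariate, binary outcome
n = 400
treat = rng.integers(0, 2, n)
age = rng.normal(0, 1, n)
p_true = 1 / (1 + np.exp(-(-0.5 + 0.8 * treat + 0.6 * age)))
y = rng.binomial(1, p_true)

# Candidate logistic models: main effects only vs. treatment-by-age interaction
X1 = sm.add_constant(np.column_stack([treat, age]))
X2 = sm.add_constant(np.column_stack([treat, age, treat * age]))
fits = [sm.Logit(y, X).fit(disp=0) for X in (X1, X2)]

# Akaike weights: w_m proportional to exp(-0.5 * delta AIC_m)
aics = np.array([f.aic for f in fits])
w = np.exp(-0.5 * (aics - aics.min()))
w /= w.sum()

# Model-averaged predicted risks, averaging on the probability scale
preds = np.column_stack([fits[0].predict(X1), fits[1].predict(X2)])
p_avg = preds @ w
print("Akaike weights:", np.round(w, 3))
print("First 5 averaged risks:", np.round(p_avg[:5], 3))
```

The same logic extends to a larger candidate set or to Bayesian posterior model probabilities in place of Akaike weights.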
I resonate with almost all that you wrote Sander but don’t agree with this statement. This is much more of an advantage for ORs. But I’ll just hark back to my default strategy for 2023: (1) fit a model that has a hope of being simple, e.g., a logistic model with no interactions but allowing all potentially important variables to act nonlinearly, and (2) relax the additivity assumption in order to estimate covariate-specific ORs and to estimate the entire distribution of absolute risk reductions. Apply an optimum cross-validating penalty to all the interaction terms (e.g., optimize AIC using the effective degrees of freedom) or use a formal Bayesian model with shrinkage priors for interactions. Getting much more experience doing this will shed more light on the adequacy of the logit link as a basis for this, i.e., how seldom we find strong interactions on the logit scale.
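A minimal sketch of what the penalty selection in step (2) looks like (simulated data, a single hand-coded interaction, and a hand-rolled AIC-with-effective-d.f. scan rather than a full spline model; in R this kind of scan is what pentrace in the rms package automates, so treat the code only as an illustration of the idea):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Hypothetical data: one covariate, a binary treatment, a binary outcome
n = 500
treat = rng.integers(0, 2, n)
age = rng.normal(0, 1, n)
y = rng.binomial(1, expit(-0.5 + 1.0 * treat + 0.8 * age + 0.3 * treat * age))

# Design: intercept, treatment, covariate, treatment x covariate
X = np.column_stack([np.ones(n), treat, age, treat * age])
pen_mask = np.array([0.0, 0.0, 0.0, 1.0])  # penalize only the interaction term

def fit_penalized(lam):
    P = np.diag(lam * pen_mask)

    def obj(beta):
        eta = X @ beta
        loglik = y @ eta - np.logaddexp(0, eta).sum()
        return -loglik + 0.5 * beta @ P @ beta

    beta = minimize(obj, np.zeros(X.shape[1]), method="BFGS").x
    eta = X @ beta
    w = expit(eta) * (1 - expit(eta))
    info = (X.T * w) @ X                              # unpenalized information
    edf = np.trace(np.linalg.solve(info + P, info))   # effective degrees of freedom
    deviance = -2 * (y @ eta - np.logaddexp(0, eta).sum())
    return beta, deviance + 2 * edf, edf

# Scan penalties on the interaction and keep the AIC-optimal one
for lam in (0, 1, 5, 20, 100):
    beta, aic, edf = fit_penalized(lam)
    print(f"lambda={lam:>3}  interaction={beta[3]: .3f}  edf={edf:.2f}  AIC={aic:.1f}")
```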
I don’t see model selection or averaging as helping very much here. The data analysis plan might be an expansion of what I just wrote above.
I’d like to request that all future responses to this topic declare whether the new response is a theoretical consideration or a practical data analysis consideration. You’ll see that almost all of my replies are in the latter category.
Frank, you wrote that variation independence of the odds ratio from baseline risk “is much more of an advantage for ORs.” I am shocked that you would make such a sweeping generalization, especially when no such generalization can be correct. Whether the numeric properties being discussed are advantageous or not depends crucially on contextual properties like loss functions and what other information is available.
If the baseline risks are available and in hand, none of the measures has a compelling advantage if what we want to be working with is the full risk function. If for some reason (often none other than that someone forgot to extract or present baseline risks from research reports) I only get to have one of either the variation-independent OR or the very variation-dependent RD, I know I’d often want that to be the RD rather than the OR. For example, an OR of 9 for a very unpleasant side effect comes about regardless of whether the treatment pushes the risk from 1 per million to 9 per million or from 10% to 50%, whereas the RD would be 0.0008% in the first case and 40% in the second case. Depending on the treatment benefit, in the first case I may well accept treatment and the increased chance of the side effect, but in the second case I may well reject treatment because of the increase. This kind of issue arises all the time in real clinical and public-health practice.
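A quick way to verify arithmetic like this (a throwaway check, not anyone’s published code) is to back out the treated risk and risk difference implied by a baseline risk and an OR:

```python
def risk_and_rd_from_or(p0, orr):
    """Treated risk and risk difference implied by baseline risk p0 and odds ratio orr."""
    p1 = orr * p0 / (1 - p0 + orr * p0)
    return p1, p1 - p0

for p0 in (1e-6, 0.10):
    p1, rd = risk_and_rd_from_or(p0, 9)
    print(f"p0={p0:g}: p1={p1:.6g}, RD={rd:.6g}")
# p0=1e-06 gives p1 of about 9 per million (RD about 0.0008%);
# p0=0.10 gives p1=0.5 (RD=40%), matching the two scenarios above.
```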
That is why for me your unnuanced claim looks more like numerology than statistical science, as have other OR promotions such as Doi’s. The bottom line as always is that purely numerical properties alone (such as VI, symmetry, etc.) are vastly inadequate for judging the relative utility and adequacy of candidate numerical procedures in real-world problems, because those determinations require context. Again, I’m surprised that you of all people would fall back to promoting numerical properties as if they dominate contextual considerations - a promotional problem that has plagued academic statistics (and spin-offs like econometrics) for generations.
I really find that what Frank said makes a lot of sense, and I ask the same question. If an effect measure model has one less reason for variability that is unrelated to the actual association under consideration, then logically it is better, unless it can be shown that it has other mathematical distortions that linear probability models do not have. That is not the case, since the linear probability models:
a) are always misspecified
b) can give meaningless predictions
c) have boundary problems
none of which are seen with logistic models. Are we then suggesting that lack of robustness to misspecification of nuisance models is a reason to dump the logistic model for such linear probability models? That seems to me to be the argument of Richardson, Robins & Wang in proposing alternative “robust” models other than the logistic.
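To make (b) and (c) concrete, here is a small simulated illustration (my own toy example, not anyone’s analysis) of a linear probability model fit by least squares producing predicted “probabilities” outside [0, 1], which a logistic fit cannot do:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a binary outcome whose true risk follows a logistic curve
n = 1000
x = rng.normal(0, 2, n)
p_true = 1 / (1 + np.exp(-(0.2 + 1.5 * x)))
y = rng.binomial(1, p_true)

# Linear probability model: ordinary least squares on the 0/1 outcome
X = np.column_stack([np.ones(n), x])
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_lpm = X @ beta_lpm

print("LPM predictions below 0:", (pred_lpm < 0).sum())
print("LPM predictions above 1:", (pred_lpm > 1).sum())
# A logistic fit returns expit(X @ beta), which always stays strictly inside (0, 1)
```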
I think the best path forward is indeed what Frank suggests - let’s stop posting theoretical considerations and give a practical data-analysis example. I can share the Titanic dataset if anyone is interested; as I noted above, I cannot even reproduce the analysis from those proposing such “robust” linear probability models.
As before I’m describing tendencies and not universal truths. I stand by the assertion that relative odds is an excellent modeling choice because when one adds the terms into the model that allow for non-applicability of constant ORs (e.g., interactions with treatment) one can get away with larger penalties on these “departures from what we hope for” terms. Then one can focus on risk differences from this model.
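Spelling out that last step (in generic notation, not tied to any particular fitted model): with a fitted logistic model having linear predictor $\hat\eta(x, a)$ for covariates $x$ and treatment $a$, the covariate-specific risk difference is obtained on the probability scale as

$$
\widehat{\mathrm{RD}}(x) = \operatorname{expit}\!\big(\hat\eta(x, 1)\big) - \operatorname{expit}\!\big(\hat\eta(x, 0)\big), \qquad \operatorname{expit}(u) = \frac{1}{1 + e^{-u}},
$$

so even a constant-OR model yields risk differences that vary with $x$ through the baseline risk.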
From a theoretical POV, the ultimate goal of any analysis is to compute covariate-adjusted effect measure distributions translated to the probability scale for clinicians to use on an individual basis. So I think there is agreement on this point.
From a data analysis POV: there are many ways to get to probability outputs; I do not think the proposal of the switch risk effect measure by @AndersHuitfeldt has gotten an entirely fair hearing, and it is being confused with more common effect measures (i.e., the RR).
If I have correctly interpreted his formulas, posts, and papers, he argues that mechanistic considerations can be used to mathematically derive a preferred effect measure that is variation independent, which has all of the parsimony benefits of the OR but also preserves collapsibility.
These mechanistic considerations may also enable prediction and verification of treatment heterogeneity at sample sizes that logistic regression methods cannot detect.
From the POV of @f2harrell, he might be skeptical of the mechanistic assumptions and ask “How much of this result is reliant on the causal assumptions vs. the actual data?”
I proposed model averaging and model selection methods in a previous post to account for this uncertainty in the appropriate way to analyze the data.
I’d appreciate it if someone could point out any errors in my interpretation of the discussion between Frank, Anders, and Sander thus far.
I agree with it except that I don’t understand why preserving collapsibility should be part of the goal. I’d make the goal the estimation of covariate-specific absolute risk reductions, which will be most parsimonious using a base model that doesn’t try to provide collapsible primary parameters.
This paper from 2019 seems to assume the causal effect measure is being calculated from an observational study, where confounding is a concern.
We discuss two subtly different definitions of collapsibility, and show that by considering causal effect measures based on counterfactual variables (rather than measures of association based on observed variables) it is possible to separate out the component of non-collapsibility which is due to the mathematical properties of the effect measure, from the components that are due to structural bias such as confounding.
It is unclear to me if these considerations and reported benefits also apply to controlled trials.
I’m speculating that in a controlled trial scenario, mechanistic assumptions (if they hold) might permit detecting heterogeneity of treatment effect at smaller sample sizes with collapsible measures vs. logistic regression. This seems to be the point of contention.
Frank, I agree with your modeling comment as you just stated it. What shocked me before is that you said you didn’t agree with my statement that variation independence of baseline risks can be seen as a disadvantage in some contexts. I thus gave an example of such a context. Based on your response to that (“…then one can focus on risk differences”) I think perhaps you overlooked “in some contexts” (which referred to application context, not models) and upon re-reading what I actually wrote you wouldn’t disagree with it. If so, we don’t disagree on this matter or much of anything in practice, and certainly not about the math.
A difference may be my stronger emphasis on application context as opposed to mathematical properties. As I explained above, I think for choosing procedures there has been an overemphasis on mathematical properties as opposed to highly variable contextual features (such as goals, costs, and background information). The imbalance toward the math has harmed statistical science as well as some applications. I have elaborated these concerns repeatedly in print, most recently in sections 1-3, 8, 9, 11 and 12 of https://arxiv.org/abs/2304.01392; in press, Scandinavian Journal of Statistics.
This is essentially the crux of the issue - “preserving collapsibility” - but I am unable to understand what the big deal is with this concept and why it should be preserved (for any reason). I think modellers (who are application focused) are coming round to accepting that, when models are compared between groups that have different distributions of other causes of the binary response, this is the expected behaviour of a good effect measure, and therefore such concerns are usually misplaced. If we were to disagree, then what is needed is a data-analysis example that demonstrates how and why this concern needs to be upheld in real life.
Causal non-collapsibility is a property of the effect measure; this consideration is just as relevant in randomised trials as in observational studies.
In randomized trials, non-collapsibility can be observed in its pure form, whereas in observational studies it becomes difficult to know whether observed changes in the effect measure upon conditioning are due to confounding or to non-collapsibility.
The purpose of this paper was in part to formalize a definition that disentangles the component of “associational non-collapsibility” that applies in randomised trials (“causal non-collapsibility”) from the component that primarily arises in observational studies due to confounding.
Thank you, I will have to think more carefully about why I think people may be misled by the term variation independence.
The term closure was specifically chosen because of its pre-existing technical usage in abstract algebra/group theory. My coauthors and I will soon release a preprint where we show that there are advantages to framing regression models in terms of the abstract algebra properties of functions that input a distribution under one value of the predictor, and output a distribution under another value of the predictor. I will get back to this question once the preprint is released, hopefully things will be clearer at that stage
existence of a causal signal: the treatment is a cause of a nonzero expected benefit for a variety of patients (marginal view, magnitude varies across patient types), or
the treatment is the cause of a specific expected benefit for a specific type of patient (conditional view)