Question arising from JAMA Guide article "Odds Ratios—Current Best Practice and Use"

In their paper “Odds Ratios – Current Best Practice and Use”, Norton, Dowd, and Maciejewski argue that one of the lesser-known limitations of the odds ratio from a logistic regression is that it is

…scaled by an arbitrary factor (equal to the square root of the variance of the unexplained part of binary outcome).4 This arbitrary scaling factor changes when more or better explanatory variables are added to the logistic regression model because the added variables explain more of the total variation and reduce the unexplained variance. Therefore, adding more independent explanatory variables to the model will increase the odds ratio of the variable of interest (eg, treatment) due to dividing by a smaller scaling factor.

The implication being

Different odds ratios from the same study cannot be compared when the statistical models that result in odds ratio estimates have different explanatory variables because each model has a different arbitrary scaling factor.4-6 Nor can the magnitude of the odds ratio from one study be compared with the magnitude of the odds ratio from another study, because different samples and different model specifications will have different arbitrary scaling factors. A further implication is that the magnitudes of odds ratios of a given association in multiple studies cannot be synthesized in a meta-analysis.4

(Reference 4 listed as #2 below.)

I was surprised by this, given that I was taught to treat ORs as transportable. Inspired by all you fine folks, I looked for an example of a simple simulated logistic regression to check for myself.

In R

 sims <- 100
 n <- 1000
 out <- data.frame(treat_1 = rep(NA, sims),
                   treat_2 = rep(NA, sims),
                   treat_3 = rep(NA, sims))
 for (i in 1:sims) {
   x1 <- rbinom(n, 1, 0.5)             # Treatment variable
   x2 <- rnorm(n)                      # Arbitrary continuous variable
   x3 <- rnorm(n)                      # Another arbitrary continuous variable
   z  <- 1 + 2*x1 + 3*x2 + 4*x3        # Linear combination (true log odds)
   pr <- 1/(1 + exp(-z))               # Pass through an inv-logit function
   y  <- rbinom(n, 1, pr)              # Bernoulli response variable
   # Now feed it to glm:
   df <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
   model1 <- glm(y ~ x1, data = df, family = "binomial")
   model2 <- glm(y ~ x1 + x2, data = df, family = "binomial")
   model3 <- glm(y ~ x1 + x2 + x3, data = df, family = "binomial")
   # Store the treatment (x1) coefficient, i.e. the log odds ratio
   out$treat_1[i] <- model1$coefficients[[2]]
   out$treat_2[i] <- model2$coefficients[[2]]
   out$treat_3[i] <- model3$coefficients[[2]]
 }
 colMeans(out)

True enough, my column means (treatment coefficients on the log-odds scale, where the true value is 2) were

  treat_1   treat_2   treat_3 
0.5980825 0.7585841 1.9987460 

So without any interaction between variables but all three being prognostic of outcome, I get three different results for my treatment effect.

So from this, my questions are:

  1. Have I made a mistake or misinterpretation that easily explains these results?
  2. I have been taught that, when trialists provide adjusted odds ratios, I should use those for meta-analysis. But wouldn’t this imply that I would be extracting different odds ratios depending on the number of variables that were adjusted for?
  3. Is meta-analysis of observational studies using odds ratios entirely hopeless since, as stated by Norton et al: “different samples and different model specifications will have different arbitrary scaling factors”?


  1. Odds Ratios—Current Best Practice and Use
    EC Norton, BE Dowd, ML Maciejewski - JAMA, 2018
  2. Log odds and the interpretation of logit models
    EC Norton, BE Dowd - Health services research, 2018 - Wiley Online Library

hi @timdisher - fyi the link to the Norton, Dowd, and Maciejewski paper you included is routed through your university portal, so it won’t work for those without an account there.

You might find this interesting. Although it is a bit over my head, I know Anders addresses problems with ORs in his work: Effect modification and choice of effect measure

Thank you! I will edit just to attach them.


Really nice work @timdisher. I think you are right, but the implications are a bit more subtle. It would be a mistake to conclude that unadjusted odds ratios are comparable across studies instead: unadjusted ORs are functions of subject characteristics that are hidden from the analysis.

I think the biggest mistake people tend to make in this area is criticizing adjusted estimates because you can never measure all the things you really need to adjust for. To that I say that we need to adjust for the most information that is available to us, provided that doing so does not introduce collider bias, condition on future information, etc.

It is true that for hazard ratios, odds ratios, etc. you can’t compare effect ratios across different sets of adjustors. The odds ratios have a fundamentally different meaning when adjustors change.


@timdisher, Thank you for drawing my attention to this article. I read Norton and Dowd’s earlier paper (your reference # 2) with a journal club last year and created a Shiny app to help illustrate the simulation they discuss on pp. 868–870. The only place I have used the app is in that journal club, so it doesn’t have any documentation, but I hope it might still be an interesting companion to that article: Shiny app.


Thank you, this is great!

Totally agree that unadjusted ORs are no better (and almost certainly worse). My impression from these data is that synthesizing data from observational studies and randomized trials is even harder than the current literature base makes it out to be. Not sure about the fields you work in most commonly, but everyone in my end of neonatology seems to have their own pet characteristics to adjust for. Could this be addressed at all by working from an assumed baseline risk and then converting ORs to RRs or absolute risk?

Seeing you say this so matter-of-factly, juxtaposed against the all-too-common narrative of univariate adjustments followed by adding predictors to show how odds ratios change in response to “adjustment”, is a weird combination of funny and frustrating. I wonder if you would mind expanding on the last sentence. Is this a marginal vs. conditional type of scenario?

This is covered in the ANCOVA chapter in BBR where I reference an excellent paper on identifiability problems of hazard ratios for the Cox model.

You can work from an assumed baseline risk if the risk is declared to represent a single subject.
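The conversion asked about above can be sketched in a few lines of R. This is only an illustration: the function name `or_to_risk` and the example values (baseline risk 0.2, OR of 2) are hypothetical, and the baseline risk is assumed to be the risk for one specific subject, with the OR conditional on that subject's covariates.

```r
# Hypothetical helper: convert an odds ratio into a risk ratio and a risk
# difference at an assumed baseline (untreated) risk for a single subject.
or_to_risk <- function(p0, or) {
  stopifnot(p0 > 0, p0 < 1, or > 0)
  odds0 <- p0 / (1 - p0)        # baseline odds
  odds1 <- odds0 * or           # odds under treatment
  p1    <- odds1 / (1 + odds1)  # risk under treatment
  c(p0 = p0, p1 = p1, rr = p1 / p0, rd = p1 - p0)
}

# One subject with an assumed baseline risk of 0.2 and an OR of 2
or_to_risk(p0 = 0.2, or = 2)

# The same OR at several assumed baseline risks (one column per risk)
sapply(c(0.05, 0.2, 0.5, 0.8), or_to_risk, or = 2)
```

The same OR implies a different RR and risk difference at each baseline risk, so the converted values only have meaning for the subject (covariate pattern) whose baseline risk was assumed.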

Thank you for bringing this up.

Yes, ORs (and HRs) change as we change the number and type of covariates.
What I find striking in this JAMA paper is their assertion about meta-analysis. I would think that at least a random-effects meta-analysis would be acceptable (assuming we have reasonable homogeneity, i.e. studies used similar target populations, adjusted for the same covariates, etc.). Thoughts?

I had the same initial reaction. If they cannot be combined in a meta-analysis under any scenario, then how can they be transported anywhere, e.g. using prediction models in clinical practice, developing economic models, etc.? The authors seem to suggest that you can really only rely on ORs for direction and statistical significance, which seems like a strong statement. Off to read BBR!

Keep in mind that if you adjust for 5 “big” predictors of outcome in one study and a different 5 predictors in another study, but the predicted risk that comes from one set is concordant with the predicted risk that comes from the other predictor set, you will have explained a large bulk of the easily explainable outcome variation, and done so in a way that makes the exposure odds ratios almost comparable.
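A rough simulation sketch of this point follows. It is a simplification under stated assumptions, not a general result: a single underlying prognostic index `u` drives risk, and two hypothetical "studies" each adjust for a different noisy measurement of it (`a` and `b`) in place of five predictors each. All names and numbers are made up for illustration.

```r
# Illustrative simulation; every number here is an assumption.
set.seed(2018)
n  <- 20000
x1 <- rbinom(n, 1, 0.5)              # randomized treatment
u  <- rnorm(n)                       # underlying prognostic index
y  <- rbinom(n, 1, plogis(1 + 2 * x1 + 3 * u))  # true conditional log OR = 2

d <- data.frame(
  y  = y,
  x1 = x1,
  a  = u + rnorm(n, sd = 0.3),       # "study A" adjusts for one strong proxy of u
  b  = u + rnorm(n, sd = 0.3)        # "study B" adjusts for a different strong proxy
)

# Treatment coefficient (log odds ratio) from a logistic model
coef_x1 <- function(formula) {
  unname(coef(glm(formula, data = d, family = binomial))["x1"])
}

est <- c(unadjusted = coef_x1(y ~ x1),
         study_A    = coef_x1(y ~ x1 + a),
         study_B    = coef_x1(y ~ x1 + b))
round(est, 2)
```

Under these assumptions the two adjusted estimates should land close to each other and well above the unadjusted one, even though the adjustment sets differ, because both sets capture most of the same outcome variation.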

There is a very old paper by Gail et al that discusses numerical differences in effect measures between adjusted and unadjusted models for GLM-like regressions.
This “bias” (which is not really bias in the sense of bias in estimation, just a difference in the numerical values from adjusted and unadjusted models) underlines the non-collapsibility of ORs, which is the question indirectly asked here.

Because of this non-collapsibility, one cannot combine ORs from models with different structures (different predictors). The situation of the HR is somewhat more interesting, since in many cases the adjusted and the unadjusted ratios don’t differ by much, depending on the extent of censoring.
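Non-collapsibility can also be seen without simulation noise by numerically averaging a conditional logistic model over an omitted covariate. The sketch below is an illustration only: the helper name `marginal_log_or` is hypothetical, the omitted covariates are assumed to be normal and independent of treatment, and the parameter values are borrowed from the simulation earlier in the thread.

```r
# Marginal log odds ratio implied by the conditional model
#   logit P(y = 1 | x1, w) = b0 + b1 * x1 + w,   w ~ N(0, sd_w^2),
# after averaging the risk over w (the combined omitted covariates).
marginal_log_or <- function(b0, b1, sd_w) {
  risk <- function(x1) {
    integrate(function(z) plogis(b0 + b1 * x1 + sd_w * z) * dnorm(z),
              lower = -Inf, upper = Inf)$value
  }
  qlogis(risk(1)) - qlogis(risk(0))
}

# b0 = 1 and b1 = 2 as in the simulation above. Omitting 3*x2 + 4*x3 gives
# sd_w = sqrt(3^2 + 4^2) = 5; sd_w = 4 only roughly mimics omitting 4*x3
# alone, because the fitted model that adjusts for x2 is then misspecified.
m5 <- marginal_log_or(1, 2, sd_w = 5)
m4 <- marginal_log_or(1, 2, sd_w = 4)
m1 <- marginal_log_or(1, 2, sd_w = 1)
c(sd_5 = m5, sd_4 = m4, sd_1 = m1)
```

The conditional log OR is 2 in every case; the more outcome variation that is left unexplained (larger `sd_w`), the further the marginal log OR is pulled toward 0, with no confounding involved.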

The link to the paper

I show some of Gail’s key examples in the ANCOVA chapter of BBR.
