Ordinal OR as a sort of "average OR" across all thresholds - surprising result

Hi all,

Happy Thanksgiving to those celebrating this weekend. I am doing some work extending/adapting @f2harrell's impactPO approach to network meta-analysis of ordinal outcomes. As part of this workflow I am visualizing treatment effects at thresholds of an ordinal outcome alongside the OR from a PO model for each trial in the meta-analysis. I've generally thought of the OR from a PO model as a sort of average OR, but have been surprised to find that in some studies the estimate of the PO OR can lie entirely outside the ORs from the individual thresholds. I've attached a one-study reprex where this occurs below (the ordinal risankizumab log OR of 4.21 is greater than either the PASI 75 log OR of 4.11 or the PASI 90 log OR of 4.08). The pasi75/pasi90 outcomes are generated from the categorical version of the outcome, so I'm pretty sure it's not just an issue with the data, but I would be grateful if anyone could help me spot a mistake I am making. I haven't been able to find anything discussing this in a quick scan of the literature, so I had a few questions for the more technically minded in our group:

  1. Is this a known phenomenon, and if so, is there an intuitive explanation? Is it related to non-collapsibility at all?
  2. I’ve noticed it generally doesn’t matter when converting back to absolute scales in my case, but is there a good way to explain this to people who are already skeptical of PO models? The deviations are usually small when they do happen, but for treatments with extreme treatment effects (log OR > 7) they can start to look like very large differences on the OR scale (yet < 0.1% differences on the risk-difference scale).
  3. @f2harrell I noticed that in impactPO you use multinomial regression when relaxing PO on all variables at once. Sometimes when fitting the analogous model with VGAM and parallel = FALSE for the treatment variable, I get a fit that converges, but to a very bizarre solution in which neither the intercepts nor the treatment effects are at all similar to those from models fit to the individual thresholds separately. Do you know why this would be? (A sketch of the kind of VGAM call I mean is just after this list.)
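
For concreteness, here is a minimal sketch of the partial-PO fit I mean, assuming the same dat and ordinal outcome out from the reprex below (the exact parallelism formula may not match what impactPO does internally):

library(VGAM)

# Cumulative-logit model with the PO (parallelism) assumption relaxed for trt
# only; reverse = TRUE targets P(Y >= j) so signs line up with lrm's coding
ppo <- vglm(out ~ trt,
            family = cumulative(parallel = FALSE ~ trt, reverse = TRUE),
            data = dat)
coef(ppo)  # threshold-specific trt effects, to compare with the separate glm fits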

Data file

Example code and results

library(dplyr)
library(rms)
dat <- readxl::read_xlsx(here::here("ult1.xlsx"))

dat %>%
  dplyr::group_by(trt) %>%
  dplyr::summarise(p75 = mean(pasi75),
                   p90 = mean(pasi90),
                   n = dplyr::n())
#> # A tibble: 3 × 4
#>   trt             p75    p90     n
#>   <fct>         <dbl>  <dbl> <int>
#> 1 Placebo      0.0980 0.0490   102
#> 2 Risankizumab 0.868  0.753    304
#> 3 Ustekinumab  0.7    0.420    100

p75 <- glm(pasi75 ~ trt, family = binomial, data = dat)  # binary logit at the PASI 75 cutpoint
p90 <- glm(pasi90 ~ trt, family = binomial, data = dat)  # binary logit at the PASI 90 cutpoint
ord <- lrm(out ~ trt, data = dat)                        # PO model on the full ordinal outcome


p75
#> 
#> Call:  glm(formula = pasi75 ~ trt, family = binomial, data = dat)
#> 
#> Coefficients:
#>     (Intercept)  trtRisankizumab   trtUstekinumab  
#>          -2.219            4.106            3.067  
#> 
#> Degrees of Freedom: 505 Total (i.e. Null);  503 Residual
#> Null Deviance:       634.5 
#> Residual Deviance: 424.3     AIC: 430.3
p90
#> 
#> Call:  glm(formula = pasi90 ~ trt, family = binomial, data = dat)
#> 
#> Coefficients:
#>     (Intercept)  trtRisankizumab   trtUstekinumab  
#>          -2.965            4.082            2.642  
#> 
#> Degrees of Freedom: 505 Total (i.e. Null);  503 Residual
#> Null Deviance:       697.3 
#> Residual Deviance: 515.6     AIC: 521.6
ord
#> Logistic Regression Model
#> 
#> lrm(formula = out ~ trt, data = dat)
#> 
#>                        Model Likelihood      Discrimination    Rank Discrim.    
#>                              Ratio Test             Indexes          Indexes    
#> Obs           506    LR chi2     234.27      R2       0.434    C       0.776    
#>  PASI < 75    162    d.f.             2      R2(2,506)0.368    Dxy     0.552    
#>  PASI 75 - 89  68    Pr(> chi2) <0.0001    R2(2,406.1)0.436    gamma   0.796    
#>  PASI 90-100  276                            Brier    0.128    tau-a   0.322    
#> max |deriv| 3e-12                                                               
#> 
#>                  Coef    S.E.   Wald Z Pr(>|Z|)
#> y>=PASI 75 - 89  -2.2116 0.3327 -6.65  <0.0001 
#> y>=PASI 90-100   -3.1152 0.3472 -8.97  <0.0001 
#> trt=Risankizumab  4.2124 0.3679 11.45  <0.0001 
#> trt=Ustekinumab   2.9048 0.3831  7.58  <0.0001
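
To illustrate question 2 on the absolute scale, here's a quick sketch using the placebo intercepts and the risankizumab coefficient from the lrm fit above:

# Placebo P(Y >= threshold) implied by the PO intercepts
p0 <- plogis(c(p75 = -2.2116, p90 = -3.1152))
# Risankizumab risks implied by adding the PO log OR
p1 <- plogis(qlogis(p0) + 4.2124)
round(p1 - p0, 3)  # risk differences at each threshold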

This doesn’t make sense to me, but at any rate the overall OR is going to be tightly related to the overall Wilcoxon test statistic, so you might explore Wilcoxon vs. various cutpoints also. It may be helpful to simulate a dataset that we can all play around with, one that reproduces the problem.
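
For example (a rough sketch assuming the reprex data above, restricted to two arms):

# Wilcoxon on the full ordinal outcome vs. at each cutpoint
sub <- droplevels(subset(dat, trt %in% c("Placebo", "Risankizumab")))
wilcox.test(as.numeric(out) ~ trt, data = sub)  # overall ordinal comparison
wilcox.test(pasi75 ~ trt, data = sub)           # dichotomized at PASI 75
wilcox.test(pasi90 ~ trt, data = sub)           # dichotomized at PASI 90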

Thanks, glad to hear it seems odd. I’ll try to replicate it with some simulation code using that trial as the source of probabilities, but if you did want to look at things in the meantime, the Data file link above should hopefully take you to a OneDrive with the xlsx of the data from this study.


Added some simulation code below based on the observed results from that trial. Apologies that the code is not so elegant, but hopefully the overall implementation looks correct. When I simulate with large sample sizes I’m able to consistently recover all the right treatment effects, so I think it should be right. I see the same issue in this simulation when generating data under PO, and it generally only produces big differences when the estimated treatment effects are extreme; even then, an ROR (ratio of odds ratios) of 1.11 is the maximum I’m seeing consistently. Interested to hear your thoughts on whether I’ve made an error here or whether you think it’s just a feature of noise under this particular set of parameters.

Simulation Code

library(dplyr)
library(rms)

sim.ord <- function(PO = TRUE, n = 506){

 # Initialize PBO PASI 75/90 probabilities based on observed data
  int <- c(0.0980, 0.0490)


  if(PO){
    # Simulate from PO based on LRM fit to example data
    ris <- plogis(qlogis(int) + 4.2)
    ust <- plogis(qlogis(int) + 2.9)
  } else{
    # Simulate from non PO based on binomial fits to example data
    ris <- plogis(qlogis(int) + c(4.106, 4.082))
    ust <- plogis(qlogis(int) + c(3.067, 2.642))
  }


  # Create multinomial categories
  # PASI 90 is just the last value
  # PASI 75-90 is PASI 75 - PASI 90
  # PASI < 75 is 1 - PASI75

  int.p <- c(1 - int[[1]], int[[1]] - int[[2]], int[[2]])

  ris.p <-  c(1 - ris[[1]], ris[[1]] - ris[[2]], ris[[2]])
  ust.p <- c(1 - ust[[1]], ust[[1]] - ust[[2]], ust[[2]])

  # Create data for each arm based on the observed randomization ratio
  pbo <- t(rmultinom(ceiling(n*102/506), 1, int.p)) %>%
    tibble::as_tibble() %>%
    `colnames<-`(c("< 75", "75 - 89", "90 - 100")) %>%
    dplyr::mutate(trt = "PBO") %>%
    tidyr::pivot_longer(cols = -c(trt)) %>%
    dplyr::filter(value == 1)

  ris <- t(rmultinom(ceiling(n*304/506), 1, ris.p))  %>%
    tibble::as_tibble() %>%
    `colnames<-`(c("< 75", "75 - 89", "90 - 100")) %>%
    dplyr::mutate(trt = "RIS") %>%
    tidyr::pivot_longer(cols = -c(trt)) %>%
    dplyr::filter(value == 1)

  ust <- t(rmultinom(ceiling(n*100/506), 1, ust.p))  %>%
    tibble::as_tibble() %>%
    `colnames<-`(c("< 75", "75 - 89", "90 - 100")) %>%
    dplyr::mutate(trt = "UST") %>%
    tidyr::pivot_longer(cols = -c(trt)) %>%
    dplyr::filter(value == 1)

  # Bind arms together, calculate PASI75/90 and then turn name into an ordered
  # factor for ordinal model
  dat <- dplyr::bind_rows(pbo, ris, ust) %>%
    dplyr::mutate(pasi75 = as.numeric(!name %in% c("< 75")),
                  pasi90 = as.numeric(name == "90 - 100"),
                  name = factor(name,levels = c("< 75", "75 - 89", "90 - 100"), ordered = TRUE))


  # Fit individual binomials and compare to lrm
  m75 <- glm(pasi75 ~ trt, family = binomial, data = dat)
  m90 <- glm(pasi90 ~ trt, family = binomial, data = dat)
  po <- lrm(name ~ trt, data = dat)

  check <- bind_rows(coef(m75)[-1],
            coef(m90)[-1],
            coef(po)[-c(1,2)]) %>%
    dplyr::mutate(model = c("bin75", "bin90", "po")) %>%
    tidyr::pivot_longer(cols = -model) %>% tidyr::drop_na() %>%
    dplyr::mutate(name = stringr::str_remove(name,"=")) %>%
    tidyr::pivot_wider(names_from = model, values_from = value) %>%
    dplyr::mutate(
      dplyr::across(-name, ~ round(.x, 2)),
      po_in_range = po >= pmin(bin75, bin90) & po <= pmax(bin75, bin90),
      # Calculate how far out of range 'po' is:
      po_distance = dplyr::case_when(
        po_in_range ~ 0,  # If within range, no distance
        po < pmin(bin75, bin90) ~ po - pmin(bin75, bin90),
        po > pmax(bin75, bin90) ~ po - pmax(bin75, bin90)
      )) %>%
    tidyr::pivot_wider(names_from = name, values_from = c(-name))

  ris.check <- check %>% dplyr::select(dplyr::matches("RIS"))
  ust.check <- check %>% dplyr::select(dplyr::matches("UST"))

  dplyr::lst(dat, m75, m90, po, ris.check, ust.check)

}

# Original n = 506
# Run simulations
sim.res <- purrr::map(1:100, ~ sim.ord(PO = TRUE, n = 506))

# Pull out ris and ust check
check.po.ris <- purrr::map_df(sim.res, "ris.check")
check.po.ust <- purrr::map_df(sim.res, "ust.check")

# Summarize ORs where there are differences
check.po.ris %>% dplyr::filter(po_in_range_trtRIS == FALSE) %>%
  dplyr::mutate(po_distance_trtRIS = abs(po_distance_trtRIS)) %>%
  dplyr::mutate_if(is.numeric,~ round(exp(.x), 2)) %>%
  dplyr::arrange(-po_distance_trtRIS)

#Assess overall estimates
colMeans(check.po.ris)

Example results

# Summarize ORs where there are differences
check.po.ris %>% dplyr::filter(po_in_range_trtRIS == FALSE) %>%
  dplyr::mutate(po_distance_trtRIS = abs(po_distance_trtRIS)) %>%
  dplyr::mutate_if(is.numeric,~ round(exp(.x), 2)) %>%
  dplyr::arrange(-po_distance_trtRIS)
#> # A tibble: 26 × 5
#>    bin75_trtRIS bin90_trtRIS po_trtRIS po_in_range_trtRIS po_distance_trtRIS
#>           <dbl>        <dbl>     <dbl> <lgl>                           <dbl>
#>  1        141.         167.      126.  FALSE                            1.12
#>  2        226.         469.      206.  FALSE                            1.09
#>  3         91.8         95.6      84.8 FALSE                            1.08
#>  4         62.8         99.5      58.6 FALSE                            1.07
#>  5         56.3         58.0      62.2 FALSE                            1.07
#>  6        140.         183.      132.  FALSE                            1.06
#>  7        114.          80.6     122.  FALSE                            1.06
#>  8         94.6         91.8      87.4 FALSE                            1.05
#>  9         87.4         91.8      83.9 FALSE                            1.04
#> 10         73.7         64.7      76.7 FALSE                            1.04
#> # ℹ 16 more rows

#Assess overall estimates
colMeans(check.po.ris)
#>       bin75_trtRIS       bin90_trtRIS          po_trtRIS po_in_range_trtRIS 
#>             4.2414             4.2988             4.2382             0.7400 
#> po_distance_trtRIS 
#>            -0.0020

Well done, although the code would be much simpler without tidyverse. I suspect it’s a noise / unstable-estimates issue.
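
For instance, a minimal base-R sketch of the per-arm data generation (a hypothetical draw_arm helper reusing the int.p/ris.p/ust.p probabilities from inside sim.ord, not the original code):

# Draw ordinal categories for one arm directly, instead of pivoting
# an indicator matrix from rmultinom()
draw_arm <- function(n, probs, trt) {
  data.frame(trt = trt,
             name = sample(c("< 75", "75 - 89", "90 - 100"),
                           size = n, replace = TRUE, prob = probs))
}
dat <- rbind(draw_arm(102, int.p, "PBO"),
             draw_arm(304, ris.p, "RIS"),
             draw_arm(100, ust.p, "UST"))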


Thanks, appreciate you taking a look. I’d love to let tidyverse take the blame for my code, but I think there might also be a user component there :)