Relationship between Probabilistic Index and Common Odds Ratio

dnzmarcio · April 1, 2024, 4:49am

Hello,

I have been using Probabilistic Index models introduced by Thas et al (2012) as an extension of the Mann-Whitney that accommodates covariates in several analyses. More recently, I have learned about the Cumulative Probability models introduced Liu et al (2016) that is also an extension of the Mann-Whiney test that accommodates covariates. They became interesting to me because CPM can also accommodate random effects and I also learned that there is a relationship between Probabilistic Index (PI) and the common Odds Ratio (cOR) presented in De Neve et al (2019) as follow:

PI = cOR \times \frac{cOR - log(cOR) - 1}{(cOR - 1)^2}

I tried to validate this relationship in R:

library(ordinal)
library(pim)
find_pi <- function(cOR){
  
  aux <- function(cOR){
    if (cOR != 1)
      out <- cOR*(cOR - log(cOR) - 1)/((cOR - 1)^2)
    if (cOR == 1)
      out <- 0.5
    
    return(out)
  }
  
  aux <- Vectorize(aux)
  out <- aux(cOR)
  
  return(out)
}

set.seed(1234)
z_clm <- rep(NA, 100)
z_pim <- rep(NA, 100)
pi_clm <- rep(NA, 100)
pi_pim <- rep(NA, 100)

for (i in 1:100){
  
  y0 <- rnorm(100, 4, 1)
  y1 <- rnorm(100, 5, 1)
  
  y <- c(y0, y1)
  x <- c(rep(0, length(y0)), rep(1, length(y1)))
  
  dt <- data.frame(y = as.factor(y), x = as.factor(x))
  fit <- clm(y ~ x, data = dt)
  sm_clm <- summary(fit)
  cor_clm <- exp(sm_clm$beta)
  pi_clm[i] <- find_pi(cor_clm)
  z_clm[i] <- sm_clm$coefficients[nrow(sm_clm$coefficients), 3]
  
  dt <- data.frame(y, x = as.factor(x))
  fit <- pim(y ~ x, data = dt)
  sm_pim <- summary(fit)
  
  pi_pim[i] <- plogis(sm_pim@coef)
  z_pim[i] <- sm_pim@zval
}

plot(pi_clm, pi_pim) 
abline (coef = c (0, 1))

plot(z_clm, z_pim) 
abline (coef = c (0, 1))

There are small differences when estimating the probabilistic index. I assumed that the observed differences are numerical. Does that make sense?

However, the differences in Z-score make me a bit more concerned when p-values are borderline.

Then, I tried to re-run a few previous analyses and I found some discrepancies in the conclusions. For example,

library(ordinal)
library(pim)
data('FEVData')
fit_pim <- pim(FEV~ Smoke + Sex + Age + Height, data=FEVData)
sm_pim <- summary(fit_pim)
pi_pim <- plogis(sm_pim@coef)


fit_clm <- clm(as.factor(FEV)~ Smoke + Sex  + Age + Height, data=FEVData)
sm_clm <- summary(fit_clm)
cor_clm <- exp(sm_clm$beta)
pi_clm <- find_pi(cor_clm)

pi_pim
# Smoke       Sex       Age    Height 
# 0.3928110 0.5361332 0.5512311 0.5915537 
sm_pim
# Estimate Std. Error z value Pr(>|z|)    
# Smoke  -0.43551    0.23093  -1.886   0.0593 .  
# Sex     0.14479    0.11098   1.305   0.1920    
# Age     0.20565    0.02940   6.994 2.68e-12 ***
# Height  0.37039    0.02312  16.018  < 2e-16 ***

pi_clm
# Smoke       Sex       Age    Height 
# 0.4193120 0.5513324 0.5432701 0.5895691 
sm_clm
# Estimate Std. Error z value Pr(>|z|)    
# Smoke  -0.48797    0.26103  -1.869   0.0616 .  
# Sex     0.30897    0.14530   2.126   0.0335 *  
# Age     0.26021    0.04000   6.505 7.75e-11 ***
# Height  0.54269    0.02735  19.843  < 2e-16 ***

As you can see, there are differences on the conclusion for the variable Sex even though the estimates of PI are very close. What am I missing?

References:

Thas O, Neve JD, Clement L, Ottoy JP. Probabilistic index models. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2012 Sep;74(4):623-71.

Liu Q, Shepherd BE, Li C, Harrell Jr FE. Modeling continuous response variables using ordinal regression. Statistics in medicine. 2017 Nov 30;36(27):4316-35.

De Neve J, Thas O, Gerds TA. Semiparametric linear transformation models: Effect measures, estimators, and applications. Statistics in medicine. 2019 Apr 15;38(8):1484-501.

f2harrell · April 1, 2024, 1:36pm

See this where I show that the formula from Nevé reduces to a simpler function of the odds ratio, and found a slight modification of that that yields the probability index (concordance probability) even under non-proportional odds.