# BBR Session 11: Nonparametric Tests Using the Proportional Odds Model

This is a place for questions, answers, and discussion about session 11 of the Biostatistics for Biomedical Research airing 2020-01-24 introducing the proportional odds model as a generalization of the Wilcoxon and Kruskal-Wallis tests. Session topics are listed here. The video will be in the YouTube BBRcourse Channel and will be directly available here after the broadcast. Power calculations are covered, and Bayesian ordinal logistic regression is briefly introduced.


Is there a good way to use orm with no covariates?

Trying to use orm for this to obtain CI on the mean/median as shown in the course video, but without any covariates:

library(rms)
y <- round(runif(100, 0, 10))
f <- orm(y ~ 1)


I get the following error: Error in fitter(X, Y, offset = offs, penalty.matrix = penalty.matrix, :

Sorry it doesn’t handle that. It should have. Not sure if lrm does either. The fit object does have the -2 log likelihood for the intercept-only model.
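Until orm supports intercept-only fits, one model-free workaround (a sketch, not from the thread) is a percentile bootstrap confidence interval for the median in base R:

```r
# Percentile bootstrap CI for the median, no model required.
# (The 2000 resamples and 95% level are arbitrary illustrative choices.)
set.seed(1)
y <- round(runif(100, 0, 10))

B <- 2000
boot_medians <- replicate(B, median(sample(y, replace = TRUE)))
ci <- quantile(boot_medians, c(0.025, 0.975))
ci
```

The same idea extends to the mean or any other summary by swapping the statistic inside replicate().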


I’m not very familiar with R. If I want to see the actual code behind orm in the rms package, how can I see it? I can only find the documentation: https://www.rdocumentation.org/packages/rms/versions/5.1-4/topics/orm. Maybe I can find it here somewhere?: https://cran.r-project.org/web/packages/rms/ Cheers

A little off-topic maybe, but for many R functions (especially from installed packages) you can view the code simply by typing the function name without parentheses:

orm

This results in the following output in the console (showing only the first 10 lines):

function (formula, data, subset, na.action = na.delete, method = "orm.fit",
    model = FALSE, x = FALSE, y = FALSE, linear.predictors = TRUE,
    se.fit = FALSE, penalty = 0, penalty.matrix, tol = 1e-07,
    eps = 0.005, var.penalty = c("simple", "sandwich"), scale = FALSE,
    ...)
{
    call <- match.call()
    var.penalty <- match.arg(var.penalty)
    if (!missing(penalty) || !missing(penalty.matrix))
        stop("penalty not yet implemented")


The following post on Stack Overflow goes into more detail for other situations:
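As a base-R addendum (not rms-specific): typing the bare name only works for exported functions, while S3 methods and non-exported functions need getS3method() or getAnywhere():

```r
# S3 methods are often not exported from a package's namespace,
# so typing e.g. `print.lm` at the console fails even though the
# method exists. getS3method() retrieves it directly.
f <- getS3method("print", "lm")
is.function(f)   # TRUE

# getAnywhere() searches all loaded namespaces, exported or not,
# and reports everywhere a name was found.
getAnywhere("print.lm")
```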


Best to go to the master source on github: https://github.com/harrelfe/rms/tree/master/R


Is there any guidance regarding adequate sample size given the number of intercepts to be estimated? In the Harrell et al. Stat Med paper they consider small (e.g., n = 50) samples in their simulations but estimate only “a few intercept values”. And although in the analysis of CD4 counts they estimate 831 intercepts, they have n = 4776. I will soon find out, because I will estimate CPMs with the number of intercepts around 0.5 × n (with n < 100). Cheers
[I initially posted this question here: Quality of life surveys and proportional odds models, but it is maybe more suitable in this thread]

Just as in estimating the empirical cumulative distribution function or running the Wilcoxon test, the number of ties in the data actually works against you. The fewer ties, the better everything works. Fewer ties \rightarrow more intercepts. There is no penalty for intercepts because they are order-constrained, unlike all the \beta s.
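To make the ties-to-intercepts link concrete, a base-R sketch (not from the original post): a cumulative probability model estimates one intercept per distinct value of Y, minus one, so ties reduce the number of intercepts:

```r
# Number of intercepts = number of distinct Y values - 1.
set.seed(2)
y_tied   <- sample(1:5, 100, replace = TRUE)  # heavy ties: 5 distinct values
y_smooth <- rnorm(100)                        # continuous: essentially no ties

n_int_tied   <- length(unique(y_tied)) - 1    # 4 intercepts
n_int_smooth <- length(unique(y_smooth)) - 1  # 99 intercepts
```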


I’m a little confused about the Kruskal Wallis test being a specific case of the Proportional Odds (PO) model. Section 7.6 of BBR says that the PO model is a generalization of the Kruskal Wallis test. When I mess around with some toy data, the p-values between the Kruskal Wallis test and PO model Likelihood Ratio test seem rather different. Are there different assumptions between the two methods that would explain this difference in p-values?

For example, with the following toy dataset

A <- c(0.05, 0.1, 0)
B <- c(1.5, 1.56, 6.9)
C <- c(5.9, 1.3, 1.4)
samples <- data.frame(
  grp   = c(rep("A", 3), rep("B", 3), rep("C", 3)),
  value = c(A, B, C)
)
kruskal.test(value ~ grp, data = samples)

Kruskal-Wallis rank sum test

data:  value by grp
Kruskal-Wallis chi-squared = 5.9556, df = 2, p-value = 0.05091

orm(value ~ grp, data = samples)

                      Model Likelihood     Discrimination            Rank Discrim.
                         Ratio Test            Indexes                  Indexes
Obs              9    LR chi2     13.07    R2                  0.775    rho    0.843
Distinct Y       9    d.f.            2    g                  10.170
Median Y       1.4    Pr(> chi2) 0.0015    gr              26102.571
max |deriv| 0.0003    Score chi2  10.37    |Pr(Y>=median)-0.5| 0.384
                      Pr(> chi2) 0.0056

      Coef    S.E.    Wald Z Pr(>|Z|)
[deleted intercepts]
grp=B 20.3396 77.5439 0.26   0.7931
grp=C 18.3268 77.5333 0.24   0.8131

There’s a sizable difference between the Kruskal-Wallis test and the PO model (p = 0.05091 vs. p = 0.0015). I wasn’t expecting the p-values to be exactly the same due to the different estimation methods, but I’m still surprised by the size of the difference. What would be the cause of it?

You have perfect separation in the data, which points out how inaccurate the \chi^2 distribution is for K-W in some cases. K-W should be hugely significant with your dataset. See if you can get an exact p-value somehow (permutation-test-like). The score test from PO is the one that’s equivalent. I’d stake my money on the results from the PO model fit. You might also try a two-sample subset because wilcox.test will give exact p-values there.
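To follow up on the exact-p-value suggestion, a permutation version of the Kruskal-Wallis test can be sketched in base R (reusing the toy data above; the 4000 resamples is an arbitrary illustrative choice):

```r
# Permutation p-value for the Kruskal-Wallis statistic: shuffle the
# group labels many times and see how often the statistic is at least
# as large as the one observed.
samples <- data.frame(
  grp   = c(rep("A", 3), rep("B", 3), rep("C", 3)),
  value = c(0.05, 0.1, 0, 1.5, 1.56, 6.9, 5.9, 1.3, 1.4)
)
obs <- kruskal.test(value ~ grp, data = samples)$statistic

set.seed(3)
perm <- replicate(4000, {
  kruskal.test(samples$value, factor(sample(samples$grp)))$statistic
})
p_perm <- mean(perm >= obs)
p_perm
```

With only 9 observations an exhaustive enumeration of all label assignments would also be feasible; the Monte Carlo version above is simply shorter to write.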

Correction: I didn’t look at your data closely enough. You don’t have perfect separation.


Thanks for the help. On a related question, a colleague uses Graphpad Prism, which recommends Kruskal Wallis (exact p-value computation) and Dunn’s test for subsequent pairwise comparisons.

The BBR notes don’t mention Dunn’s test and instead suggest using the PO model and its coefficients/p-values. Do you have any thoughts on using Dunn’s test vs the PO model coefficients?

My understanding is that Dunn’s test includes a multiple comparisons correction while the PO model does not, and I’m not sure whether such a correction is needed, as I usually don’t see multiple comparisons corrections applied to regression coefficients. Interestingly, Dunn’s test seems to yield noticeably smaller p-values than the PO model.

dunnTest(value ~ grp, data = samples)
Dunn (1964) Kruskal-Wallis multiple comparison
  p-values adjusted with the Holm method.

  Comparison         Z    P.unadj      P.adj
1      A - B -2.385139 0.01707266 0.05121798
2      A - C -1.639783 0.10105026 0.20210051
3      B - C  0.745356 0.45605654 0.45605654
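The Holm correction in that output can be reproduced in base R (values taken from the P.unadj column above), which makes the multiplicity adjustment explicit:

```r
# Holm adjustment of the three unadjusted pairwise p-values.
p_unadj <- c(0.01707266, 0.10105026, 0.45605654)
p_adj <- p.adjust(p_unadj, method = "holm")
p_adj
# 0.05121798 0.20210051 0.45605654  (matches the P.adj column)
```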


Thanks. I’ll just note: using NLMIXED in SAS (because I have random effects for patients, with repeated measures) it took about a day to run, and the results “make sense”. NLMIXED requires array statements for the intercepts, cumulative probabilities, etc., and specification of the likelihood (depending on the link function; I’ve done logit, probit, and cloglog). I’ll share the code once it’s cleaned up, in case some people are not using R. I ought to use orm in R to validate the SAS estimates too …

I’m confused why on BBR p. 7-25 the probabilities are given as “\geq” and the intercepts are decreasing. For the CDF I expect the reverse. Can I assume the data were ranked from largest to smallest? (Although I can’t see this in the R code.) Cheers

The only way I know to do Dunn’s test here is to run it from Wilcoxon tests. The pairwise Wilcoxon tests are not consistent with the K-W test. So stick with tests on the parameters in the single unified PO model. Better: Compute simultaneous confidence intervals for the multiple pairwise PO contrasts (log odds ratios).
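A sketch of those simultaneous intervals, assuming the rms package (and its multcomp dependency for conf.type = "simultaneous") is available; the contrast() call below is illustrative, not taken from the thread:

```r
# Guarded so the sketch still runs when rms is not installed.
has_rms <- requireNamespace("rms", quietly = TRUE)
if (has_rms) {
  library(rms)
  samples <- data.frame(
    grp   = c(rep("A", 3), rep("B", 3), rep("C", 3)),
    value = c(0.05, 0.1, 0, 1.5, 1.56, 6.9, 5.9, 1.3, 1.4)
  )
  dd <- datadist(samples); options(datadist = "dd")
  f <- orm(value ~ grp, data = samples)
  # Three pairwise log odds ratio contrasts (B-A, C-A, C-B),
  # with simultaneous (familywise) confidence intervals
  print(contrast(f, list(grp = c("B", "C", "C")),
                    list(grp = c("A", "A", "B")),
                 conf.type = "simultaneous"))
}
```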


Don’t use that. Use this: https://cran.r-project.org/web/packages/mixor


It doesn’t matter whether you rank the data from small to large or large to small. I’m just stating the PO model using a parameterization that makes the binary logistic model be a special case. Most other people use P(Y \leq y) whereas I state it in terms of P(Y \geq y).
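As a small base-R illustration of why the intercepts decrease under this parameterization (my own toy data, not from the thread): the empirical exceedance probabilities P(Y \geq y) are non-increasing in y, so their logits, which play the role of the intercepts in the P(Y \geq y) form, decrease:

```r
# Exceedance probabilities P(Y >= y) and their logits for a toy sample.
y  <- c(1, 1, 2, 3, 3, 3, 4, 5)
ys <- sort(unique(y))[-1]          # intercepts correspond to y > min(y)
p_ge <- sapply(ys, function(v) mean(y >= v))
logit <- function(p) log(p / (1 - p))
round(logit(p_ge), 2)              # strictly decreasing in y
```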
