Hello everyone,
I’m currently analyzing an integer response (ranging from 0 to 68) using an ordinal logistic regression model (special thanks to @f2harrell for the amazing `rmsb` package!) and would like to provide a measure of its discriminative accuracy.
In biomedical research, AUC is ubiquitous (for better or worse). The default AUC output from `blrmStats`, as I understand it, measures the ability of the model to rank different values of Y.
In clinical practice, Y is most often (almost exclusively) used to decide whether or not to pursue further (more invasive/expensive) testing based on a threshold. Different places use different thresholds, which usually range from 4 to 8. Differentiating high-scoring individuals (e.g., 15s from 20s) does not alter the decision to pursue further testing. As such, the model’s discriminative ability is of most interest for response scores near the lower end of the spectrum.
In this scenario, would it make sense to display several AUCs corresponding to a range of decision thresholds rather than (or in addition to) the “overall” AUC?
If so, would the way to do this be to:

- Create a dichotomized version of the response variable in the original dataset for a given decision threshold of interest (e.g., `x$new_response <- ifelse(x$response >= threshold, 1, 0)`).
- Calculate the c-statistic between this dichotomized variable and the model’s predicted log odds (e.g., `Hmisc::rcorr.cens(model_predictions, x$new_response)["C Index"]`).
- Repeat over different draws from the posterior distribution (to get uncertainty intervals) and over different thresholds of the response variable (for decision-makers using different cutoffs); a rough sketch of this follows below.
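Concretely, I was imagining something like the sketch below, where `lp_draws` is a placeholder name for a (posterior draws) × (observations) matrix of posterior linear predictors from the `blrm` fit (how best to extract that matrix is a separate question), and `x` is the original data frame:

```r
library(Hmisc)

# Placeholder (hypothetical) inputs, not direct rmsb output:
#   x        - the original data frame, with the integer response in x$response
#   lp_draws - a (posterior draws) x (observations) matrix of posterior
#              linear predictors (log odds) from the fitted blrm model
thresholds <- 4:8

# For each threshold and each posterior draw, compute the c-statistic (AUC)
# for discriminating response >= threshold from response < threshold
auc_draws <- sapply(thresholds, function(thr) {
  y_bin <- as.integer(x$response >= thr)          # dichotomize at this threshold
  apply(lp_draws, 1, function(lp)
    rcorr.cens(lp, y_bin)["C Index"])             # c-statistic for one posterior draw
})
colnames(auc_draws) <- paste0("threshold_", thresholds)

# Posterior median and 95% credible interval for each threshold-specific AUC
apply(auc_draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```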
Also, in this case, how would you interpret the difference between the overall AUC and the decision threshold-specific AUCs (if the former is lower by a modest amount)?
Would you say the model is slightly better at differentiating individuals who meet a given decision threshold than it is at ranking individuals generally? If so, would it be appropriate to prioritize the former (decision-threshold AUC) over the latter (overall AUC), given how the model would be used clinically?
I will of course be providing other outputs (e.g., Pr(Y > threshold | predictors) and decision-analysis curves across different thresholds), but I wanted to ask about the AUCs specifically because I couldn’t find much literature on this.
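For reference, the exceedance probabilities I mention above follow directly from the proportional-odds structure, Pr(Y ≥ t | X) = plogis(α_t + Xβ). A minimal sketch using placeholder posterior-draw objects (hypothetical names, not rmsb output as-is):

```r
# Placeholder (hypothetical) inputs for a chosen threshold t:
#   alpha_t_draws - vector of posterior draws of the intercept for the cutpoint Y >= t
#   xbeta_draws   - (posterior draws) x (observations) matrix of draws of X %*% beta
# Proportional-odds model: Pr(Y >= t | X) = plogis(alpha_t + X beta)
exceed_draws <- plogis(alpha_t_draws + xbeta_draws)  # each draw's intercept is added to its row

# Posterior mean and 95% credible interval of Pr(Y >= t | X) per individual
p_mean <- colMeans(exceed_draws)
p_ci   <- apply(exceed_draws, 2, quantile, probs = c(0.025, 0.975))
```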