Hello everyone,
I’m currently analyzing an integer response (ranging from 0 to 68) using an ordinal logistic regression model (special thanks to @f2harrell for the amazing `rmsb` package!) and would like to provide a measure of its discriminative accuracy.
In biomedical research, AUC is ubiquitous (for better or worse). The default AUC output from `blrmStats`, as I understand it, measures the ability of the model to rank different values of Y.
In clinical practice, Y is most often (almost exclusively) used to decide whether or not to pursue further (more invasive/expensive) testing based on a threshold. Different places use different thresholds, which usually range from 4 to 8. Differentiating high-scoring individuals (e.g., 15s from 20s) does not alter the decision to pursue further testing. As such, the model’s discriminative ability is of most interest for response scores near the lower end of the spectrum.
In this scenario, would it make sense to display several AUCs corresponding to a range of decision thresholds rather than (or in addition to) the “overall” AUC?
If so, would the way to do this be to:

- Create a dichotomized version of the response variable in the original dataset for a given decision threshold of interest (e.g., `x$new_response <- ifelse(x$response >= threshold, 1, 0)`).
- Calculate the c-statistic between this dichotomized variable and the model’s predicted log odds (e.g., `Hmisc::rcorr.cens(model_predictions, x$new_response)["C Index"]`).
- Repeat over different draws from the posterior distribution (to get uncertainty intervals) and over different thresholds of the response variable (for decision-makers using different cutoffs); a rough sketch of this follows below.
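Concretely, I was imagining something like the sketch below, where `lp_draws` is a placeholder name for a (posterior draws) × (observations) matrix of posterior linear predictors from the `blrm` fit (how best to extract that matrix is a separate question), and `x` is the original data frame:

```r
library(Hmisc)

# Placeholder (hypothetical) inputs, not direct rmsb output:
#   x        - the original data frame, with the integer response in x$response
#   lp_draws - a (posterior draws) x (observations) matrix of posterior
#              linear predictors (log odds) from the fitted blrm model
thresholds <- 4:8

# For each threshold and each posterior draw, compute the c-statistic (AUC)
# for discriminating response >= threshold from response < threshold
auc_draws <- sapply(thresholds, function(thr) {
  y_bin <- as.integer(x$response >= thr)          # dichotomize at this threshold
  apply(lp_draws, 1, function(lp)
    rcorr.cens(lp, y_bin)["C Index"])             # c-statistic for one posterior draw
})
colnames(auc_draws) <- paste0("threshold_", thresholds)

# Posterior median and 95% credible interval for each threshold-specific AUC
apply(auc_draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```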
Also, in this case, how would you interpret the difference between the overall AUC and the decision threshold-specific AUCs (if the former is lower by a modest amount)?
Would you say the model is slightly better at differentiating individuals who meet a given decision threshold than it is at ranking individuals generally? If so, would it be appropriate to prioritize the former (decision-threshold AUC) over the latter (overall AUC), given how the model would be used clinically?
I will of course be providing other outputs (e.g., Pr(Y > threshold | predictors) and decision-analysis curves across different thresholds), but I wanted to ask about the AUCs specifically because I couldn’t find much literature on this.
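For reference, the exceedance probabilities I mention above follow directly from the proportional-odds structure, Pr(Y ≥ t | X) = plogis(α_t + Xβ). A minimal sketch using placeholder posterior-draw objects (hypothetical names, not rmsb output as-is):

```r
# Placeholder (hypothetical) inputs for a chosen threshold t:
#   alpha_t_draws - vector of posterior draws of the intercept for the cutpoint Y >= t
#   xbeta_draws   - (posterior draws) x (observations) matrix of draws of X %*% beta
# Proportional-odds model: Pr(Y >= t | X) = plogis(alpha_t + X beta)
exceed_draws <- plogis(alpha_t_draws + xbeta_draws)  # each draw's intercept is added to its row

# Posterior mean and 95% credible interval of Pr(Y >= t | X) per individual
p_mean <- colMeans(exceed_draws)
p_ci   <- apply(exceed_draws, 2, quantile, probs = c(0.025, 0.975))
```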