Prediction of ordinal outcomes and estimating the value of a new biomarker

Hi folks,

Given the (I think appropriate) emphasis on ordinal COVID-19 outcomes, I’m trying to figure out how to work through the usual regression modelling strategies for prediction model building and validation with ordinal outcomes. In this case, I’m looking at a 5-point ordinal outcome ranging from no COVID infection all the way up to COVID-associated death.

  1. How to estimate the discrimination performance of a model predicting an ordinal outcome.
    …I read f2harrell’s Ch.13 in Regression Modeling Strategies and it looks like I can use Somers’ D and the C-statistic as usual.

  2. How to estimate the value of adding a variable (e.g. a biomarker) to an existing model?
    …I like likelihood ratio tests to establish statistical significance and I think I can use them here, but I wonder how best to quantify the added value using net benefit à la Andrew Vickers. Perhaps I will have to do this by calculating the probability of the outcome at each level and above. Maybe I will just have to use the change in AUC (less attractive to me, but of interest to many readers).

  3. Sample size considerations.
    …Lately I have been using the excellent pmsampsize package from Richard Riley’s group, but it only covers binary, time-to-event, and continuous outcomes. So far I have been assuming that sample size requirements are LOWER when fitting models for ordinal outcomes than for binary outcomes. Is this correct? I will try to use penalized maximum likelihood estimation to lower the effective degrees of freedom, but the (forward continuation ratio) model will likely need extensions to work around the proportional odds assumption.
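
A minimal sketch of points 1 and 2, under stated assumptions: a proportional odds model with a logit link has already been fitted, so each subject has a linear predictor and there is one intercept per cutoff. The function names, the toy data, the intercepts, and the 20% risk threshold are all hypothetical and only illustrate the mechanics. Discrimination is the concordance over all pairs with different outcomes (with Dxy = 2(c - 0.5)), and net benefit is computed for one dichotomy, Y >= j, from the predicted probability of that level or above.

```python
from itertools import combinations
from math import exp


def concordance(y, lp):
    """Return (c, Dxy) for ordinal outcomes y and linear predictor lp.

    Only pairs with different outcomes are used; tied predictions count 1/2.
    """
    concordant = tied = usable = 0
    for (y1, s1), (y2, s2) in combinations(zip(y, lp), 2):
        if y1 == y2:
            continue  # pair carries no information about ordering
        usable += 1
        if s1 == s2:
            tied += 1
        elif (s1 > s2) == (y1 > y2):
            concordant += 1
    c = (concordant + 0.5 * tied) / usable
    return c, 2.0 * (c - 0.5)


def exceedance_probs(intercepts, lp):
    """P(Y >= j) for j = 1..k-1 under a proportional odds (logit) model."""
    return [1.0 / (1.0 + exp(-(a + lp))) for a in intercepts]


def net_benefit(event, prob, threshold):
    """Net benefit for a binary event at a single risk threshold."""
    n = len(event)
    tp = sum(1 for e, p in zip(event, prob) if p >= threshold and e)
    fp = sum(1 for e, p in zip(event, prob) if p >= threshold and not e)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: outcome coded 0 (no infection) .. 4 (death).
y = [0, 0, 1, 2, 3, 4]
lp = [-1.0, 0.2, 0.1, 0.8, 0.8, 2.0]
alphas = [1.0, 0.0, -1.0, -2.0]  # made-up intercepts for Y>=1 .. Y>=4

c, dxy = concordance(y, lp)

# Net benefit for the dichotomy Y >= 3 at a 20% risk threshold.
j = 3
p_ge_j = [exceedance_probs(alphas, s)[j - 1] for s in lp]
nb = net_benefit([yi >= j for yi in y], p_ge_j, 0.20)
print(f"c={c:.3f}  Dxy={dxy:.3f}  NB(Y>={j}, pt=0.20)={nb:.3f}")
```

Repeating the last step for each cutoff j, and for the models with and without the biomarker, would give one net benefit comparison per level of the outcome.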

Any guidance would be most welcome.


Hi Pavel, a few points:

  • Is your ordinal outcome measured at only one point in time?
  • Model selection, i.e. statistical assessments that result in variables being dropped from the model, is not recommended. Pre-specify the full model.
  • The gold standard is the log likelihood and measures derived from it. See also this.
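
As a rough illustration of measures derived from the log likelihood, the sketch below takes three log-likelihood values (intercepts only, base model, base model plus the new marker) and returns the likelihood ratio chi-square for the added marker, its p-value, and the share of the full model's LR chi-square that the base model already captures (sometimes called an adequacy index). The function names and all numbers are hypothetical; in practice the log likelihoods would come from the fitted ordinal models, and the added degrees of freedom depend on how the marker is modelled.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        q, k = exp(-half), 2          # closed form for df = 2
    else:
        q, k = erfc(sqrt(half)), 1    # closed form for df = 1
    while k < df:                     # raise df by 2 on each pass
        q += half ** (k / 2.0) * exp(-half) / gamma(k / 2.0 + 1.0)
        k += 2
    return q


def added_value(ll_null, ll_base, ll_full, df_added):
    """Likelihood-based summary of what the new marker adds."""
    lr_base = 2.0 * (ll_base - ll_null)   # LR chi-square, base model
    lr_full = 2.0 * (ll_full - ll_null)   # LR chi-square, base + marker
    lr_added = lr_full - lr_base          # LR test for the marker
    adequacy = lr_base / lr_full          # information already in base model
    return {
        "lr_added": lr_added,
        "p_value": chi2_sf(lr_added, df_added),
        "adequacy": adequacy,
        "fraction_new": 1.0 - adequacy,
    }


# Made-up log likelihoods; df_added = 3 is assumed purely for illustration.
result = added_value(ll_null=-1500.0, ll_base=-1420.0, ll_full=-1405.0,
                     df_added=3)
print(result)
```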

Hi Frank and thank you,

  1. Regarding the ordinal outcome, although it can be detected at any time during followup and the date will be known, I was planning to treat it as a single assessment essentially occurring at any time during followup.

  2. No plans for model selection: all variables will be in the model, with restricted cubic splines (RCS) for continuous variables.

  3. Regarding sample size, I’m still not sure whether the sample size needed to avoid overfitting is lower when the outcome is ordinal. Essentially you have (in the proportional odds case) a single vector of coefficients but a separate intercept for each cutoff of the ordinal variable. This is more parameters than if the outcome were binary.
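
To make the parameter count concrete, here is a sketch in notation that is not from the thread: the outcome has k levels coded 0, …, k-1, and the predictors use p regression degrees of freedom in total.

$$
\Pr(Y \ge j \mid X) = \frac{1}{1 + \exp\left[-(\alpha_j + X\beta)\right]}, \qquad j = 1, \dots, k-1
$$

$$
\text{parameters}_{\text{PO}} = (k - 1) + p, \qquad \text{parameters}_{\text{binary}} = 1 + p
$$

For the 5-level outcome here that is 4 + p parameters versus 1 + p: three extra intercepts, with the same β shared across all cutoffs.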