This is the 13th of several connected topics organized around chapters in Regression Modeling Strategies. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.

I have several questions regarding the concordance statistic in the setting of a multivariable proportional odds logistic model, specifically how it is calculated by lrm() and calibrate() from the rms package. This question may be generalised to: “Which of the discrimination indexes returned by lrm() apply to the whole model, and which are specific to each outcome level?”

Regarding the c-statistic [(Somers’ D + 1)/2], I imagine it is some generalisation of the proportion of all pairs of patients in which Yj >= Yi when betaXj >= betaXi, but how is the intercept included? Does the c-statistic vary for different levels of the outcome, or is there one global c-statistic for the entire proportional odds model? If there is only a global one, is there any interpretation or value for an outcome-level-specific c-statistic?

Good question. D_{xy} and c are whole-Y measures of pure predictive discrimination. They are based on all possible pairs of observations with different Y values. Since in the PO model the intercepts apply equally to all X, the full linear predictor and the linear predictor without the intercepts differ only by a constant. They have a 1-1 relationship and rank observations identically, so you can ignore the intercepts.
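To make the point about intercepts concrete, here is a minimal sketch in plain Python (not the rms implementation) that computes c and D_{xy} over all pairs with different Y and shows that adding a constant to the linear predictor changes neither. The data, the intercept value, and the function name `somers_dxy_and_c` are made up for illustration.

```python
from itertools import combinations

def somers_dxy_and_c(lp, y):
    """Return (Dxy, c) using all pairs with different Y.

    Pairs tied on the linear predictor count as half-concordant for c.
    """
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # pairs with equal Y carry no ordering information
        dy = y[i] - y[j]
        dl = lp[i] - lp[j]
        if dl == 0:
            tied += 1
        elif (dl > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant + tied
    c = (concordant + 0.5 * tied) / n_pairs
    dxy = (concordant - discordant) / n_pairs
    return dxy, c

# Hypothetical ordinal outcome and linear predictor X*beta (no intercept)
y = [0, 0, 1, 2, 2, 3]
xb = [-1.2, 0.3, -0.1, 0.8, 0.2, 1.5]

alpha = 2.0  # any one of the PO intercepts; the value is arbitrary here
dxy, c = somers_dxy_and_c(xb, y)
dxy_shift, c_shift = somers_dxy_and_c([alpha + v for v in xb], y)

print(c, c_shift)      # identical: a constant shift preserves the ranking
print(dxy, 2 * c - 1)  # Dxy = 2(c - 0.5)
```

Because only the ordering of the linear predictor enters the pair comparisons, the same c results whichever intercept is added.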

I assume this would also apply to the optimism-corrected calibration slope, but not to the optimism-corrected intercept, Emax, or Brier score. Is this correct?

The majority of my audience (clinicians) have difficulty understanding the output of the proportional odds model. Is there any reason why I shouldn’t convert the parameter point estimates and intervals from OR into the actual outcome (in my case KOOS scores) and only present those data?

No reason not to, you just have several choices. Ordinal regression starts with estimation of exceedance probabilities. These lead to estimating the whole distribution of Y | X. For a truly continuous Y you can summarize the whole distribution with the predicted median if you must. For continuous or discrete Y that is approximately interval-scaled you can predict the mean. These are obtained by the Quantile and Mean functions applied to an ordinal regression fit object in the rms package.
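The chain from exceedance probabilities to a predicted mean or median can be sketched as follows. This is plain Python showing the arithmetic only, not what the rms Mean and Quantile functions do internally (the quantile here is a simple step-function version with no interpolation). The outcome levels, intercepts, and linear predictor value are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def exceedance_probs(intercepts, xb):
    """Pr(Y >= y_j | X) for j = 2..k under a PO model: expit(alpha_j + xb)."""
    return [expit(a + xb) for a in intercepts]

def cell_probs(exceed):
    """Pr(Y = y_j | X) for j = 1..k, by differencing exceedance probabilities."""
    upper = [1.0] + list(exceed)
    lower = list(exceed) + [0.0]
    return [u - l for u, l in zip(upper, lower)]

def predicted_mean(levels, probs):
    return sum(v * p for v, p in zip(levels, probs))

def predicted_quantile(levels, probs, q=0.5):
    """Smallest level whose cumulative probability reaches q."""
    cum = 0.0
    for v, p in zip(levels, probs):
        cum += p
        if cum >= q:
            return v
    return levels[-1]

# Hypothetical values: 5 distinct outcome levels, so 4 decreasing intercepts
levels = [0, 25, 50, 75, 100]
intercepts = [2.0, 0.5, -0.5, -2.0]
xb = 0.4  # linear predictor for one patient, intercepts excluded

exceed = exceedance_probs(intercepts, xb)
probs = cell_probs(exceed)
mean_y = predicted_mean(levels, probs)
median_y = predicted_quantile(levels, probs, 0.5)
print(probs, mean_y, median_y)
```

The mean only makes sense when the numeric codes of Y are approximately interval-scaled, as noted above; the median needs only the ordering.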

To identify patients who would most benefit from a medication review by a pharmacist before discharge, we reviewed 939 discharge letters (of different patients) for drug-related problems. Since not all errors are equally important, we used an ordinal scale with 5 levels, level 5 being worst. I have 2 continuous independent variables (modeled with RCS, k=4) and 4 categorical independent variables.

Is it reasonable to combine levels of Y if certain levels have too few observations? 399 patients had no errors, level 1 = 11 patients, level 2 = 255, level 3 = 166, level 4 = 103, level 5 = 5.

I used rms, and I know how to get the predicted probabilities for a new patient, but I want to extract the formula because I want to use it in clinical software to get predictions and decide which patients’ discharge letters to review (for example, those with probability above 0.5 of being in the highest two Y levels). Is there code to do that? I could do it myself, but the RCS part is a little complicated.

You said ‘dependent variable’ in two spots where I assume you meant ‘independent variable’.

If you are willing to make the proportional odds assumption then it is not a good idea to combine categories unless you have evidence that the categories being combined are equivalent.

You can easily get the predicted probabilities you want. For the case of \Pr(Y \geq 4) you need to use just the second-to-last intercept. You can get an algebraic form of the fitted model using the latex function (which works when using R markdown or when directly making a \LaTeX report). To get the form of just the linear predictor part of the fitted model you can use Function(fit object) on any system. The generated R code will use just one intercept (I forget whether it’s the first or the middle one) and you can insert whichever intercept you want.
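For readers who need to port the calculation into other software, here is a minimal sketch in plain Python of the arithmetic involved, for a single continuous predictor. Everything numeric (knots, coefficients, intercepts) is made up, and the names `rcs_basis`, `prob_at_least`, and `review_flag` are hypothetical. The spline terms follow the restricted cubic spline form given in Regression Modeling Strategies; dividing by the squared knot range is an assumption about how the coefficients are scaled, so check the result against the output of Function() and predict() before relying on it.

```python
import math

def rcs_basis(x, knots):
    """Restricted cubic spline basis: the linear term plus k-2 nonlinear terms."""
    k = len(knots)
    t_first, t_pen, t_last = knots[0], knots[-2], knots[-1]
    norm = (t_last - t_first) ** 2  # assumed scaling of the nonlinear terms
    pos3 = lambda u: max(u, 0.0) ** 3  # truncated cubic (u)+^3
    terms = [x]
    for t_j in knots[: k - 2]:
        term = (pos3(x - t_j)
                - pos3(x - t_pen) * (t_last - t_j) / (t_last - t_pen)
                + pos3(x - t_last) * (t_pen - t_j) / (t_last - t_pen))
        terms.append(term / norm)
    return terms

def prob_at_least(intercept, xb):
    """Pr(Y >= j | X) = expit(alpha_j + X*beta) under proportional odds."""
    return 1.0 / (1.0 + math.exp(-(intercept + xb)))

# Hypothetical fit: 4 knots, so one linear and two nonlinear coefficients
knots = [1.0, 3.0, 5.0, 8.0]
beta_x = [0.30, -0.80, 1.10]
# Hypothetical intercepts for Y>=1, Y>=2, Y>=3, Y>=4, Y>=5 (Y coded 0..5)
intercepts = [0.4, 0.3, -0.9, -1.8, -5.0]

def review_flag(x, threshold=0.5):
    """Return (Pr(Y >= 4), flag) using the second-to-last intercept."""
    xb = sum(b * v for b, v in zip(beta_x, rcs_basis(x, knots)))
    p = prob_at_least(intercepts[-2], xb)
    return p, p > threshold

p, flag = review_flag(6.0)
print(p, flag)
```

Additional predictors, including the categorical ones, would simply add their own terms to `xb`; only the choice of intercept determines which exceedance probability is returned.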