I am working on a problem where the main task is to jointly predict 2 correlated outcomes using 5 predictors. The 2 outcomes are still correlated after conditioning on the 5 predictors.
The desired output would provide the end-user with exceedance probabilities (probability of either outcome exceeding some threshold of interest). Because of the nature of the data, a semi-parametric model that does not make strong distributional assumptions would likely be ideal, hence my interest in using ordinal logistic regression.
I realize that naively fitting 2 separate multivariable ordinal regressions would ignore the correlation between the 2 outcomes and therefore yield suboptimal predictions, since the exceedance probabilities are not independent of one another. I’m wondering if anyone has had any luck fitting multivariate ordinal models that can handle the residual correlation between outcomes. The tools I’ve found so far are primarily designed for ordinal variables with a small number of categories, so they’re difficult to adapt to my current problem (where both outcomes are continuous in nature).
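For context, under a proportional-odds model the exceedance probabilities I'm after fall out directly: P(Y >= y_j | X) = expit(alpha_j + X'beta), with one intercept per distinct outcome value when continuous Y is treated as ordinal. A minimal sketch (all intercepts and coefficients here are made up, not from a real fit):

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted proportional-odds model: one intercept alpha_j per
# distinct outcome value y_j (continuous Y treated as ordinal), shared beta.
y_levels = np.linspace(0, 10, 101)             # distinct observed outcome values
alpha = np.linspace(4, -4, 101)                # decreasing intercepts (made up)
beta = np.array([0.5, -0.3, 0.2, 0.1, -0.4])   # coefficients for the 5 predictors

x_new = np.array([1.0, 0.5, -1.0, 2.0, 0.0])   # a new subject's predictor values

# Exceedance probabilities: P(Y >= y_j | X) = expit(alpha_j + X'beta)
exceed = expit(alpha + x_new @ beta)

threshold = 5.0
p_exceed = exceed[np.searchsorted(y_levels, threshold)]
print(f"P(Y >= {threshold}) = {p_exceed:.3f}")
```

This is the per-outcome piece; the open question is how to couple two such models.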
These are my initial thoughts, take them with a grain of salt.
First fit a Bayesian model (model 1) for outcome 1 using the 5 predictors.
Next fit a Bayesian model for outcome 2 (model 2), that includes outcome 1 as a covariate in addition to the 5 predictors.
When presenting the model to the end-user, first use model 1 to generate the posterior predictive distribution of outcome 1, then feed those posterior predictive draws into model 2 as covariate values to obtain the posterior predictive distribution of outcome 2.
Then present the end-user with the exceedance probabilities from the two posterior distributions.
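In code, that propagation might look something like this (simple Gaussian linear models stand in for the two fitted Bayesian models, and all "posterior draws" are simulated with made-up values, just to show the plumbing):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 10_000

x_new = np.array([1.0, 0.5, -1.0, 2.0, 0.0])  # a new subject's 5 predictors

# Hypothetical posterior draws for model 1 (outcome 1 ~ 5 predictors)
beta1 = rng.normal([0.5, -0.3, 0.2, 0.1, -0.4], 0.05, size=(n_draws, 5))
sigma1 = np.abs(rng.normal(1.0, 0.05, size=n_draws))

# Step 1: posterior predictive draws of outcome 1
y1 = rng.normal(beta1 @ x_new, sigma1)

# Hypothetical posterior draws for model 2 (outcome 2 ~ 5 predictors + outcome 1)
beta2 = rng.normal([0.2, 0.1, -0.1, 0.3, 0.2], 0.05, size=(n_draws, 5))
gamma = rng.normal(0.8, 0.05, size=n_draws)   # coefficient on outcome 1
sigma2 = np.abs(rng.normal(1.0, 0.05, size=n_draws))

# Step 2: feed each outcome-1 draw into model 2 draw-by-draw, so the
# uncertainty in outcome 1 propagates into the prediction of outcome 2
y2 = rng.normal(beta2 @ x_new + gamma * y1, sigma2)

# Step 3: exceedance probabilities for the end-user
t1, t2 = 1.0, 1.5
print("P(Y1 >", t1, ") =", np.mean(y1 > t1))
print("P(Y2 >", t2, ") =", np.mean(y2 > t2))
print("P(both exceed)  =", np.mean((y1 > t1) & (y2 > t2)))
```

Because outcome 1 feeds into outcome 2, the joint exceedance probability differs from the product of the marginals, which is exactly the dependence the two-stage setup is meant to capture.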
Does that make any sense?
That may make a lot of sense. We need a simulation experiment to test the goodness-of-fit of that approach: we need to figure out the induced correlation structure and the induced marginal outcome model for the 2nd outcome and see if they are realistic. Also, there is a lot of progress in the Stan world on copulas, and you might also consider adding random effects to each of the two marginal models, where the random effects have an unknown amount of correlation with each other. This may induce reasonable models.
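A quick simulation of the random-effects idea, showing how correlated subject-level effects on the two latent scales induce residual correlation between the margins (rho, the coefficients, and the logistic latent errors are all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
rho = 0.6  # assumed correlation between the two subject-level random effects

X = rng.normal(size=(n, 5))
beta1 = np.array([0.5, -0.3, 0.2, 0.1, -0.4])  # made-up marginal coefficients
beta2 = np.array([0.2, 0.1, -0.1, 0.3, 0.2])

# Correlated random effects, one per margin, shared within subject
cov = np.array([[1.0, rho], [rho, 1.0]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Latent continuous responses for two proportional-odds-style margins
z1 = X @ beta1 + u[:, 0] + rng.logistic(size=n)
z2 = X @ beta2 + u[:, 1] + rng.logistic(size=n)

# Residual correlation after removing the covariate effects: driven
# entirely by rho, attenuated by the logistic error variance (pi^2 / 3)
r1 = z1 - X @ beta1
r2 = z2 - X @ beta2
print("induced residual correlation:", np.corrcoef(r1, r2)[0, 1])
```

Running this kind of simulation against the observed residual correlation is one way to check whether a given rho induces a realistic joint structure.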
It will also be interesting to see if it’s easy to marginalize the second model. If it turns out to be hard analytically, it is easy to simulate an unlimited number of conditional outcomes, then do the marginalization by simple arithmetic on the simulated values. This can be burdensome computationally, but it is a general-purpose solution. An example of this: fit a first-order Markov ordinal model for one outcome that also conditions on the average of all previous values (similar to conditioning on the number of previous hospitalizations if hospitalization is the recurrent event outcome) in addition to conditioning on the previous state. This induces a random-intercepts + AR(1) type of correlation structure. It’s hard to marginalize over the previous state and the average of previous states to get state occupancy probabilities. But it’s easy to simulate data from the fitted model with that structure, then compute simple marginal proportions of being in state y at time t.
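The simulate-then-average idea can be sketched with a hypothetical proportional-odds transition model (intercepts and coefficients made up; a real application would plug in posterior draws from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_times, n_sims = 4, 10, 50_000

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical transition model: P(Y_t >= k) depends on the previous
# state and on the running mean of all earlier states.
alpha = np.array([2.0, 0.0, -2.0])   # intercepts for states >= 1, 2, 3
b_prev, b_mean = 0.6, 0.4            # made-up coefficients

def next_state(prev, run_mean):
    lp = alpha + b_prev * prev[:, None] + b_mean * run_mean[:, None]
    p_ge = expit(lp)                          # P(Y >= k) for k = 1, 2, 3
    u = rng.random(prev.shape[0])[:, None]    # one uniform per simulated path
    return (u < p_ge).sum(axis=1)             # count of thresholds exceeded

# Simulate forward: marginalization over the history by brute force
y = np.zeros((n_sims, n_times), dtype=int)    # every path starts in state 0
for t in range(1, n_times):
    run_mean = y[:, :t].mean(axis=1)
    y[:, t] = next_state(y[:, t - 1], run_mean)

# Marginal state occupancy probabilities P(Y_t = k): simple proportions
occupancy = np.stack([(y == k).mean(axis=0) for k in range(n_states)])
print(np.round(occupancy[:, -1], 3))          # occupancy at the last time
```

The conditional model is easy to simulate from, and the marginal state occupancy probabilities fall out as plain proportions over the simulated paths, with no analytic marginalization needed.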