RMS Semiparametric Ordinal Longitudinal Model


This is the 22nd of several connected topics organized around chapters in Regression Modeling Strategies. This topic is for a chapter that is not in the book but is in the course notes. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.

Overview | Course Notes

A common cause of disappointment (e.g., uninformative null results) is the pursuit of low-information (insensitive) outcomes. Thoughtful effort devoted to understanding and choosing a high-resolution, high-information Y is likely to improve PTS.

A high-resolution high-information Y can flexibly accommodate the timing and severity of a variety of outcomes (terminal events, non-terminal events, and recurrent events); and the more levels of Y the better (fharrell.com/post/ordinal-info). The longitudinal ordinal model is a general and flexible way to capture severity and timing of outcomes.

The proportional odds longitudinal ordinal logistic model with covariate adjustment is recommended (a Markov model is better still). With this ordinal model no assumption is made about the distribution of Y, and random effects (random intercepts) handle intra-patient correlation.

The proportional odds ordinal logistic model can estimate the probability that Y=y or worse as a function of time and treatment. This modeling approach provides estimates of efficacy for individual patients by addressing the fundamental clinical question: ‘If I compared two patients who have the same baseline variables but were given different treatments, by how much better should I expect the outcome to be with treatment B instead of treatment A?’
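As a rough illustration of what the proportional odds model asserts, the sketch below (with made-up intercepts and a hypothetical treatment log odds ratio; none of these numbers come from the course notes) computes P(Y ≥ y) for two treatments. The key property is that the treatment odds ratio is the same at every cutoff y.

```python
import math

def expit(x):
    # inverse logit
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical intercepts for P(Y >= y), y = 1..4; they decrease in y
alphas = [2.0, 0.5, -1.0, -2.5]
beta_treat = -0.6  # hypothetical log odds ratio, treatment B vs A

def exceedance_probs(treat):
    """P(Y >= y | treatment) for each cutoff y under proportional odds."""
    return [expit(a + beta_treat * treat) for a in alphas]

p_A = exceedance_probs(0)  # treatment A
p_B = exceedance_probs(1)  # treatment B: lower odds of worse outcomes
```

Because a single β shifts every cutoff by the same amount on the log odds scale, the odds ratio computed at any y recovers exp(β) exactly; that is the proportional odds assumption in miniature.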

With this ordinal longitudinal model one can obtain a variety of estimates, such as the expected time until a given condition occurs and the expected time spent in a state. The ordinal model does assume proportional odds, but the partial proportional odds model relaxes this assumption.
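To make "expected time in state" concrete, here is a minimal sketch under an assumed first-order Markov model with a made-up 3-state daily transition matrix (the states and numbers are hypothetical, not from the notes). State occupancy probabilities come from iterating the one-step transitions, and the expected number of days in a state is the sum of the daily occupancy probabilities for that state.

```python
# States: 0 = home, 1 = hospitalized, 2 = dead (absorbing).
# Hypothetical one-day transition probabilities, rows sum to 1.
P = [
    [0.90, 0.08, 0.02],  # from home
    [0.40, 0.50, 0.10],  # from hospitalized
    [0.00, 0.00, 1.00],  # death is absorbing
]

def occupancy(p0, P, days):
    """State occupancy probabilities for day 0..days by iterating
    the one-step transition matrix."""
    probs = [p0]
    for _ in range(days):
        prev = probs[-1]
        nxt = [sum(prev[i] * P[i][j] for i in range(len(P)))
               for j in range(len(P))]
        probs.append(nxt)
    return probs

# Start everyone at home; follow for 28 days
probs = occupancy([1.0, 0.0, 0.0], P, 28)

# Expected days hospitalized over the 28-day window
expected_days_hospitalized = sum(day[1] for day in probs[1:])
```

In the real model the transition probabilities would depend on covariates, time, and the previous state via the fitted ordinal regression, but the occupancy-probability recursion is the same idea.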

The model provides a correct basis for analysis of heterogeneity of treatment effect.

The Bayesian partial proportional odds model, moreover, can compute more complex probabilities of special interest, such as the probability that the treatment affects mortality differently than it affects nonfatal outcomes.


Q&A From May 2021 Course

  1. Where do you think is the best place to start learning about Bayes, coming from a frequentist perspective? I know McElreath’s course is highly recommended, but I don’t think it ‘tells you’ the parallel approaches from the frequentist world - is there an (intro-level) course or resource that does? dgl: I am working my way through the new Gelman book (Regression and Other Stories, Gelman et al., 2021) and am impressed with it. I recommend it; it takes a balanced approach. fh: Kruschke does a lot of side-by-side Bayesian/frequentist analyses.
  2. Is there any limit to the number of states that the Bayesian Markov model can handle? Do you know if the models fitted using the Bayesian approach are comparable to the non-Bayesian Markov models fitted by the msm package? Great questions. msm does not handle ordinal states, so every state is its own category and needs its own large sample size. With ordinal states there is no limit to the number of states as long as the proportional odds assumption is reasonably satisfied. You just need a good overall sample size.

Hi all,
A student and I would like to build an ordinal first-order Markov prediction model. Our data set has several hundred participants with outcomes measured at 6 time points across 4 months. The challenge we have encountered is that there are substantial missing values in the outcome variable (~15-20% at later time points). Consequently, there is a considerable proportion of missingness in the lagged outcome variable, which is a predictor in the model. The help file for the transcan function has a great 6-step approach to imputing baseline variables using the time-varying outcomes. But we’re having trouble figuring out a way to multiply impute values for the time-varying lagged outcome variable. The Amelia II package allows for longitudinal imputation using a long data format. We can create a list of complete long-format data sets using the Amelia package, but we then seem to lose the ability to capitalize on many of the desirable features of the rms package (fit.mult.impute). Does anyone have suggestions for imputing missing values in time-varying predictors within the rms package?

This is a great topic. There are 6 approaches I can think of:

  • multiple imputation using tall and thin data, with standard multiple imputation algorithms
  • multiple imputation using wide data, e.g. Stata has a procedure for optimum within-person imputation looking forwards and backwards
  • full Bayesian modeling with missings treated as parameters (pretty complex here)
  • keep the data gaps and when measurements restart, sacrifice the first measurement so that it can be used as a lag for the 2nd measurement after the restart (not very efficient)
  • assume that last state carried forward is valid, i.e., that if a patient were measured instead of missing the measured value agrees pretty well with the last measured value
  • develop a complex recursive likelihood function for the probability of the current state that is conditional only on the previously measured states
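The fourth option above (keep the gaps and sacrifice the first measurement after each restart) can be sketched as follows; the function name and the day-level toy data are hypothetical, and real code would work on a tall data frame rather than per-subject lists.

```python
def usable_pairs(times, values):
    """Return (t, lagged_y, y) records usable in a first-order Markov
    fit: a record is kept only when the immediately preceding time was
    actually observed. The first measurement after a gap therefore
    contributes no outcome row itself, but it does supply the lag for
    the next measurement."""
    obs = dict(zip(times, values))
    pairs = []
    for t in sorted(obs):
        if (t - 1) in obs:
            pairs.append((t, obs[t - 1], obs[t]))
    return pairs

# One subject: days 1-3 measured, day 4 missing, days 5-6 measured
pairs = usable_pairs([1, 2, 3, 5, 6], ['A', 'B', 'B', 'C', 'B'])
# Day 5 has no usable lag (day 4 is missing), so it is "sacrificed"
# as an outcome and serves only as the lag for day 6.
```

This makes the inefficiency of the approach visible: every gap costs one outcome record in addition to the missing day itself.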

This is an active research area and we need to do a lot of work to select the best approach.


Thank you, Frank! This is very helpful. I will follow the literature on this topic as this is an approach we plan to use often. At this point, we imputed using the mice package in the wide format, and rms seems to work seamlessly with mids objects. This Stack Overflow post was a helpful starting point. Our diagnostics for the MI model look good, but we will compare a few different approaches before proceeding with the primary analyses.
