RMS Semiparametric Ordinal Longitudinal Model

I’ve written two functions which might be of use to others. They are surely not fully optimized but quick enough for my use cases.

  1. Predictions from a blrm() fit which can include the individual random effects. I needed this for a project where we’re really interested in the ATE, because the intervention applies to a whole population rather than to 1:1 decision making. And from my current understanding based on this, we need to include the RE in the standardization for that.
predict_blrm_manual()
predict_blrm_manual <- function(fit, newdata, include_re = FALSE) {
  # --- 0) Prerequisite Checks ---
  if (!is.data.frame(newdata)) {
    stop("`newdata` must be a data frame.")
  }
  if (include_re && !"id" %in% colnames(newdata)) {
    stop("`newdata` must contain an 'id' column when include_re=TRUE")
  }
  
  N_new <- nrow(newdata)

  # --- 1) Get the design matrix for the new data ---
  X <- rms::predictrms(fit, newdata = newdata, type = 'x')

  # --- 2) Extract posterior draws and model information ---
  draws <- fit$draws
  ndraws <- nrow(draws)
  ns <- fit$non.slopes
  K <- ns + 1
  
  intercept_draws <- draws[, 1:ns, drop = FALSE]
  beta_draws <- draws[, (ns + 1):ncol(draws), drop = FALSE]

  # --- 3) Calculate the fixed-effect linear predictor (eta_fixed) ---
  eta_fixed <- beta_draws %*% t(X)

  # --- 4) Get the random effect draws (u_draws) if requested ---
  u_draws <- matrix(0, nrow = ndraws, ncol = N_new)
  
  if (include_re) {
    if (length(unique(newdata$id)) > 1) {
      stop("`newdata` contains multiple unique patient IDs. This version is restricted to one at a time.")
    }
    if (!requireNamespace("posterior", quietly = TRUE)) {
      stop("The 'posterior' package is required to extract random effect draws.")
    }
    sf <- rmsb::stanGet(fit)
    all_gamma_draws <- posterior::as_draws_matrix(sf$draws(variables = "gamma"))
    # Assumes the IDs in `newdata$id` are the integer indices of the
    # corresponding gamma[.] random-effect parameters in the Stan fit
    patient_ids <- newdata$id
    u_draws <- all_gamma_draws[, patient_ids, drop = FALSE]
  }

  # --- 5) Calculate the total linear predictor (eta) ---
  stopifnot(all(dim(eta_fixed) == dim(u_draws)))
  eta <- eta_fixed + u_draws

  # --- 6) Calculate probabilities for each observation and each draw ---
  p_cat_draws_list <- vector("list", N_new) 
  
  for (i in 1:N_new) {
    eta_i <- eta[, i]
    s_ge <- plogis(sweep(intercept_draws, 1, eta_i, "+"))
    
    p_cat_draws <- matrix(NA_real_, nrow = ndraws, ncol = K)
    p_cat_draws[, 1] <- 1 - s_ge[, 1]
    if (K > 2) {
      for (kk in 2:(K - 1)) {
        p_cat_draws[, kk] <- s_ge[, kk - 1] - s_ge[, kk]
      }
    }
    p_cat_draws[, K] <- s_ge[, K - 1]
    colnames(p_cat_draws) <- paste0("y=", 1:K)
    
    p_cat_draws_list[[i]] <- p_cat_draws
  }

  # --- 7) Reshape draws into a single 3D array ---
  draws_array <- array(
    data = NA_real_,
    dim = c(ndraws, N_new, K)
  )

  for (i in 1:N_new) {
    draws_array[, i, ] <- p_cat_draws_list[[i]]
  }

  dimnames(draws_array) <- list(
    NULL,
    rownames(newdata),
    colnames(p_cat_draws_list[[1]])
  )

  return(draws_array)
}
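A minimal usage sketch, assuming a hypothetical fit f <- blrm(y ~ yprev + week + tx + cluster(id), data = d) with illustrative covariate names (none of these objects are defined above):

nd <- data.frame(yprev = factor(1, levels = 1:4), week = 2, tx = 1, id = 7)

pa <- predict_blrm_manual(f, nd, include_re = TRUE)  # draws x rows x categories
dim(pa)

# Posterior mean and 95% credible interval for each outcome category of row 1
apply(pa[, 1, ], 2, mean)
apply(pa[, 1, ], 2, quantile, probs = c(0.025, 0.975))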
  2. A function which calculates unconditional (conditional on the baseline states only) SOPs for a second-order Markov model. I’m relatively confident that I’m correctly unconditionalizing the SOPs, but a review is warranted.
Second Order Markov SOPs
#' Calculate Second-Order State Occupation Probabilities
#'
#' This function calculates the unconditional state occupation probabilities for a
#' second-order Markov model.
#'
#' @param model_fit The fitted model object (rmsb or brms)
#' @param baseline_data A single-row data frame containing the initial states and
#'   all baseline predictor values required by the model.
#' @param col_yprev A string with the column name for the state at t-1 (e.g., "yprev").
#' @param col_ypprev A string with the column name for the state at t-2 (e.g., "ypprev").
#' @param col_time A string with the column name that represents time in the model (e.g., "week").
#' @param times A numeric vector specifying the time points for calculation (e.g., `1:26`).
#' @param n_states An integer for the number of levels in the ordinal outcome.
#' @param absorbing_states A numeric vector listing the absorbing states.
#' @param include_re Logical; if TRUE, individual random effects are included
#'   in the predictions (default FALSE).
#'
#' @return A 3D array `(draws, states, time)` with the unconditional state
#'   occupation probabilities.
#' @export

sop_markov_second_order <- function(model_fit,
                                    baseline_data,
                                    col_yprev,
                                    col_ypprev,
                                    col_time,
                                    times,
                                    n_states,
                                    absorbing_states,
                                    include_re = FALSE) {

  # --- Detect model type and get number of draws ---
  if (inherits(model_fit, "brmsfit")) {
    detected_type <- "brms"
    if (!requireNamespace("marginaleffects", quietly = TRUE) || !requireNamespace("posterior", quietly = TRUE)) {
      stop("Packages 'marginaleffects' and 'posterior' are required for brms models.")
    }
    n_draws <- posterior::ndraws(model_fit)
  } else if (inherits(model_fit, "blrm")) {
    detected_type <- "rmsb"
    n_draws <- nrow(model_fit$draws)
  } else {
    stop("Unsupported model object class. Please provide a model from 'brms' (class 'brmsfit') or 'rmsb' (class 'blrm').")
  }
  
  # --- Parameters ---
  M <- n_states
  T_horizon <- max(times)

  # --- Initial Conditions ---
  initial_state_t_minus_1 <- as.numeric(baseline_data[[col_ypprev]])
  initial_state_t_0 <- as.numeric(baseline_data[[col_yprev]])

  # --- Data Structures to Store Results ---
  P_joint_bayes <- array(0, dim = c(n_draws, M, M, T_horizon + 1),
                         dimnames = list(paste0("draw_", 1:n_draws),
                                         paste0("t-1=", 1:M),
                                         paste0("t=", 1:M),
                                         paste0("time=", 0:T_horizon)))

  P_uncond_bayes <- array(0, dim = c(n_draws, M, T_horizon + 1),
                          dimnames = list(paste0("draw_", 1:n_draws),
                                          paste0("State_", 1:M),
                                          paste0("t=", 0:T_horizon)))

  # Set the known probability at t=0
  P_joint_bayes[, initial_state_t_minus_1, initial_state_t_0, 1] <- 1.0
  P_uncond_bayes[, initial_state_t_0, 1] <- 1.0

  # --- Create the static grid for batch predictions ---
  history_grid <- expand.grid(
    h = factor(1:M, levels = 1:M), # X_{t-2}
    j = factor(1:M, levels = 1:M)  # X_{t-1}
  )
  names(history_grid) <- c(col_ypprev, col_yprev)

  # --- Dynamically create the full prediction dataset ---
  # Identify static covariates by excluding time-varying/history columns
  cols_to_exclude <- c(col_yprev, col_ypprev, col_time)
  static_covariate_names <- setdiff(names(baseline_data), cols_to_exclude)
  static_covariates <- baseline_data[, static_covariate_names, drop = FALSE]

  # Combine static covariates with the grid of all possible histories
  prediction_data <- cbind(static_covariates, history_grid)

  # --- Main Iteration Loop ---
  for (t in times) {
    # Identify predictable histories
    predictable_idx <- which(
      !(prediction_data[[col_ypprev]] %in% absorbing_states) &
      !(prediction_data[[col_yprev]] %in% absorbing_states)
    )
    n_predictable <- length(predictable_idx)

    # Subset the data for prediction and set the current time
    prediction_data_subset <- prediction_data[predictable_idx, ]
    prediction_data_subset[[col_time]] <- t


    pred_array_3d <- NULL

    if (detected_type == "brms") {
      # refactor previous states to remove absorbing states
      prediction_data_subset[[col_yprev]] <- factor(prediction_data_subset[[col_yprev]], levels = c(1:M)[-absorbing_states])
      prediction_data_subset[[col_ypprev]] <- factor(prediction_data_subset[[col_ypprev]], levels = c(1:M)[-absorbing_states])  
      re_arg <- if (include_re) NULL else NA
      brms_pred_long <- marginaleffects::predictions(model_fit, 
                                                     newdata = prediction_data_subset, 
                                                     type = "response", 
                                                     re_formula = re_arg) |>
                        marginaleffects::posterior_draws()
      
      pred_array_3d <- array(0, dim = c(n_draws, n_predictable, M))
      indices <- cbind(brms_pred_long$drawid, 
                       brms_pred_long$rowid, 
                       as.numeric(brms_pred_long$group))
      pred_array_3d[indices] <- brms_pred_long$draw
      
    } else { # detected_type == "rmsb"
      # dimension 1 = posterior draws (e.g., 4000)
      # dimension 2 = (X_{t-2}, X_{t-1}) combinations (rows from prediction_data_subset)
      # dimension 3 = category probabilities (K columns, e.g., 5 for 5 states)
      pred_array_3d <- predict_blrm_manual(model_fit, prediction_data_subset, include_re = include_re)
    }

    # Stack the per-draw probability matrices: rows are draw 1's histories, then draw 2's, ...
    state_probs_all_draws <- do.call(rbind, lapply(1:dim(pred_array_3d)[1], function(d) pred_array_3d[d, , ]))

    # --- Assemble transition probabilities for each draw ---
    # Create a 4D array to hold all transition probabilities: P(X_t=l | X_{t-2}=h, X_{t-1}=j)
    # Dims: (draw, h, j, l)
    # So [, 1, 1, 1] give all draws for P(l = 1 | h = 1, j = 1)
    # [, 1, 1, ] for all draws for P(l | h = 1, j = 1) (rows must sum to 1)
    cond_probs_4d_array <- array(0, dim = c(n_draws, M, M, M))

    for (d in 1:n_draws) {
      # Create a placeholder matrix for this specific draw
      # each row is combination of (h, j), each column is l (rows must sum to 1)
      # this is for one draw only
      cond_probs_matrix_d <- matrix(0, nrow = M * M, ncol = M)
      
      # Handle absorbing states (P(j->j) = 1 if j is absorbing)
      absorbing_idx <- which(prediction_data[[col_yprev]] %in% absorbing_states)
      if (length(absorbing_idx) > 0) {
        absorbing_state <- as.numeric(as.character(prediction_data[[col_yprev]][absorbing_idx]))
        rows_and_cols <- cbind(absorbing_idx, absorbing_state)
        cond_probs_matrix_d[rows_and_cols] <- 1.0
      }
      
      # Get state probabilities for the current draw
      start_row <- (d - 1) * n_predictable + 1
      end_row <- d * n_predictable
      draw_d_probs <- state_probs_all_draws[start_row:end_row, ]
      
      # Place them into the correct rows
      cond_probs_matrix_d[predictable_idx, ] <- draw_d_probs
      
      # Reshape and store in the final 4D array
      cond_probs_4d_array[d, , , ] <- array(cond_probs_matrix_d, dim = c(M, M, M))
    }

    # --- Update Probabilities ---
    # Get joint probabilities from the previous step for all draws: P(X_{t-2}, X_{t-1})
    prev_P_joint_all_draws <- P_joint_bayes[, , , t]
    current_P_joint_all_draws <- array(0, dim = c(n_draws, M, M))

    for (d in 1:n_draws) {
      # This is P(X_{t-2}=h, X_{t-1}=j) for draw d
      prev_P_joint_d <- prev_P_joint_all_draws[d, , ]
      # P(X_t = l | X_{t-1} = j, X_{t-2} = h) for this draw
      cond_probs_d <- cond_probs_4d_array[d, , , ]
      
      # Law of Total Probability: P(X_t=l|h,j) * P(h,j)
      weighted_probs <- cond_probs_d * array(prev_P_joint_d, dim = dim(cond_probs_d))
      # Marginalize over the t-2 state (h) to get P(X_{t-1}, X_t)
      # = Summation over h (the state at t-2), resulting in P(X_{t-1}=j, X_t=l)
      current_P_joint_d <- apply(weighted_probs, c(2, 3), sum)
      
      current_P_joint_all_draws[d, , ] <- current_P_joint_d
      
      # Marginalize over the t-1 state (j) to get P(X_t)
      uncond_probs_t <- colSums(current_P_joint_d)
      # Consider normalizing  to prevent floating point error accumulation (not doing this for now)
      P_uncond_bayes[d, , t + 1] <- uncond_probs_t
      # P_uncond_bayes[d, , t + 1] <- uncond_probs_t / sum(uncond_probs_t)
    }
    
    P_joint_bayes[, , , t + 1] <- current_P_joint_all_draws
  }

  return(P_uncond_bayes)
}
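A hedged call sketch, again with illustrative names: a single-row baseline data frame, states 1–5 with state 5 absorbing, and weekly times up to week 26 (the fit f is assumed to exist):

bl <- data.frame(age = 60, tx = 1, week = 0, id = 1,
                 yprev  = factor(3, levels = 1:5),
                 ypprev = factor(3, levels = 1:5))

sop <- sop_markov_second_order(
  model_fit        = f,    # an existing blrm() or brm() fit
  baseline_data    = bl,
  col_yprev        = "yprev",
  col_ypprev       = "ypprev",
  col_time         = "week",
  times            = 1:26,
  n_states         = 5,
  absorbing_states = 5
)

# Posterior mean occupancy of the absorbing state over time
colMeans(sop[, 5, ])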

M is the number of possible states; l is the state at time t, j the state at t-1, and h the state at t-2.

  1. We start with a known history at t=0, so the joint probability of the two observed states, P(X_{t-1}=j_{obs}, X_{t-2}=h_{obs}), is 1.0.

  2. We use this known probability as the “weight” to calculate the joint probabilities for (t=1, t=0).

    • P(X_{t=1} = l, X_{t=0} = j_{obs}) = P(X_{t=1} = l \mid X_{t=0}=j_{obs}, X_{t=-1}=h_{obs}) \times 1
    • So for t=1 we have M non-zero joint probabilities, one for each possible state l at time t=1.
  3. We then use the joint probabilities from (t=1, t=0) as the weights to calculate the joint probabilities for (t=2, t=1).

    • P(X_{t=2} = l, X_{t=1} = j) = \sum_{h=1}^{M} P(X_{t=2}=l \mid X_{t=1}=j, X_{t=0}=h) \times P(X_{t=1}=j, X_{t=0}=h)
    • Now for t=2 we have M \times M joint probabilities, one for each combination of possible states at times t=1 and t=2.
  4. …and so on.

  5. Final step for any time t: to get the unconditional probability P(X_t=l), we sum the joint probabilities from the previous step over all possible previous states j.

    • P(X_t=l) = \sum_{j=1}^{M} P(X_t=l, X_{t-1}=j)

plus handling of absorbing states I haven’t written in plain language yet.
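To make the recursion concrete, here is a tiny self-contained numeric sketch with M = 3 states and a made-up, time-homogeneous transition array for a single draw; it checks that the unconditional probabilities sum to 1 at every step:

M <- 3
# tp[h, j, l] = P(X_t = l | X_{t-2} = h, X_{t-1} = j); made-up numbers
tp <- array(NA_real_, dim = c(M, M, M))
for (h in 1:M) for (j in 1:M)
  tp[h, j, ] <- prop.table(1 / (1 + abs(1:M - j)) + 0.2 * (1:M == h))

# Known history: X_{t-2} = 1, X_{t-1} = 2
P_joint <- matrix(0, M, M)
P_joint[1, 2] <- 1

for (t in 1:5) {
  # Law of total probability: P(X_{t-1}=j, X_t=l) = sum_h P(X_t=l | h, j) * P(h, j)
  P_new <- matrix(0, M, M)
  for (j in 1:M) for (l in 1:M)
    P_new[j, l] <- sum(tp[, j, l] * P_joint[, j])
  P_joint <- P_new
  cat("t =", t, "P(X_t) =", round(colSums(P_joint), 3), "sum =", sum(P_joint), "\n")
}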

Nice work! If I understand you correctly you are developing a forward-time formula. How about a backwards-time one like I did for first-order Markov? And a comment on a limiting issue: most studies don’t have the pre-time-zero measurements (pre-baseline values) needed for 2nd-order processes.

1 Like

I wasn’t aware that you were doing backward time, though I even thought about doing it that way. FWIW, I did a forward-time first-order model first, and got the exact same results as your sopmarkordm() function. I will have to inspect your function.

True, but for our studies I can pull the data we need, e.g., hospitalization, ward, etc., directly from the EHR of our health system. And patients who participate have consented to general use of their data for research, i.e., also to use of data from before baseline.

Bottom line: I think that as more studies are embedded in routine care, second-order (or higher) Markov modeling is very doable, provided the routinely collected data is of high quality.

1 Like

Good points. It will be very useful to quantify (using proportion of log likelihood explained etc.) how much information is contained in the 2nd order model that is not contained in the predictive information of the 1st order one.
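One frequentist sketch of that comparison, using rms::lrm with hypothetical variable names: the fraction of the second-order model’s likelihood-ratio chi-square already captured by the first-order model serves as an adequacy index for the simpler model.

library(rms)

f1 <- lrm(y ~ yprev + week + tx, data = d)            # first-order history
f2 <- lrm(y ~ yprev + ypprev + week + tx, data = d)   # adds second-order term

# Adequacy of the first-order model (closer to 1 = less added information)
f1$stats["Model L.R."] / f2$stats["Model L.R."]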

Side comment: a compound symmetric correlation structure such as that induced by including random intercepts in the model can be approximated by conditioning on the average of all previous responses. Unconditioning on this will be very difficult. Also, using your method with large M will be computationally very expensive. We should consider a parallel path where we make the simulation of longitudinal processes from fitted models more efficient than the current Hmisc function. By simulating an N=100,000 sample we can get whatever unconditional estimates are needed very easily. It would be nice to also find a way to do posterior sampling in the same spirit, to capture all the uncertainties of parameter estimates.
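A rough sketch of the simulation path under stated assumptions: given any function trans_prob(yprev, t) returning the M next-state probabilities (below a made-up stand-in, not the Hmisc implementation, and first-order only for brevity), forward-simulating a large sample yields the unconditional SOPs as empirical frequencies; repeating this per posterior draw would propagate parameter uncertainty.

M <- 5; N <- 1e5; times <- 1:26

# Stand-in kernel; in practice this would come from the fitted model
trans_prob <- function(yprev, t) prop.table(1 / (1 + abs(1:M - yprev)))

y <- rep(3L, N)                        # everyone starts in state 3
sop <- matrix(0, length(times), M)
for (t in times) {
  y <- vapply(y, function(yp) sample.int(M, 1L, prob = trans_prob(yp, t)), integer(1))
  sop[t, ] <- tabulate(y, nbins = M) / N   # empirical state occupancy at time t
}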

Sorry, I’m very new in this regard. I planned to do some comparisons mostly using LOO-CV, but would of course appreciate guidance on additional must-do analyses.

Huh, I didn’t know that. Just the arithmetic mean of the previous ordinal states taken as integers? Or what kind of average?

I actually checked my manual calculations of the SOPs against a simulated dataset, where I resampled from the baseline covariates. But I just took the median of the parameter estimates; I couldn’t quickly find a nice way to include all of the uncertainty. Will try to think about it more. But I would wager that a 100k sample might actually be on the low end.

1 Like

Here is some useful information from Claude AI:

This is an elegant setup—the proportional odds (PO) model applied to ordinal outcomes with state dependencies should indeed provide substantial computational and statistical benefits. Let me work through the implications:

Leveraging the Proportional Odds Structure

Dramatic dimension reduction: Instead of estimating K² × K transition probabilities directly, you’re estimating:

  • One set of cutpoints (thresholds): J-1 parameters for J ordinal categories
  • Coefficients for the previous two states as predictors
  • A time trend coefficient
  • Possibly state × time interactions if needed

This collapses your parameter space from O(K³) to O(K + J + time_components), a massive reduction when K is large.

Natural ordering exploitation: The PO assumption enforces that transitions respect the ordinal structure—movement between adjacent categories is more likely than distant jumps. This provides strong regularization without explicit constraints, improving mixing and reducing multimodality.

Cutpoint structure stabilizes inference: Unlike unrestricted transition matrices where you can have near-zero or near-one probabilities that create difficult geometry, the cutpoints create relatively stable log-odds surfaces that HMC navigates efficiently.

Efficient Stan Implementation

data {
  int<lower=1> N;           // number of observations
  int<lower=1> K;           // number of states (large)
  int<lower=2> J;           // number of ordinal categories
  int<lower=1> T;           // number of time periods
  
  array[N] int<lower=1,upper=J> y;        // ordinal outcomes
  array[N] int<lower=1,upper=K> state_t_minus_1;
  array[N] int<lower=1,upper=K> state_t_minus_2;
  array[N] int<lower=1,upper=T> time;
}

parameters {
  ordered[J-1] cutpoints;           // J-1 thresholds
  vector[K] state_effect_1;         // effect of state at t-1
  vector[K] state_effect_2;         // effect of state at t-2
  real time_trend;
  
  // Optional: hierarchical structure for state effects
  real<lower=0> sigma_state;
  real mu_state;
}

model {
  // Priors
  cutpoints ~ normal(0, 1.5);
  time_trend ~ normal(0, 1);
  
  // Hierarchical priors on state effects
  mu_state ~ normal(0, 1);
  sigma_state ~ exponential(1);
  state_effect_1 ~ normal(mu_state, sigma_state);
  state_effect_2 ~ normal(mu_state, sigma_state);
  
  // Likelihood
  for (n in 1:N) {
    real eta = state_effect_1[state_t_minus_1[n]] + 
               state_effect_2[state_t_minus_2[n]] + 
               time_trend * time[n];
    y[n] ~ ordered_logistic(eta, cutpoints);
  }
}

Key Advantages for Large K

State effects as regression coefficients: By treating state_t-1 and state_t-2 as predictors rather than constructing a massive transition matrix, you avoid the O(K³) memory footprint. The state effects are K-dimensional vectors—manageable even for large K.

Hierarchical regularization: Placing a normal distribution on state effects (with unknown mean and variance) is computationally cheap but provides strong shrinkage. States with limited data borrow strength from the population distribution. This is critical when K is large and some state combinations are rare.

Geometric stability: The ordered_logistic likelihood creates well-behaved geometry. The cumulative probability structure means the gradient landscape is typically smoother than with unrestricted multinomial models.

Handling Non-Stationarity

Time trend coefficient: Including a global time coefficient is efficient. If you suspect state-specific time effects, consider:

vector[K] state_time_interact;  // state-specific trends (optional)

Then modify eta to: eta = state_effect_1[...] + state_effect_2[...] + (time_trend + state_time_interact[...]) * time[n];

This remains computationally tractable because you’re still avoiding the full transition matrix.

Segmented trends or splines: If the time trend is non-linear, use basis functions (B-splines or low-rank approximations) rather than polynomial terms. This keeps dimensionality controlled.

Sampling Efficiency Improvements

Centered vs. non-centered parameterization: The hierarchical setup above is naturally centered. If you get divergences, experiment with non-centered:

vector[K] state_effect_1_raw;
// In model block:
state_effect_1_raw ~ std_normal();
// Construct: state_effect_1 = mu_state + sigma_state * state_effect_1_raw;

Adapt delta for ordinal likelihoods: Ordered logistic can have sharp curvature near decision boundaries. Start with adapt_delta = 0.95 and increase if needed, though it’s usually better-behaved than alternatives.

Reduced rank approximation for state effects (if still struggling): If K is very large (thousands), consider modeling state effects with a lower-dimensional latent structure:

matrix[K, R] state_loadings;      // K states, R << K latent factors
vector[R] latent_factors;
// state_effect_1 ≈ state_loadings * latent_factors

Posterior Sampling for State Occupancy Probabilities

Once you have posterior draws of parameters, computing posterior state occupancy probabilities is straightforward:

For each posterior draw θ, you can:

  1. Use forward filtering to compute state probabilities at each time step
  2. The proportional odds model gives you: P(y_t | state_t-1, state_t-2, θ) via ordered_logistic
  3. Forward-backward algorithm recovers latent state sequence posterior if needed
  4. Marginalize to get marginal occupancy: P(state_t = k | data, θ)

Crucially, this happens in post-processing—you’re not sampling latent states in Stan itself unless you need their full posterior. This avoids the extreme dimensionality of sampling K values at T time points.

Practical Recommendations

  1. Start simple: Fit without hierarchical structure on state effects first to verify the basic model works. Adding hierarchical structure helps when K is very large or data per state is sparse.

  2. Diagnostic focus: Monitor divergences and Rhat, especially for the cutpoints. State effects often mix well due to regularization, but cutpoints can be correlated with the global intercept (if present).

  3. Posterior predictive checks: Generate predicted ordinal outcomes grouped by (state_t-1, state_t-2) combinations to verify the model captures transition dynamics.

  4. Validation: Compare occupancy probabilities across states—with strong regularization, you want to ensure observed differences in state effects are meaningful, not artifacts of shrinkage.

Does this align with your setup? Are you sampling latent state sequences, or do you observe the states directly from the ordinal outcomes?

When Y is not interval-scaled, and the categories of Y are not large in number, this amounts to conditioning on k-1 covariates for k categories, with each covariate being the proportion of previous observations that were in the indicated state.
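A small sketch of constructing those covariates with illustrative names, assuming long-format data with columns id, time, and ordinal y with k levels: for each row, compute the proportion of that subject’s previous observations in each of the first k-1 states.

k <- 4
d <- data.frame(id = rep(1:2, each = 5), time = rep(1:5, 2),
                y  = c(1, 2, 2, 3, 4, 1, 1, 2, 2, 3))
d <- d[order(d$id, d$time), ]

# Proportion of *previous* observations equal to state s (0 at the first visit)
prop_prev <- function(y, s) {
  prev_count <- cumsum(y == s) - (y == s)
  n_prev <- seq_along(y) - 1
  ifelse(n_prev == 0, 0, prev_count / n_prev)
}
for (s in 1:(k - 1))
  d[[paste0("prop_prev_", s)]] <- ave(d$y, d$id, FUN = function(y) prop_prev(y, s))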

1 Like

Would you mind sharing your prompt? I think it might help me understand some of the output. Thanks!

Here is the first prompt which led to some output I didn’t show:

How can Bayesian posterior sampling using Stan software be used efficiently to get posterior draws for state occupancy probabilities from second order Markov processes with a large number of states and an absorbing state?

Then I prompted with:

For my situation K is large and the probability model considers the outcome variable as ordinal using a proportional odds model with the addition as another covariate the previous two states. A general time trend is included so that it is not helpful to pre-compute a probability matrix due to the non-stationarity. The ordinal model puts many useful constraints on the transition probabilities which should help in getting a general computational solution.

Here is the whole Claude Haiku session: https://claude.ai/share/35fb8f5a-3654-4321-85c4-4b80e3bb51c1

1 Like

Hello Frank and Johannes,

We have incorporated the use of Ocens() into the formula for our blrm() model and excluded yprev.

After making sure that the command would run normally, we added yprev back in as a predictor to set the premise of a Markov model.

However, the yprev column is structured based on the aforementioned y column in our previous query, where between endoscopies (at 0, 12, 24, and 36 months), the values from the last known endoscopy are carried forward.

We are wondering if it is acceptable to use yprev in this fashion, or if we should account for the uncertainty between endoscopies in yprev as well, not only in the outcome, and what this would look like. Ocens() does not seem to work on the right-hand side of the formula (it only works for the outcome?).

Matthew and Juan

Hello Frank and Johannes,

In retrospect, it seems you already answered our question in your previous reply. We will try to incorporate the modal categories into our analysis. Thank you for your help!

Matthew and Juan

2 Likes

Dear Professor Harrell @f2harrell,

I am using rmsb::blrm to fit a Bayesian transition model for longitudinal pain data (0–10 scale) collected in an RCT. Because pain is recorded only while patients receive the intervention, I have modeled treatment discontinuation as an absorbing state (coded as 11).

I observed frequent direct transitions from low pain (often 0) to discontinuation (11). This suggests that discontinuation often occurs for reasons unrelated to gradual pain progression.

Would it be appropriate to specify a partial proportional odds (PO) model in which the treatment effect for the discontinuation cutpoint (Y ≥ 11) is estimated separately, while maintaining the PO assumption across the pain-scale cutpoints (0 to 10)?

Thank you for your time and for developing the rmsb package.

Best regards,

1 Like

From a clinical perspective, it seems somewhat strange to me to treat discontinuation as an absorbing state for pain. I think it’s easiest to interpret absorbing states as states which do not intersect with other states or which are clearly worse. E.g., if a patient is dead, they are not in pain. But if a patient discontinues, they might still be in pain; we just don’t know. I’m also not sure discontinuation can be considered worse than the worst pain (?).

I think it’s more a censoring issue than an absorbing-state issue. If you think the censoring is non-informative, I’d think you can just include all observations up to discontinuation and stick to the 0–10 scale.

I’m not sure whether this also means that censoring is non-informative…

1 Like

I’m torn between these two ways of thinking but lean towards thinking of discontinuation as a bad event and an absorbing one. I don’t know how to properly penalize a treatment otherwise.

If discontinuation is considered as one of the worst outcomes, then an ordinal comparison of treatments will reveal which treatment gives the patients the best typical outcome even if a partial proportional odds model is not used. If you allow for non-PO with regard to discontinuation (not a bad idea as a secondary analysis) you can’t get an overall treatment assessment from the parameters alone. But you can compare treatments over time even with non-PO by using mean time in good states.
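For example, given the (draws, states, times) array produced by sop_markov_second_order() earlier in this thread, the posterior of expected time in “good” states (say states 1–2, weekly spacing assumed) is just a sum of occupancy probabilities over time; a sketch assuming that output format:

good <- 1:2
# For each posterior draw, sum P(X_t in good states) over t = 1..T (weeks);
# the -1 drops the known t = 0 slice
mean_time_good <- apply(sop[, good, -1, drop = FALSE], 1, sum)
quantile(mean_time_good, c(0.025, 0.5, 0.975))

# A treatment contrast: rerun the SOP calculation under each treatment level
# and subtract the two mean_time_good vectors draw by draw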

1 Like

Thank you, Schwenke-sensei.
I combined the pain scale with treatment discontinuation to capture the on-treatment experience, as excluding off-treatment data may introduce bias. In this RCT, the experimental arm improved PFS, and the protocol mandates treatment until progression or unacceptable toxicity.
Consequently, the control arm exhibits earlier dropout. Treating this as censoring would ignore the differential follow-up between arms.

Clinically, I consider manageable pain preferable to disease progression (discontinuation); therefore, I ranked discontinuation as a worse outcome than the highest pain score.

1 Like

Dear Professor Harrell,

Thank you very much for your helpful comments. I will proceed with analyses using an ordered outcome that includes both the pain scale and treatment discontinuation.
I have one follow-up question. Am I correct in thinking that derived quantities such as the difference in mean time in state or days of benefit (as described in your tutorial) can still be estimated with less bias from the Bayesian transition model, even when the treatment effect departs from PO?

1 Like

Failing to include non-PO terms when non-PO is substantial can still result in finding the best overall treatment. But probabilities estimated from such a model will be inaccurate, leading to inaccurate mean times in states. Besides routinely including non-PO for absolute time (allowing some types of outcomes to occur early and some tending to occur late) we sometimes need to allow the treatment effect to operate in non-PO over outcome levels, i.e., have partial PO terms with respect to both time and treatment.
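In rmsb this can be expressed via blrm()'s ppo formula, optionally with a cppo function constraining how the departure varies over outcome levels; a hedged sketch with hypothetical variable names (not anyone’s actual model):

library(rmsb)

# Pain 0-10 plus discontinuation coded 11; yprev carries the Markov dependence.
# ppo lets week and tx depart from PO; this cppo confines the departure to the
# discontinuation level (one modeling choice among several).
f <- blrm(y ~ yprev + week + tx,
          ppo  = ~ week + tx,
          cppo = function(y) y == 11,
          data = d)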

Thank you for your guidance. I will include partial PO terms for both time and treatment in my analysis.

1 Like

I’ve published a small Python package for Aalen–Johansen (AJ) estimates.

For now, it supports conventional multi-state models (time-to-event and competing risks), similar in scope to lifelines.

The package started as an internal development tool for performance evaluation in {rtichoke}, but I realized it could be useful beyond that context as well.

If there’s interest in more complex multi-state setups or extensions, I’d be happy to explore and expand it—feel free to reach out.

1 Like