Whatever you do, learn maximally from https://cran.r-project.org/web/packages/survival/vignettes/compete.pdf
Iām wondering what the maximum of y-levels is someone here has used a longitudinal ordinal model for and how that went. My experience so far is limited to less than 10 levels. Iām wondering if we run into issues if we have very large K, e.g., 100+, because we have to estimate a separate parameter for every level of the previous state.
The ORBITA group, with analyses by Matthew Shun-Shin, have used a 74-level ordinal longitudinal outcome in a Bayesian RCT. There is no limit (nor any statistical or computation cost) to having unlimited number of distinct Y-values unless you allow the treatment effect to be different on every Y level, i.e., you allow for the most general departure from proportional odds for treatment, i.e., you treat Y as if itās not ordered.
Thanks, I shall have a look! I think I read their analysis plan a while back and they had daily measurements, IIRC.
In the extreme case of having no repeated measurements and only an ordinal outcome with large K at baseline and at the end of follow-up. Would you still adjust for baseline (baseline being dummy coded)? Wouldnāt this use up too many df, even if we make the PO assumption? (I should probably just simulate this).
If the baseline variable has so many levels it is likely to be at least partially ordered (counterexample would be a geographic area code). We donāt want to assume linearity for such baseline variables. Take for example a variable X having levels 0-20 where X=20 means something distinct from the process that generated X=0-19. Then I would adjust for X using a regression spline or ordinary quadratic function and also add an indicator variable for X=20.
If others are looking for this trial / the analysis. They reported (had to?) frequentist results in NEJM, but their Bayesian analysis plan can be found here. They fit a restricted cubic spline with 4 knots on numeric yprev.
Should others be interested in other examples of the transition model, I recently finished the SAP for a small oncology trial. Itās surely not perfect, some people will definitely dislike the use of average treatment effects, but I hope that itās still better than most ways oncology trials are analyzed.
Edit: I forgot to add that I extensively built on code from Frankās and Max Rohdeās tutorial paper, for which I am very grateful!
WOW this is a really excellent and comprehensive SAP that will likely benefit many others. Thanks for sharing it!
On a quick read I didnāt see any diagnostics for goodness of fit of the correlation pattern assumed in the longitudinal random effects model.
Did you have anything in mind besides correlation plots and variograms? These are included. Iāve really only included the random effects for regularization, because Iām still quite worried about too narrow uncertainty estimates when the assumption of conditional independence given the previous state(s) is violated.
Nothing else, I just wanted to be sure that they were used for non-Markov longitudinal models.
Iāve been toiling away on an R Package for easier modeling of longitudinal ordinal outcomes with VGAM. If one wants to target the average treatment effect (ATE), interval estimation can be very time-intensive with nonparametric bootstrap refitting: for a trial with about 1,000 patients, it can take around 30 minutes without parallelization, even with fairly optimized/vectorized code.
I therefore explored alternatives, including drawing coefficient vectors from a (cluster-robust) multivariate normal (MVN). That is much faster (often by at least an order of magnitude).
However, after reading Ye et al. 2023 and discussion on Bluesky, my
understanding is that MVN simulation mainly propagates coefficient uncertainty while conditioning on the observed covariate distribution X (i.e., closer to a sample average treatment effect (?)). In preliminary simulations under a superpopulation setup, I see slight CI undercoverage with this approach, which could worry some stakeholders.
If I understand correctly, nonparametric bootstrap should capture variability in X, as we resample X, but it just takes too long when simulating many scenarios, even on 80+ cores.
So Iāve been experimenting with a one-step score-based approach related to Score Based Approach to Wild Bootstrap Inference, and Iād appreciate feedback on whether this is theoretically reasonable.
My approach is as follows: We approximate uncertainty by simulation.
-
Draw cluster (patient) weights w_g \sim \mathrm{Exp}(1) (exponential multiplier weights, analogous to Bayesian bootstrap weighting).
-
Compute centered multipliers u_g = w_g - 1, and form a perturbed score at \hat\beta: \sum_g u_g S_g(\hat\beta), where S_g is the patient-aggregated score contribution.
-
Use one Newton step with the modelās variance-covariance information to get approximate perturbed coefficients instead of fully refitting.
-
Repeat this many times to generate \beta^{*}, then run the state-occupancy calculation pipeline.
-
For marginalization, reuse the same drawās cluster weights (mapped to baseline patients and normalized), so variability in X is propagated along with coefficient uncertainty.
I know that the Bayesian Bootstrap doesnāt actually have the goal of frequentist coverage⦠But from some initial simulations things do look better than when simulating from an MVN. However, I usually assume that there is some theoretical arguments against most things one could be doing. So, is there a clear theoretical reason this approach should be avoided for frequentist inference on superpopulation-targeted marginal effects? Thanks!
I hope someone can comment on the specifics. All of this is a reason to use Bayes, and I question the utility of marginal estimates. They are not worth the trouble since their interpretation is a function of the entire study covariate distribution.
While I also do prefer to use Bayes, I do not always have to luxury to do so. I also have my doubts about the practicality of the ATE, but currently thatās what everybody wants⦠We can of course use Bayes to target the ATE as well, but that also takes ~30 minutes, so requires quite a lot of compute for larger simulations.
ATE will result in an estimate that not only may not apply to anyone in the sample but may not apply to anyone in an entire population. Example: suppose sex is the only important covariate and that it is very important. Marginalizing over sex will result in an estimate that is between that for females and that for males, and hides the important sex effect. Depersonalized medicine.
I wanted to report a finding that slightly surprised me. I had the chance to reanalyze trial data from COV-BARRIER, which recorded the 8-level NIAID ordinal score for 28 days. What surprised me was that all the differences between the groups comes from mortality, not something we saw in ACTT-2 for example. Thereās no difference in ventilation, high-flow oxygen etc.
This results in the unexpected situation that P(Benefit) is larger if one just runs a logistic regression for death at 28 days, compared to the longitudinal ordinal model over the whole 28 days using all 8 levels.
Itās probably quite unusual / unlucky, but made me more aware that including more levels can reduce power if we include levels which are not affected by treatment.
Thatās a somewhat unusual but extremely interesting example. I hope the dataset can be used by others to explore other things. The first thing to explore is to try to find utilities for all the states and to see which treatment has the highest expected utility by 28d.
I wouldnāt say this is a power reduction per se, but rather an averaging out of the effect over many outcome levels.
A Bayesian design would cover the bases, e.g., success criteria could be P(OR for mortality < 1) > 0.95 or P(overall OR < 0.85) > 0.95.
Yes, averaging out is probably the better term. Do you know of any study that elicited utilities to the NIAID scale (through discussions with patients / doctors) ? Weāre currently not investigating this.
FWIW: P(Benefit) in ACTT-2 was also higher when analysing the time to recovery compared to the Markov model, which I also did not expect. Working more extensive simulations and resampling studies currently, which to you disappointment, will be frequentistā¦
ACTT-2 was remdesivir, right? The mistake made in its analysis was to not condition on baseline state. My re-analysis showed that change in time to recovery was dominated by severely affected patients at baseline. Expected time in state, computed separately by baseline severity, reveals all.
I think you reanalyzed ACTT-1 in Rhode et al, ACTT-2 is Baricitinib (JAK-Inhibitor). But patient populations does seem quite similar, I donāt have access to ACTT-1 unfortunately. I shall have a look at time to recovery in ACTT-2 and how it depends on the baseline state.