Restricted mean survival time and comparing treatments under non-proportional hazards

This is my first post. I am a cardiologist who has led many clinical trials and possess some statistical knowledge but I am not a statistician - so please respond at a level I might comprehend! Several related questions: What is the best way to test time to first event data when proportional hazards aren’t present? How are the overall summary effects best described? If a Cox model is not applicable, how about logistic regression with follow-up time included as a log-transformed offset variable? And what are the pros and cons of restricted mean survival time analysis? Is this a useful metric? I understand that the values it generates are typically small. But perhaps this is accurate. That is, our effective therapies are producing a relatively minimal benefit for patients (and the same for harms). I do understand the limitation that there are some patients who will benefit greatly, others minimally or not at all, and others who are harmed. Is there a way to describe this distribution?


I am thrilled that you have joined the site Gregg and look forward to interactions on many clinical trial design and interpretation fronts.

For your questions I think there are many experts out there who will add a lot to the discussion. Here are my initial thoughts:

  • We tend to think too dichotomously in assessing proportional hazards (PH), and there are a lot of close calls. It would be better IMHO to use a Bayesian approach that somewhat favors PH but allows for non-PH. This can be done, for example, by putting a skeptical prior distribution on the interaction between follow-up time and the treatment effect. As the number of events grows, the prior will wear off and the model will automatically make the treatment effects more time-dependent if the data dictates that.
  • When fitting a non-PH Cox model such as the one just described, there is still a question about how to tell which treatment is better if the hazard functions cross. The cumulative hazard function is possibly one way to go.
  • Restricted mean survival time (RMST) has some drawbacks:
    • There is always the question of whether the mean is indicative of the typical patient outcome for a treatment, as opposed to median survival time
    • Many statisticians who compute RMST do so from Kaplan-Meier estimates; this may not be statistically as efficient as using a flexible parametric survival curve
    • It is not quite correct to say that if treatment A yields RMST=2 years and treatment B yields RMST=2.5 years that treatment B patients live 0.5 years longer. This all depends on what happens after time u, where RMST is the area under the survival curve S(t) from 0 to u years.
    • RMST from 0 to u equals overall mean survival time (which is the area under S(t) for t=0 to infinity) only if S(t) drops to zero at t=u. So RMST can be thought of as the life expectancy given that a patient dies at the end of the [0,u] observation period. I’m not fond of estimates that condition on the future.
    • The difference in RMST between treatments can be misleading if the trajectory of one of the two survival curves changes significantly at t=u. In other words you may be about to see a change in delta RMST if you could only observe survival for t > u.
    • I don’t think that RMST is readily clinically interpretable.
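To make the definition in the bullets above concrete, here is a minimal sketch of RMST as the area under a Kaplan-Meier step function from 0 to u. The function names (`kaplan_meier`, `rmst`) and the follow-up times are hypothetical, chosen only for illustration; a covariate-adjusted or flexible parametric estimate would be computed differently.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (event_time, S(t)) steps.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def rmst(steps, u):
    """Area under the step function S(t) from 0 to u."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in steps:
        if t >= u:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (u - prev_t)  # last rectangle, up to the horizon u
    return area


# Hypothetical, fully observed data: one death per year for five patients.
km = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
print(f"{rmst(km, 5):.2f}")  # 3.00: S(t) reaches zero at u=5, so RMST equals the mean
print(f"{rmst(km, 3):.2f}")  # 2.40: the mean of min(T, 3), not the mean of T
```

With no censoring, the RMST to u is just the average of min(T, u), which is why it only matches the overall mean survival time when the curve has reached zero by u.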

With the Cox model one can stratify on treatment so as to make no PH assumption for the treatment effect, and to still adjust for all the covariates. This yields “adjusted KM curves” and provides the “whole unadorned answer” to all reviewers. It still begs the question “which treatment is better?”. The only way to really answer that in general is to elicit patient utilities for risks at all time points, which will also provide information about how much future time discounting patients utilize.

The great cardiac surgeon John Kirklin, whom I had the great fortune to work with, went about this a different way, on a one-patient-at-a-time basis. He would ask the patient “what do you want to be alive to see” when considering valvular heart surgery. If the patient said “I want to see my daughter graduate from college in 3 months” he would advise against surgery for the present, knowing the high perioperative risk with such open heart surgery. If the patient said “I want to see my son graduate from college in 3 years” he would advise for surgery now.

Concerning logistic regression, I’d like to see a reference to understand how that is actually carried out. At first blush it seems to have no advantage over a treatment-stratified Cox model and I’m not sure it will handle all the censoring distributions that the Cox PH model can easily handle.


i’m very interested to learn how RMS became a favoured method for survival data. Dave Collett’s book on survival analysis doesn’t mention it, nor does Frank Harrell’s book, and the use of the acronym RMS for “regression modelling strategies”, when it could be confused with “restricted mean survival”, suggests the sudden popularity of the latter? the original paper on RMS is from the 1940s but new papers are appearing asking how to choose tau: “On the empirical choice of the time window for restricted mean survival time”. It seems RMS achieved widespread approval rapidly, in contrast with accelerated failure time modelling, which never really gained traction despite being promoted as an alternative to PH regression (eg respected statistician Richard Kay had a paper in pharm stat journal some years ago) … I will post further on this in a day or two regarding sas’s new proc rmstreg: “Performing Restricted Mean Survival Time Analysis (RMST) Using SAS/STAT”


I have been thinking a lot about restricted mean survival time (RMST) in the past few months. I was against the idea at first and still don’t think it is as relevant in the time to first event setting as the cumulative incidence at the end of follow-up. For covariate-adjusted cumulative incidence I would favor a Cox model with a time × treatment interaction (i.e., non-proportional hazards, or non-PH, for the treatment effect) in most situations, and for Bayesian modeling I’d put a skeptical prior on the amount of non-PH, one that could be overridden by enough events.
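One way such a model might be written down, purely as an illustrative sketch: the symbols below (treatment indicator Z, covariate vector X, a chosen function of time g(t) such as log t, and the prior standard deviation σ) are assumptions introduced here, not notation from the post.

```latex
h(t \mid X, Z) = h_0(t)\,\exp\!\big( X\beta + Z\,[\gamma_0 + \gamma_1\, g(t)] \big),
\qquad \gamma_1 \sim N(0, \sigma^2)
```

Under this parameterization the log hazard ratio for treatment at time t is γ_0 + γ_1 g(t), PH corresponds to γ_1 = 0, and a small σ expresses skepticism about non-PH that a large number of events can overcome. Covariates X are still assumed to act proportionally.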

RMST can provide a useful one-number summary in the presence of non-PH, but it covers up some of the problem. But the biggest problem with RMST is that the majority of medical papers using it are using Kaplan-Meier estimates instead of covariate-adjusted models. For efficiency and better interpretation we need to fully recognize outcome heterogeneity within treatment by having covariates in the model. We could assume PH for covariates but relax the PH assumption for treatment.

The most difficult question I know of related to this discussion is which time horizon should be emphasized. Do we care where patients end up (use final cumulative incidence) or do we care how they got there (use RMST)?
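The distinction between "where patients end up" and "how they got there" can be illustrated with a small sketch. The two arms below are hypothetical and fully observed (no censoring), so cumulative incidence at u is simply the fraction with an event by u and RMST is the mean of min(T, u); the function names are made up for this example.

```python
import math


def cumulative_incidence(times, u):
    """Fraction of patients with an event by time u (no censoring assumed)."""
    return sum(t <= u for t in times) / len(times)


def rmst_uncensored(times, u):
    """Mean of min(T, u), which equals RMST when nobody is censored."""
    return sum(min(t, u) for t in times) / len(times)


u = 5.0  # hypothetical follow-up horizon in years

# Ten hypothetical patients per arm; math.inf marks event-free beyond follow-up.
arm_a = [1.0] * 4 + [math.inf] * 6  # four early events at year 1
arm_b = [4.0] * 4 + [math.inf] * 6  # four late events at year 4

for name, arm in (("A", arm_a), ("B", arm_b)):
    ci = cumulative_incidence(arm, u)
    area = rmst_uncensored(arm, u)
    print(f"arm {name}: cumulative incidence at u = {ci:.2f}, RMST = {area:.2f}")
```

Both arms end at the same cumulative incidence (0.40), yet the RMST differs by 1.2 years (3.4 versus 4.6) because the events in arm B happen later.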


i was impressed by the availability of good recent papers, providing the statistician with a quick summary re RMS - should they need to model RMS for a client and need to get up to speed quickly:

  • “On the empirical choice of the time window for restricted mean survival time” Biometrics, 2020. (you suspected this is a preprint, I’m able to download and read it, once I’m past the paywall)

  • “Restricted mean survival time as a summary measure of time-to-event outcome” Pharmaceutical Statistics, 2020

  • “Analyzing Restricted Mean Survival Time Using SAS/STAT” 2019 (an intro to proc rmst for sas users, but still gives a decent, brief intro to RMS) - thus no excuse for not adjusting for covariates

  • “Empirical power comparison of statistical tests in contemporary phase III randomized controlled trials with time-to-event outcomes in oncology”, Clinical Trials, 2020: “For overall survival, … restricted mean survival time with fixed t offered the highest [power]”

the pharm stat paper notes the sudden, recent use of RMS, eg: “At the annual meeting of the American Society of Clinical Oncology, some topics related to RMST were reported in 2018 and 2019 … [A]t the fourth and sixth Data Science Round Table Meetings held in Japan on 9 March 2017 and 11 March 2019 statisticians from the health authority (PMDA), academia, and pharmaceutical companies convened to discuss hot topics regarding biostatistics, and RMST was one of the discussion topics”

this is where i’m seeing it, ie in industry, especially when censoring is high, ie much above 50%, or the sample size is small (rare diseases). I agree: i would not like it for a time-to-first composite, it would be difficult to interpret and likely sensitive to some component of the composite. But i’m happy to have an alternative to the HR in cases where there is high censoring. I’m surprised how many papers refer to the enhanced interpretability of the RMS over HR, sometimes without much explanation, ie this seems to be the “standard” statement now. I’m not sure i understand/appreciate how it is conditioning on the future, ie it is just the expected survival in the specified time window? [edit: ie you say “only if S(t) drops to zero at t=u”]


I hope that others will more fully discuss the excellent issues you raised. At present I feel that much of the appeal of RMST (note: RMS is not what is typically used for the acronym) is that it’s on the time scale which patients and doctors understand. But I do believe that many users of RMST only pretend to understand what it means. What has gotten me more interested in it is that I think that discrete time state transition models are often the way to go, and a standard summary from such models is mean time in state. When state=alive and there are only two states, this is RMST.
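As a rough sketch of the "mean time in state" idea, assuming a two-state (alive, dead) discrete-time model with made-up per-period death probabilities: state occupancy probabilities are propagated through the transition matrices, and the expected time alive is the sum of the alive-state probabilities over periods. The function names and numbers are hypothetical.

```python
def state_occupancy(transition_matrices, start):
    """Propagate state occupancy probabilities through successive periods.

    transition_matrices : one row-stochastic matrix per period, P[i][j] = P(i -> j)
    start               : occupancy probabilities at time 0
    Returns the occupancy vector at the end of each period.
    """
    occ = list(start)
    history = []
    for P in transition_matrices:
        occ = [sum(occ[i] * P[i][j] for i in range(len(occ)))
               for j in range(len(occ))]
        history.append(occ)
    return history


def mean_time_in_state(history, state):
    """Expected number of periods spent in `state` (occupancy summed over periods)."""
    return sum(occ[state] for occ in history)


ALIVE, DEAD = 0, 1


def two_state_matrix(q):
    """Alive -> dead with probability q in a period; dead is absorbing."""
    return [[1.0 - q, q], [0.0, 1.0]]


n_periods = 5  # hypothetical horizon
control = state_occupancy([two_state_matrix(0.10)] * n_periods, [1.0, 0.0])
treated = state_occupancy([two_state_matrix(0.08)] * n_periods, [1.0, 0.0])

# With only alive/dead, mean time in the alive state is a discrete-time RMST
# (the sum of S(k) at the end of periods k = 1..n_periods).
print(mean_time_in_state(control, ALIVE))
print(mean_time_in_state(treated, ALIVE))
```

The per-period death probabilities could differ by period (a different matrix per period), which is how non-PH would enter in this framing; with more than two states the same sum gives mean time in any state of interest.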


okay, that might give me a clue as to why SAS, re estimation, is referencing Andersen’s 2003 Biometrika paper “with Applications to Multi-State Models”, but i haven’t read past the abstract and look forward to reading the detail … “Generalised Linear Models for Correlated Pseudo-Observations, with Applications to Multi-State Models” (on JSTOR)