Restricted mean survival time and comparing treatments under non-proportional hazards

This is my first post. I am a cardiologist who has led many clinical trials and have some statistical knowledge, but I am not a statistician - so please respond at a level I might comprehend! Several related questions: What is the best way to test time-to-first-event data when proportional hazards aren’t present? How are the overall summary effects best described? If a Cox model is not applicable, how about logistic regression with follow-up time included as a log-transformed offset variable? And what are the pros and cons of restricted mean survival time analysis? Is this a useful metric? I understand that the values it generates are typically small. But perhaps this is accurate. That is, our effective therapies are producing a relatively minimal benefit for patients (and the same for harms). I do understand the limitation that there are some patients who will benefit greatly, others minimally or not at all, and others who are harmed. Is there a way to describe this distribution?


I am thrilled that you have joined the site, Gregg, and look forward to interactions on many clinical trial design and interpretation fronts.

For your questions I think there are many experts out there who will add a lot to the discussion. Here are my initial thoughts:

  • We tend to think too dichotomously in assessing proportional hazards (PH), and there are a lot of close calls. It would be better IMHO to use a Bayesian approach that somewhat favors PH but allows for non-PH. This can be done, for example, by putting a skeptical prior distribution on the interaction between follow-up time and the treatment effect. As the number of events grows, the prior will wear off and the model will automatically make the treatment effects more time-dependent if the data dictates that.
  • When fitting a non-PH Cox model such as the one just described, there is still a question about how to tell which treatment is better if the hazard functions cross. The cumulative hazard function is possibly one way to go.
  • Restricted mean survival time (RMST) has some drawbacks:
    • There is always the question of whether the mean is indicative of the typical patient outcome for a treatment, as opposed to median survival time
    • Many statisticians who compute RMST do so from Kaplan-Meier estimates; this may not be statistically as efficient as using a flexible parametric survival curve
    • It is not quite correct to say that if treatment A yields RMST=2 years and treatment B yields RMST=2.5 years that treatment B patients live 0.5 years longer. This all depends on what happens after time u, where RMST is the area under the survival curve S(t) from 0 to u years.
    • RMST from 0 to u equals overall mean survival time (which is the area under S(t) for t=0 to infinity) only if S(t) drops to zero at t=u. So RMST can be thought of as the life expectancy given that a patient dies at the end of the [0,u] observation period. I’m not fond of estimates that condition on the future.
    • The difference in RMST between treatments can be misleading if the trajectory of one of the two survival curves changes significantly at t=u. In other words you may be about to see a change in delta RMST if you could only observe survival for t > u.
    • I don’t think that RMST is readily clinically interpretable.
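
To make the definition concrete: RMST up to u is the area under S(t) from 0 to u. Below is a minimal sketch, in plain Python, of the Kaplan-Meier-based calculation mentioned above. The function names (`kaplan_meier`, `rmst`) and the toy data are hypothetical and for illustration only; a real analysis would use an established survival package and report uncertainty intervals.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns (event_times, survival_probs): S(t) just after each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    out_t, out_s = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censorings at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            out_t.append(t)
            out_s.append(surv)
        n_at_risk -= n_leaving
    return out_t, out_s


def rmst(times, events, u):
    """Area under the Kaplan-Meier step function from 0 to u."""
    event_times, surv_probs = kaplan_meier(times, events)
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in zip(event_times, surv_probs):
        if t >= u:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    area += prev_s * (u - prev_t)      # final rectangle up to the horizon u
    return area


if __name__ == "__main__":
    # Hypothetical toy data (years); 0 in the events list marks censoring.
    arm_a = ([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 0, 0])
    arm_b = ([2, 3, 4, 5, 6, 6], [1, 0, 1, 1, 0, 0])
    for horizon in (3, 6):
        diff = rmst(*arm_b, horizon) - rmst(*arm_a, horizon)
        print(f"u={horizon}: RMST difference (B - A) = {diff:.3f} years")
```

Running the example with two different horizons shows how the RMST difference depends on the choice of u, which is the concern raised in the bullets above.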

With the Cox model one can stratify on treatment so as to make no PH assumption for the treatment effect, and still adjust for all the covariates. This yields “adjusted KM curves” and provides the “whole unadorned answer” to all reviewers. It still leaves open the question “which treatment is better?”. The only way to really answer that in general is to elicit patient utilities for risks at all time points, which will also provide information about how much future time discounting patients utilize.
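
As a sketch of what stratifying on treatment means, the model can be written as follows. The notation is an assumption introduced here for illustration: j indexes the treatment arm, X is the vector of adjustment covariates, and h_{0j}(t) is an arm-specific baseline hazard left completely unspecified.

```latex
h(t \mid X, \text{treatment} = j) = h_{0j}(t)\, \exp(X\beta), \qquad j \in \{A, B\}

S_j(t \mid X) = \exp\!\left( -H_{0j}(t)\, e^{X\beta} \right), \qquad H_{0j}(t) = \int_0^t h_{0j}(s)\, ds
```

Because each arm has its own baseline hazard, the ratio h_{0B}(t) / h_{0A}(t) is free to vary with t, so nothing is assumed about how the treatment effect evolves over time. PH is still assumed for the covariates in X.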

The great cardiac surgeon John Kirklin, whom I had the great fortune to work with, went about this a different way, on a one-patient-at-a-time basis. He would ask the patient “what do you want to be alive to see” when considering valvular heart surgery. If the patient said “I want to see my daughter graduate from college in 3 months” he would advise against surgery for the present, knowing the high perioperative risk with such open heart surgery. If the patient said “I want to see my son graduate from college in 3 years” he would advise for surgery now.

Concerning logistic regression, I’d like to see a reference to understand how that is actually carried out. At first blush it seems to have no advantage over a treatment-stratified Cox model, and I’m not sure it will handle all the censoring distributions that the Cox PH model can easily handle.