Background to PFS endpoint

PFS assumes that death events are randomly related to tumor progression.

The only way I can make sense of this is that they’re saying it assumes the relationship between death and progression is the same on both arms (i.e. not dependent on treatment). That is, the percentage of deaths occurring before progression is observed won’t be systematically different between the arms. That won’t always be true (as in the contrived example I offer below), but I’m not sure it matters if we can simply agree that death is not a desirable outcome. Progression-free survival (PFS) is not useful solely because it measures time to progression accurately (it may not); it is also useful because events accumulate faster than deaths, and because it is less affected by treatments delivered on progression. (There are also downsides to using a somewhat subjective endpoint in a setting where blinding is usually impossible, of course, especially when trialists and/or funders may care a great deal about the direction of the outcome.)

The rest of this post considers the biases inherent in time to progression (TTP) and the more general problem of censoring when there are competing events.

Death is (often) related to progression and it can happen before anyone has had a chance to observe progression. A treatment that is less successful in preventing progression/death, or which kills people before they have had a chance to progress, would end up looking better than it really is if we censor deaths.

Here’s a contrived example:

Let’s say the intervention is radical surgery and, in truth, it makes no difference at all to the risk or timing of progression. But the radical surgery kills x% on the operating table; they die on the day they were randomised. If we censor them at the time of death, both arms will do equally well according to TTP because we’re pretending the dead people never existed (or at least that their deaths had nothing to do with the treatment that killed them, which amounts to a serious but easily overlooked violation of intention-to-treat (ITT) for this endpoint).

It gets much worse if the increased risk of early death is related to (poor) prognosis. If the people who would have progressed early die early instead, the intervention responsible for the early deaths will look better according to TTP.
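Here’s a minimal simulation of that contrived scenario. Everything in it is invented for illustration (the sample size, the exponential progression times, the 10% operative mortality, and the `km_median` helper); the point is only to show the direction of the bias: censoring deaths (TTP) flatters the surgery arm, while counting deaths as events (PFS) does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000        # patients per arm (hypothetical)
scale = 12.0     # mean time to progression in months (hypothetical)

def km_median(times, events):
    """Median of the Kaplan-Meier survival curve (simple, tie-by-tie)."""
    order = np.argsort(times)
    t, e = np.asarray(times)[order], np.asarray(events)[order]
    n_risk, surv = len(t), 1.0
    for ti, ei in zip(t, e):
        if ei:
            surv *= (n_risk - 1) / n_risk
            if surv <= 0.5:
                return ti
        n_risk -= 1
    return np.inf

# True progression times have the same distribution on both arms.
prog_control = rng.exponential(scale, n)
prog_surgery = rng.exponential(scale, n)

# Surgery kills the 10% with the worst prognosis on day 0.
is_killed = np.zeros(n, dtype=bool)
is_killed[np.argsort(prog_surgery)[: n // 10]] = True

# TTP analysis: deaths are censored at the time of death (day 0).
ttp_t = np.where(is_killed, 0.0, prog_surgery)
ttp_e = ~is_killed
# PFS analysis: same times, but deaths count as events.
pfs_t = ttp_t
pfs_e = np.ones(n, dtype=bool)

m_control = km_median(prog_control, np.ones(n, dtype=bool))
m_ttp_surgery = km_median(ttp_t, ttp_e)   # looks *better* than control
m_pfs_surgery = km_median(pfs_t, pfs_e)   # correctly similar to control
print(m_control, m_ttp_surgery, m_pfs_surgery)
```

With these (made-up) inputs, median TTP on the surgery arm comes out noticeably longer than control purely because the early progressors were removed from the risk set at day 0, while median PFS stays roughly equal across arms.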

You’d hope, of course, that a large difference in early, treatment-related (or other pre-progression) deaths would be noticed. But small-ish differences can and do occur, and it takes longer for mortality data to mature, especially if effective second- and third-line treatments are available. And, of course, the risk of publication bias increases if TTP makes a treatment look better than it really is.

In survival analysis, censoring someone implies that they remained at risk of the event after the point of censoring. This is not true of dead people. The reason we count death as an event when we’re trying to measure progression is that it is both a competing event and also (like progression) a negative event. There is no need to insist that deaths and progressions are equally important (or independent of each other), only to acknowledge that dead people can’t progress and that both events are ‘bad’.

We get a similar (but trickier) problem when we try to estimate time-to-discharge. If one arm has a higher risk of dying in hospital, censoring at the time of death will make time-to-discharge look better because we’ve removed the dead people from the denominator for the ‘good’ event. A simple solution to this is to regard dead people as permanently hospitalised (that is, censor them at the end of follow-up, or on the date the data were frozen for analysis).
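The time-to-discharge fix can be sketched in the same way. All of the numbers below are invented (exponential discharge times, 20% mortality on day 2, a 28-day window), and `km_surv_at` is a bare-bones Kaplan-Meier helper; the point is that censoring at death inflates the apparent proportion discharged, whereas moving deaths to the end of follow-up recovers the real proportion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
fu = 28.0                                # end of follow-up, days (hypothetical)
discharge = rng.exponential(10.0, n)     # latent discharge times (hypothetical)
death_day = 2.0
# A random 20% die on day 2 if they are still in hospital.
died = (rng.random(n) < 0.20) & (discharge > death_day)

def km_surv_at(times, events, t0):
    """Kaplan-Meier survivor function evaluated at t0."""
    order = np.argsort(times)
    t, e = np.asarray(times)[order], np.asarray(events)[order]
    n_risk, surv = len(t), 1.0
    for ti, ei in zip(t, e):
        if ti > t0:
            break
        if ei:
            surv *= (n_risk - 1) / n_risk
        n_risk -= 1
    return surv

obs_t = np.minimum(discharge, fu)
obs_e = discharge <= fu

# Naive: censor deaths at the time of death.
t_naive = np.where(died, death_day, obs_t)
e_naive = np.where(died, False, obs_e)
# Fix: treat the dead as hospitalised to the end of follow-up.
t_fixed = np.where(died, fu, obs_t)
e_fixed = e_naive

p_naive = 1 - km_surv_at(t_naive, e_naive, fu)  # apparent % discharged: too high
p_fixed = 1 - km_surv_at(t_fixed, e_fixed, fu)
p_true = np.mean(~died & (discharge <= fu))     # actually discharged alive
print(p_naive, p_fixed, p_true)
```

The naive estimate answers a question about a hypothetical world in which nobody dies; the fixed estimate matches the fraction of patients who were actually discharged alive within 28 days.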

In-hospital follow-up creates another problem for estimating even the simplest of these outcomes: time-to-death. If people are censored on the day they are discharged (as would often happen by default with a naive survival analysis) the denominator for estimating the risk of death is reduced, making deaths look more common than they really are. If the follow-up period is sensible given the condition being treated, this should at least be apparent when the analysis concludes that virtually everybody dies. People who are discharged before the end of in-hospital follow-up need to be assumed (or verified) to be alive at the end of follow-up.
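The mirror-image problem for time-to-death can be sketched the same way, with invented numbers (10% true in-hospital mortality, exponential death and discharge times, a 28-day window) and a bare-bones Kaplan-Meier helper: censoring at discharge drains the risk set of the healthiest patients, so mortality looks far higher than it is, while carrying the discharged forward as alive to day 28 recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
fu = 28.0                                # end of follow-up, days (hypothetical)
dier = rng.random(n) < 0.10              # 10% true in-hospital mortality
death = rng.exponential(6.0, n)          # death times for those who die
discharge = rng.exponential(7.0, n)      # discharge times for survivors

def km_surv_at(times, events, t0):
    """Kaplan-Meier survivor function evaluated at t0."""
    order = np.argsort(times)
    t, e = np.asarray(times)[order], np.asarray(events)[order]
    n_risk, surv = len(t), 1.0
    for ti, ei in zip(t, e):
        if ti > t0:
            break
        if ei:
            surv *= (n_risk - 1) / n_risk
        n_risk -= 1
    return surv

obs_t = np.minimum(np.where(dier, death, discharge), fu)
obs_death = dier & (death <= fu)
obs_disch = ~dier & (discharge <= fu)

# Naive: the discharged are censored on the day they leave hospital.
t_naive, e_naive = obs_t, obs_death
# Fix: the discharged are assumed (or verified) alive to day 28.
t_fixed = np.where(obs_disch, fu, obs_t)
e_fixed = obs_death

p_naive = 1 - km_surv_at(t_naive, e_naive, fu)  # mortality far above 10%
p_fixed = 1 - km_surv_at(t_fixed, e_fixed, fu)
p_true = np.mean(obs_death)
print(p_naive, p_fixed, p_true)
```

With these (made-up) inputs the naive analysis roughly triples the apparent 28-day mortality, which is exactly the "virtually everybody dies" warning sign mentioned above.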

The RECOVERY trial of convalescent plasma for Covid-19 used the approaches to censoring described above (broken links because I’m only allowed two links in this post):

We used Kaplan-Meier survival curves to display cumulative mortality over the 28-day period. We used similar methods to analyse time to hospital discharge and successful cessation of invasive mechanical ventilation, with patients who died in hospital right-censored on day 29.

www.thelancet…com/pdfs/journals/lancet/PIIS0140-6736(21)00897-7.pdf

and, from the supplementary materials:

For the primary outcome [all-cause mortality], discharge alive before the relevant time period (28 days after randomisation) will be assumed as absence of the event (unless there is additional data confirming otherwise).

www.thelancet…com/cms/10.1016/S0140-6736(21)00897-7/attachment/1b4e84e3-9822-4a1f-8eba-a29ff8ee65d3/mmc1.pdf

These methods are not perfect but they do have the advantage of being simple. Fine & Gray proposed an alternative approach (‘A Proportional Hazards Model for the Subdistribution of a Competing Risk’), but the resulting subdistribution hazard ratios are not simple to interpret.

Another trial of convalescent plasma for Covid-19, REMAP-CAP, took a different approach to the multistate problem. To assess organ-support-free days (OSFD), they assigned each individual to a category corresponding to their number of OSFD up to day 21: people who started the trial on mechanical ventilation (i.e. organ support) were assigned to a category labelled ‘0’, people who died at any time during follow-up were assigned ‘-1’, and people who remained free of organ support for more than 21 days were assigned ‘22’. These ordered category labels (which happen to look like numbers) were then used in a Bayesian cumulative logistic model. This has some merits for comparing the two groups across multiple different states, but the resulting OR (and medians) are hard to interpret.
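The category assignment can be sketched as below. This follows the description in this post, not the trial’s actual derivation code, and the helper name and signature are my own invention:

```python
def osfd_category(died: bool, days_free_of_support: int) -> int:
    """Ordered OSFD category, as described above (a sketch, not the
    trial's exact rules). Death at any time during follow-up dominates;
    otherwise the count of organ-support-free days is capped, with more
    than 21 free days collapsed into the top category '22'."""
    if died:
        return -1
    return min(days_free_of_support, 22)

print(osfd_category(died=True, days_free_of_support=10))   # -1
print(osfd_category(died=False, days_free_of_support=0))   # 0: on support throughout
print(osfd_category(died=False, days_free_of_support=25))  # 22: free beyond day 21
```

Note that the labels are only ordered, not interval-scaled: the gap between ‘-1’ and ‘0’ is not comparable to the gap between ‘20’ and ‘21’, which is part of why the cumulative-logistic OR is hard to interpret.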

Another broken link to that trial report: (jamanetwork…com/journals/jama/articlepdf/2784914/jama_estcourt_2021_oi_210114_1635806538.94872.pdf).

This is a useful review which covers the ground above, and more: Practical Recommendations on Quantifying and Interpreting Treatment Effects in the Presence of Terminal Competing Risks: A Review.

Bit long, but I hope it’s useful. It’s a very interesting area and the FDA guidance is far too terse.
