Is it possible to conduct phase III clinical trials in oncology if it is suspected that the survival curves will cross?

Indeed, immune checkpoint inhibitors have a delayed treatment effect that is consistently seen both in preclinical models and in clinical trials. The Breakthrough documentary very vividly describes how this whole immunotherapy field was almost shut down because traditional oncology trial designs did not take this into consideration, and how tough it was to change the statistical analysis paradigm within industry and the FDA to account for what was seen in mouse models and eventually in humans.


It’s part of the explanation, surely, but I’m not convinced that what you’re saying is the whole explanation. I think there are also a number of deeper problems:

  1. The most commonly used endpoint is overall survival (OS), and there is no consensus on how much weight to give to the percentage of survivors at fixed time points. Thus, some authors are reluctant to use the milestone survival rate unless it is clearly a good surrogate for overall survival, or they propose two co-primary endpoints, one being overall survival and the other the OS rate at a fixed time point, which spends two degrees of freedom in the analysis.
  2. However, when the milestone survival rate is evaluated at time points far from the start, for example at 2-3 years, one can speak of the percentage of long-term survivors. I believe that not enough thought has been given to using this endpoint in studies, and that at present it is rejected by both agencies and other stakeholders without a truly critical analysis. There may be several reasons for this. One is that until now we have been concerned with extending patients’ survival a little, whereas immunotherapy offers, for the first time, the chance of long survival to a few. This advance, however, has not yet changed the mentality about the use of endpoints. If a trial were proposed today with the 2-year OS rate as its sole primary objective, it would possibly be rejected, and that rejection may or may not be well founded.
  3. I think sponsors, and the guys who pay, are horrified to acknowledge the plain truth: drugs do not benefit everyone equally, for some they are even harmful, and their effect is not constant over time. This is so controversial that they don’t even want to see it in a protocol, even though it’s the inescapable truth. What they want is a simple measure, such as the hazard ratio, which is comforting, even if it is false in these scenarios.

The issue remains far from resolved. People still don’t grasp the importance of increasing the % of long-term survivors, and designs continue to focus on increasing the length in months of life by a small amount. The documentary seems very interesting.

Yes, it is worth watching. BTW, I believe the correct term is 2-year OS probability and not rate. Hazards are rates and can exceed 1.0, whereas the milestone events you are referring to cannot. Good discussion here.
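
To make the rate-versus-probability distinction concrete, here is a minimal sketch assuming a constant (exponential) hazard. The hazard value and the helper name `exp_survival` are made up for illustration and do not come from any trial.

```python
import math

def exp_survival(hazard_per_year: float, t_years: float) -> float:
    """Survival probability S(t) = exp(-h * t) under a constant hazard h."""
    return math.exp(-hazard_per_year * t_years)

# Hypothetical hazard in events per person-year. It is a rate, so a value
# above 1.0 is perfectly legitimate.
hazard = 1.5

# The milestone quantity is a probability and therefore stays within [0, 1].
two_year_os = exp_survival(hazard, 2.0)

print(f"hazard = {hazard} per person-year, 2-year OS probability = {two_year_os:.3f}")
```

The hazard depends on the time unit (1.5 per year is 0.125 per month), whereas the 2-year OS probability is unit-free.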

There is a curious example in the literature, the ADAPT trial. In April, its closure was announced. It fell short, they said.

The only thing that fell short was the follow-up. In September, the researchers, begging at every door, managed to keep it open to evaluate the long-term effect, given that 50% of the patients were still alive.

I’m not sure about the current status of this trial because I work on other tumors. On clinicaltrials.gov it appears to be a study closed for lack of efficacy (NCT01582672), but I’m not sure, because uptodate.com says it is still open.
https://www.uptodate.com/contents/immunotherapy-of-renal-cell-carcinoma
It would be interesting to follow this story.


That trial is closed for enrollment but there are still patients alive who received that regimen in our department. It is a different approach than immune checkpoint therapy, and can be further developed in the future using either AGS-003 (rocapuldencel-T) or similar agents.


And why would the OS rate at a remote point, perhaps 2-3 years or more, not have been a good primary endpoint in this study? The Cox model seems to be a bad option there…

There are limitations (previously discussed here and here) with using 2- or 3-year survival probabilities as primary endpoints. Proportional hazards, while traditionally useful in chemotherapy trials, should indeed be a questionable assumption in immunotherapy research. There are multiple factors to consider when analyzing data from such RCTs, including challenges with causal inferences. In most cases, what’s needed is good statistical modeling using reasonable assumptions. I wish it was always as easy as comparing survival probabilities at fixed time points. Overall survival inferences, for example, are affected by subsequent therapies and this can confound inferences. See here for the related concept of dynamic treatment regimes (DTRs). We are currently analyzing a DTR-based renal cell carcinoma trial where patients were randomized for both first-line and subsequent therapies and even that is not an easy task inferentially. The need for good modeling cannot be overemphasized.

In my opinion there is a nuance. In the case of lymphoma, what is proposed is the use of a surrogate variable, the rate of progression at 2 years, for a disease with a much longer survival (much longer than 2 years). The goal there is to use an intermediate endpoint so that patients need not be followed for so long. This is indeed controversial, and probably wrong. But that has absolutely nothing to do with using long-term survival rates chosen in proportion to the realistic life expectancy for the disease. Therapies given after the first line may indeed be a problem in some cases, but not in aggressive neoplasms with <10% of survivors at 2-3 years. It is an issue to be weighed in indolent tumors or those with multiple treatment options, but it is not always important in orphan tumors.
I do not understand why you do not accept milestone survival rates at a remote time point as a possible valid primary endpoint. Maybe it’s because of my ignorance, but I cannot conceive of a more reasonable endpoint in some specific scenarios.

For metastatic/locally advanced clear cell renal cell carcinoma (the cancer treated in the ADAPT trial), 51% of patients will receive a second-line therapy and 24% will receive a third-line therapy, based on recent real-world data. When evaluating the efficacy data of a first-line RCT such as ADAPT, which tested AGS-003 (rocapuldencel-T) for this cancer, the choice of subsequent therapies can affect overall survival, and this should ideally be taken into account in the statistical model. From a causal inference perspective, if we do not use the proportional hazards assumption, many issues arise that will have to be appropriately modeled with other assumptions. This paper discusses in detail some of those considerations using examples from oncology.


Kidney cancer can be indolent, and has multiple treatment options. The Cox model may not be the best option for this same reason!
But anyway, I don’t mean these tumors. I mean tumors such as gastric cancer, where second-line therapy has little impact and two-year survival is less than 10%. Why don’t we use milestones in these settings?


Different tumors/treatments may indeed need different assumptions and thus different models. I am apprehensive regarding milestone (landmark) comparisons as the primary endpoint for phase 3 RCTs due to their inefficiency from loss of information. Under certain conditions I guess they can be efficient enough. But more often than not we would end up shooting ourselves in the foot.


I can list from memory six shots in the foot in clinical trials caused by using Cox models in dynamic contexts, but I don’t know of any involving milestone survival rates at remote time points, since they are used so little.

Agreed, it is not the whole explanation.

The two main issues I see when comparing the curves at one specified time are:

  1. The choice of the appropriate time is subjective, and conclusions can differ depending on which time point you choose. It is also harder to compare between studies if the chosen time point is not the same.
  2. Comparing the estimates at a specific time point ignores the shape of the curves. You may have similar estimates at 3 years with one group having its events early and the other late, for example. I would favour using restricted mean survival time at 3 years instead in those cases.

I am also not sure how acceptable this would be to the FDA, because pharma will only use an endpoint that the FDA accepts for submission.
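
The second point above can be illustrated with a small sketch: a hand-rolled Kaplan-Meier estimator, the milestone survival at a fixed time, and the restricted mean survival time (RMST) as the area under the curve up to that time. The two arms are invented toy data (no censoring, times in years) and the function names are hypothetical; a real analysis would use a validated survival package.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same time; censored ties still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def milestone_survival(curve, tau):
    """Kaplan-Meier survival probability at time tau (right-continuous step function)."""
    s = 1.0
    for t, surv in curve:
        if t > tau:
            break
        s = surv
    return s

def rmst(curve, tau):
    """Restricted mean survival time: area under the step curve from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, surv in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, surv
    return area + prev_s * (tau - prev_t)

# Toy data: both arms lose half of their patients before 3 years,
# but one arm loses them early and the other late.
all_deaths = [1] * 10
early_arm = kaplan_meier([0.5, 0.5, 1.0, 1.0, 1.0, 4, 4, 4, 4, 4], all_deaths)
late_arm = kaplan_meier([2.5, 2.5, 2.8, 2.8, 2.8, 4, 4, 4, 4, 4], all_deaths)

for name, curve in (("early events", early_arm), ("late events", late_arm)):
    print(name, round(milestone_survival(curve, 3.0), 3), round(rmst(curve, 3.0), 3))
```

In this toy example the 3-year milestone is 0.5 in both arms, while the 3-year RMST is about 1.9 years in the early-event arm and about 2.84 years in the late-event arm, so the milestone alone hides the difference in shape.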

I’m aware of all those risks and agree with you. The problem is that I think there are, and there will be more and more situations in which there is not going to be a better viable option, because all the alternatives are even more problematic.
For example, how do you interpret RMST with crossing survival curves…?

No single number will be able to properly describe the difference between treatments in those situations. A clinical trial endpoint for drug approval would probably need to be a set of co-primary endpoints. For patients and clinicians it will be a subjective decision. If you see a clear advantage of drug A in the short term and a clear advantage of drug B in the long term, the “best” treatment will be the one that suits the patient’s situation. Assuming similar toxicity, cost, etc., an elderly patient may prefer drug A but a young one may prefer drug B.
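
To spell out why RMST is hard to read when curves cross, here is a short worked decomposition. It assumes, purely for illustration, a single crossing at a time $t^*$ before the truncation time $\tau$, with drug A ahead before the crossing and drug B ahead after it; the symbols $t^*$ and $\tau$ are not from the thread.

```latex
\mathrm{RMST}_k(\tau) = \int_0^{\tau} S_k(t)\,dt, \qquad k \in \{A, B\}

\Delta(\tau) = \mathrm{RMST}_B(\tau) - \mathrm{RMST}_A(\tau)
  = \underbrace{\int_0^{t^*} \bigl[S_B(t) - S_A(t)\bigr]\,dt}_{\le\, 0 \ \text{(A ahead early)}}
  + \underbrace{\int_{t^*}^{\tau} \bigl[S_B(t) - S_A(t)\bigr]\,dt}_{\ge\, 0 \ \text{(B ahead late)}}
```

The two areas have opposite signs, so $\Delta(\tau)$ can be close to zero even when the treatments behave very differently, and its sign can flip depending on the choice of $\tau$.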


This is a fascinating thread. One fairly rudimentary question/idea on my end: it seems to me that in many of the immunotherapy trials, the treated cohorts have a heterogeneity of response (as mentioned earlier in the thread). I suspect that, because of this effect heterogeneity, the composite survival curve in most of these instances fails the proportional hazards requirement of the survival models most often used to fit the data. That would explain why you can have a similar median OS but a marked difference in the height of the “tail” of the Kaplan-Meier curve. I wonder if these curves should not be modeled right off the bat as a composite of two (or more) separate survival curves, just as we do Gaussian decomposition for complex distributions, for instance. I have no idea if such a framework exists in survival analysis, but it seems to me that this may model the situation more accurately. Then again, I may be completely off-base.
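
As a rough sketch of the decomposition idea, the snippet below treats the treated arm as a mixture of two exponential subpopulations (a small long-surviving fraction and a majority with the same hazard as control) and tracks the resulting hazard ratio over time. All parameter values and function names are hypothetical. Models in this spirit are generally known in survival analysis as mixture (or mixture cure) models, but this is only an illustration, not a fitted model.

```python
import math

def mixture_survival(t, p_long, rate_short, rate_long):
    """S(t) for a population mixing long-term and short-term survivors."""
    return p_long * math.exp(-rate_long * t) + (1 - p_long) * math.exp(-rate_short * t)

def mixture_hazard(t, p_long, rate_short, rate_long):
    """Population hazard h(t) = f(t) / S(t) of the two-component mixture."""
    density = (p_long * rate_long * math.exp(-rate_long * t)
               + (1 - p_long) * rate_short * math.exp(-rate_short * t))
    return density / mixture_survival(t, p_long, rate_short, rate_long)

# Hypothetical parameters (time in years), chosen only for illustration.
CONTROL_RATE = 0.7  # homogeneous control arm with a constant hazard
TREATED = dict(p_long=0.25, rate_short=0.7, rate_long=0.05)

def hazard_ratio(t):
    """Treated-versus-control hazard ratio at time t."""
    return mixture_hazard(t, **TREATED) / CONTROL_RATE

for year in (0.5, 1, 2, 3, 5):
    s_treated = mixture_survival(year, **TREATED)
    s_control = math.exp(-CONTROL_RATE * year)
    print(f"t={year}: HR={hazard_ratio(year):.2f}, "
          f"S_treated={s_treated:.2f}, S_control={s_control:.2f}")
```

Under these assumptions the hazard ratio drifts steadily downward as the short-term survivors die off, so a single Cox hazard ratio would average over a quantity that is not constant, while the treated curve keeps a visibly higher tail.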


Good point and we have tackled such heterogeneities in two different ways (open to suggestions on more; also keeping in mind that appropriate covariate adjustment can at least in some cases help maintain the PH assumption):

  1. Bayesian hierarchical models with a frailty (random factor) with a multiplicative effect on the baseline hazard functions to account for unmeasured heterogeneity in responses. I guess those would be considered “semiparametric” in a sense.

  2. Use flexible nonparametric Bayesian regression models such as the one described here.
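
For readers unfamiliar with the first approach, a generic textbook form of a multiplicative frailty model is sketched below. The gamma distribution for the frailty $z_i$ and the notation ($h_0$, $H_0$, $\theta$, $x_i$, $\beta$) are assumptions for illustration and not necessarily the specification used in the analyses mentioned above.

```latex
h_i(t \mid z_i) = z_i \, h_0(t) \, \exp\!\bigl(x_i^{\top}\beta\bigr),
\qquad z_i \sim \mathrm{Gamma}\!\left(\text{shape} = \tfrac{1}{\theta},\ \text{rate} = \tfrac{1}{\theta}\right),
\quad \mathbb{E}[z_i] = 1,\ \operatorname{Var}[z_i] = \theta

S(t \mid x_i) = \mathbb{E}_{z_i}\!\left[\exp\!\bigl(-z_i H_0(t)\, e^{x_i^{\top}\beta}\bigr)\right]
             = \bigl(1 + \theta\, H_0(t)\, e^{x_i^{\top}\beta}\bigr)^{-1/\theta},
\qquad H_0(t) = \int_0^t h_0(u)\,du
```

Hazards are proportional conditional on the frailty, but once $z_i$ is integrated out the population-level hazards are no longer proportional whenever $\theta > 0$, which is how unmeasured heterogeneity can produce the patterns discussed in this thread.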


Using multiple endpoints is a solution proposed by several authors, but it has several problems. First of all, it would be interesting to calculate the increase in sample size under realistic assumptions when using two co-primary endpoints instead of one. It is possible that the increase is not bearable for the differences of about 15-20% that can sometimes be expected. If you add that the eligibility criteria of a growing number of trials require a low-prevalence biomarker, and that late effects demand very long follow-up, some designs can easily become mammoth and unfeasible. I personally don’t like complex designs; they give an impression of artifice. This also clashes with the perception of patients, who want every effort to be made to cure them, but are not willing to take risks if the differences are small. I am therefore not so sure that the research team can decide this in certain scenarios without consulting what the patient really demands…
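
As a back-of-the-envelope version of the sample size question, the sketch below uses the Schoenfeld approximation for the number of deaths needed by a 1:1 log-rank test. It makes strong assumptions: proportional hazards (the very thing questioned in this thread), a hypothetical target hazard ratio of 0.70, 80% power, and a simple Bonferroni split of a two-sided 0.05 alpha between two primary endpoints. It covers only the OS comparison and ignores the correlation between endpoints, so treat it as an order-of-magnitude check rather than a design calculation.

```python
import math
from statistics import NormalDist

def events_required(hazard_ratio, alpha_two_sided, power=0.80):
    """Schoenfeld approximation: deaths needed for a 1:1 log-rank test under PH."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)

TARGET_HR = 0.70  # hypothetical effect size, for illustration only

# One primary endpoint tested at the full two-sided alpha of 0.05.
single_endpoint_events = events_required(TARGET_HR, alpha_two_sided=0.05)

# Same endpoint when alpha is split equally between two primary endpoints.
split_alpha_events = events_required(TARGET_HR, alpha_two_sided=0.025)

inflation = split_alpha_events / single_endpoint_events
print(single_endpoint_events, split_alpha_events, round(inflation, 2))
```

Under these assumptions, halving the alpha costs on the order of 20% more events for the same power. The burden is different if both endpoints must succeed (alpha is not split but power drops), and a milestone endpoint would need its own calculation.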

I think the Royston-Parmar spline model can do something similar to this.
However, I believe that part of the problem is that Phase III trials, driven by the understandable urgency of the situation, are conducted without detailed prior knowledge of the factors that predict response. A reasonable proposal, in my opinion, is to go back to phase II studies and try to work out the reasons for the heterogeneity of response.