Is it possible to conduct phase III clinical trials in oncology if it is suspected that the survival curves will cross?

For metastatic/locally advanced clear cell renal cell carcinoma (the cancer treated in the ADAPT trial), 51% of patients will receive a second-line therapy and 24% a third-line therapy, based on recent real-world data. When evaluating the efficacy data of a first-line RCT such as ADAPT, which tested AGS-003 (rocapuldencel-T) for this cancer, the choice of subsequent therapies can affect overall survival, and this should ideally be taken into account in the statistical model. From a causal inference perspective, if we do not use the proportional hazards assumption, many issues arise that have to be appropriately modeled with other assumptions. This paper discusses in detail some of those considerations using examples from oncology.

Kidney cancer can be indolent, and has multiple treatment options. The Cox model may not be the best option for this same reason!
But anyway I don’t mean these tumors. I mean tumors such as gastric cancer, with little impact from the second line, and less than 10% two-year survival. Why don’t we use milestones in these settings?

Different tumors/treatments may indeed need different assumptions and thus different models. I am apprehensive regarding milestone (landmark) comparisons as the primary endpoint for phase 3 RCTs due to their inefficiency from loss of information. Under certain conditions I guess they can be efficient enough. But more often than not we would end up shooting ourselves in the foot.

I can list from memory six clinical trials that shot themselves in the foot by using Cox models in dynamic contexts, but I don’t know of any for milestone survival rates at distant time points, since they are used very little.

Agreed, it is not the whole explanation.

The two main issues I see when comparing the curves at one specified time are:

  1. The choice of the appropriate time is subjective, and conclusions can differ depending on which time point you choose. It is also harder to compare between studies if the choice of time point is not the same.
  2. Comparing the estimates at a specific time point ignores the shape of the curves. You may have similar estimates at 3 years but one group having events early while the other has events late, for example. I would favour using restricted mean survival time at 3 years instead in those cases.
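
To make the second point concrete, here is a minimal sketch. The two survival functions are hypothetical, chosen only so that both arms have exactly 50% survival at 3 years; they are not taken from any trial. The restricted mean survival time (RMST) is computed as the area under each curve up to 3 years.

```python
import math

def surv_early(t):
    """Hypothetical arm with events from time zero (constant hazard)."""
    return math.exp(-math.log(2) / 3 * t)

def surv_late(t):
    """Hypothetical arm with no events for 2 years, then a steeper hazard."""
    return 1.0 if t <= 2 else math.exp(-math.log(2) * (t - 2))

def rmst(surv, tau, steps=30000):
    """Restricted mean survival time: area under surv on [0, tau] (trapezoid rule)."""
    h = tau / steps
    total = 0.5 * (surv(0.0) + surv(tau))
    for i in range(1, steps):
        total += surv(i * h)
    return total * h

TAU = 3.0  # milestone and restriction time, in years
milestone_early = surv_early(TAU)
milestone_late = surv_late(TAU)
rmst_early = rmst(surv_early, TAU)
rmst_late = rmst(surv_late, TAU)

print(f"3-year survival: {milestone_early:.3f} vs {milestone_late:.3f}")
print(f"3-year RMST:     {rmst_early:.3f} vs {rmst_late:.3f} years")
```

The milestone comparison sees no difference between the arms, while the RMST credits the arm whose events occur later with more time alive within the 3-year window.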

Also, I am not sure how acceptable this would be from the FDA perspective, because pharma will only use an endpoint that the FDA agrees to for submission.

I’m aware of all those risks and agree with you. The problem is that I think there are, and there will be more and more, situations in which there is not going to be a better viable option, because all the alternatives are even more problematic.
For example, how do you interpret RMST with crossing survival curves…?

No single number will be able to properly describe the difference between treatments in those situations. A clinical trial for drug approval would probably need co-primary endpoints. For patients and clinicians it will be a subjective decision. If you see a clear advantage of drug A in the short term and a clear advantage of drug B in the long term, the “best” treatment will be the one that suits the patient’s situation. Assuming similar toxicity, cost, etc., an elderly patient may prefer drug A but a young one may prefer drug B.
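
A small sketch of why a single RMST figure is hard to read when curves cross. Both arms are hypothetical: arm A has a constant hazard, arm B is a cure-type mixture that does worse early and better late. The rates and the 30% long-term fraction are assumptions made up for illustration.

```python
import math

def surv_a(t):
    """Hypothetical arm A: exponential survival, hazard 0.5 per year."""
    return math.exp(-0.5 * t)

def surv_b(t):
    """Hypothetical arm B: 30% long-term survivors, the rest at hazard 0.9 per year."""
    return 0.3 + 0.7 * math.exp(-0.9 * t)

def rmst_exponential(rate, tau):
    """Closed-form area under exp(-rate * t) on [0, tau]."""
    return (1 - math.exp(-rate * tau)) / rate

def rmst_a(tau):
    return rmst_exponential(0.5, tau)

def rmst_b(tau):
    # Area under a constant 0.3 plus 0.7 times an exponential curve
    return 0.3 * tau + 0.7 * rmst_exponential(0.9, tau)

# RMST difference (B minus A) at several restriction times, in years
rmst_diff = {tau: rmst_b(tau) - rmst_a(tau) for tau in (1.0, 2.0, 4.0)}

for tau, diff in rmst_diff.items():
    print(f"tau={tau:.0f}y  S_A={surv_a(tau):.3f}  S_B={surv_b(tau):.3f}  "
          f"RMST(B)-RMST(A)={diff:+.3f}")
```

With these assumed curves the RMST difference favours A at 1 year, is close to zero at 2 years, and favours B at 4 years, so the conclusion depends on the restriction time just as a milestone comparison depends on the chosen time point.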

This is a fascinating thread. One fairly rudimentary question/idea on my end: it seems to me that in many of the immunotherapy trials, the treated cohorts have a heterogeneity of response (as mentioned earlier in the thread). I suspect that, because of this effect heterogeneity, the composite survival curve in most of these instances fails the proportional hazards requirement of the survival models most often used to fit the data, which is why you can have a similar median OS but a marked difference in the height of the “tail” of the Kaplan-Meier curve. I wonder if these curves should not be modeled right off the bat as a composite of two (or more) separate survival curves, just as we do with Gaussian decomposition for complex distributions, for instance. I have no idea if such a framework exists in survival analysis, but it seems to me that this may model the situation more accurately. Then again, I may be completely off-base.
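
The idea can be sketched numerically. Below, the treated arm is a mixture of two hypothetical subgroups (25% responders with a very low hazard, 75% non-responders with a hazard somewhat above control), and the control arm is a single exponential with a 12-month median. All rates and proportions are assumptions for illustration only.

```python
import math

CONTROL_RATE = math.log(2) / 12          # control hazard per month (median 12 months)
COMPONENTS = [(0.25, 0.005), (0.75, 0.075)]  # (weight, monthly hazard) in treated arm

def control_surv(t):
    return math.exp(-CONTROL_RATE * t)

def treated_surv(t):
    """Survival of the mixture: weighted sum of the subgroup survival curves."""
    return sum(w * math.exp(-rate * t) for w, rate in COMPONENTS)

def treated_hazard(t):
    """Population hazard of the mixture: density divided by survival."""
    density = sum(w * rate * math.exp(-rate * t) for w, rate in COMPONENTS)
    return density / treated_surv(t)

def hazard_ratio(t):
    return treated_hazard(t) / CONTROL_RATE

def median(surv, lo=0.0, hi=200.0):
    """Bisection for the time at which a decreasing survival curve hits 0.5."""
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if surv(mid) > 0.5 else (lo, mid)
    return (lo + hi) / 2

median_control = median(control_surv)
median_treated = median(treated_surv)

print(f"median OS: {median_control:.1f} vs {median_treated:.1f} months")
print(f"36-month survival: {control_surv(36):.3f} vs {treated_surv(36):.3f}")
print(f"hazard ratio at 0, 12, 36 months: "
      f"{hazard_ratio(0):.2f}, {hazard_ratio(12):.2f}, {hazard_ratio(36):.2f}")
```

Under these assumptions the medians differ by less than two months, while the 36-month survival roughly doubles and the hazard ratio drifts from about 1 toward much lower values as the non-responders die off, which is the non-proportional pattern described above.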

Good point and we have tackled such heterogeneities in two different ways (open to suggestions on more; also keeping in mind that appropriate covariate adjustment can at least in some cases help maintain the PH assumption):

  1. Bayesian hierarchical models with a frailty (random factor) with a multiplicative effect on the baseline hazard functions to account for unmeasured heterogeneity in responses. I guess those would be considered “semiparametric” in a sense.

  2. Use flexible nonparametric Bayesian regression models such as the one described here.
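
As a rough illustration of the first approach (not the Bayesian hierarchical models themselves), the sketch below uses the standard closed form for a gamma frailty with mean 1 and variance `theta`, under which the population hazard is h(t) / (1 + theta * H(t)). It assumes constant conditional hazards, the same frailty distribution in both arms, and hypothetical parameter values.

```python
def marginal_hazard_ratio(t, cond_hr, base_rate, theta):
    """Population-level hazard ratio at time t under a gamma frailty.

    cond_hr   : hazard ratio conditional on the frailty (proportional by assumption)
    base_rate : constant control hazard conditional on frailty = 1
    theta     : frailty variance (theta = 0 means no unmeasured heterogeneity)
    """
    cum_control = base_rate * t            # cumulative hazard, control arm
    cum_treated = cond_hr * base_rate * t  # cumulative hazard, treated arm
    return cond_hr * (1 + theta * cum_control) / (1 + theta * cum_treated)

# Hypothetical values: conditional HR 0.5, control hazard 0.06 per month, variance 1
times = [0, 12, 24, 60]
ratios = [marginal_hazard_ratio(t, 0.5, 0.06, 1.0) for t in times]

for t, r in zip(times, ratios):
    print(f"month {t:>2}: marginal HR = {r:.3f}")
```

Even though the hazards are proportional for every individual in this setup, the hazard ratio observed at the population level moves toward 1 over time, which is one way unmeasured heterogeneity can produce apparent non-proportionality.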

Using multiple endpoints is a solution proposed by several authors, but it has several problems. First of all, it would be interesting to calculate the increase in sample size under realistic assumptions, using one endpoint vs. two co-primary endpoints. It is possible that the increase is not bearable for the differences of about 15-20% that are expected on occasion. If you add that the eligibility criteria of an increasing number of trials require a low-prevalence biomarker, and that late effects demand very long follow-up, it is easy for some designs to become unwieldy and not feasible. I personally don’t like complex designs; they give a sense of artifice. This also clashes with the perception of patients, who want every effort to be made to cure them, but are not willing to take risks if the differences are small. So I don’t see so clearly that the research team can decide this in certain scenarios without consulting what the patient really demands…
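
A back-of-the-envelope version of that calculation, using Schoenfeld's approximation for the number of events in a 1:1 log-rank comparison. The assumptions are mine: a hypothetical hazard ratio of 0.8, 80% power, and two endpoints handled by splitting the two-sided alpha equally (Bonferroni), so that either endpoint can carry the trial. If both endpoints had to be significant, alpha would not be split but the power penalty would be different.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha_two_sided, power):
    """Schoenfeld approximation: events needed for a 1:1 log-rank test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

HR = 0.8      # hypothetical treatment effect
POWER = 0.80  # power per endpoint

events_single = required_events(HR, 0.05, POWER)   # one primary endpoint
events_split = required_events(HR, 0.025, POWER)   # alpha halved across two endpoints
inflation = events_split / events_single

print(f"events, one endpoint at alpha 0.05:    {math.ceil(events_single)}")
print(f"events, each endpoint at alpha 0.025:  {math.ceil(events_split)}")
print(f"inflation factor: {inflation:.2f}")
```

This only counts events for the endpoint in question; translating events into patients depends on accrual and follow-up, which is where rare biomarkers and late effects make the design harder still.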

I think the Royston-Parmar spline model can do something similar to this.
However, I believe that part of the problem is that Phase III trials, driven by the understandable urgency of the situation, are conducted without knowing the predictive factors of response in detail beforehand. A reasonable proposal, in my opinion, is to go back to phase II studies and try to evaluate the reasons for the heterogeneity of the response.