Progression in cancer trials

I love this whole discussion, which solidifies my belief that analysis should stay as close to raw data as possible in almost all circumstances.

1 Like

I don’t have much to add to the interesting discussion evolving here, just pointing to an earlier discussion of why we generally use progression-free survival instead of time to progression. It’s not making any assumptions about the cause of death, only that death is also a bad outcome (which can bias assessments of time-to-progression).

1 Like

I’m a pharmacometrician and I have worked with the raw RECIST data in a number of my studies. I love this discussion. Here are some big challenges.

What we care most about is overall survival and quality of life. I’ll focus on overall survival, since that’s the most straightforward to measure. Often, progression-free survival and the RECIST variables either are not good surrogates for overall survival, or we just don’t know whether they are. Furthermore, in many cases the therapy the patient receives post-progression greatly impacts overall survival, but these data are usually not collected. Buyse et al [1] cover these issues in their recent review, and I quote below.

“Even in the best-case scenario where a meta-analysis of randomized trials addressing a specific therapeutic question can be conducted to test trial-level surrogacy, the results may not apply in a future trial testing a different question, for instance, the effects of a new drug with a novel mechanism of action, since the direct and indirect effects of such a drug on survival may be substantially different than with historical drugs. The increasing availability of active treatments after observation of the surrogate may also negatively impact trial-level surrogacy. For example, in patients with advanced colorectal cancer, PFS was a reasonable surrogate for survival when fluorouracil-based therapies were the only available second-line treatments: the trial-level coefficient of determination estimated from 10 randomized trials conducted in 1744 patients was R2 = 0.98 (95% confidence interval [CI]: 0.88 to 1.00). In contrast, a more recent meta-analysis of 22 trials conducted in 16,762 patients found a much lower trial-level coefficient of determination, R2 = 0.46 (95% CI: 0.24 to 0.68). Note that the confidence intervals around R2 can be wide, which implies that substantial uncertainty will typically affect predictions based on surrogates.”
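
To make the quoted trial-level R² concrete: it is essentially the coefficient of determination from regressing trial-level treatment effects on OS against trial-level effects on PFS. Below is a minimal sketch; every number is made up for illustration, and real surrogacy analyses weight trials by size and account for estimation error in both effect estimates.

```python
def trial_level_r2(surrogate_effects, os_effects):
    """R^2 from an unweighted linear regression of OS effects on surrogate effects."""
    n = len(surrogate_effects)
    mx = sum(surrogate_effects) / n
    my = sum(os_effects) / n
    sxx = sum((x - mx) ** 2 for x in surrogate_effects)
    sxy = sum((x - mx) * (y - my) for x, y in zip(surrogate_effects, os_effects))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(surrogate_effects, os_effects))
    ss_tot = sum((y - my) ** 2 for y in os_effects)
    return 1.0 - ss_res / ss_tot

# Hypothetical trial-level log hazard ratios (one pair per trial):
pfs = [-0.50, -0.30, -0.10, 0.05, -0.40]  # effect on PFS
os_ = [-0.45, -0.25, -0.05, 0.10, -0.30]  # effect on OS
print(round(trial_level_r2(pfs, os_), 2))
```

A high value here only says that, across these (invented) trials, the PFS effect predicted the OS effect well; as the quote stresses, that relationship can degrade when the therapeutic landscape changes.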

Another issue to keep in mind is that it’s the appearance of a new lesion that is often the most predictive of overall survival (see some of our work, for example [2,3]), but unfortunately this is a binary endpoint with limited power. A joint analysis of all the data as @f2harrell suggests (including vitals, etc.) is appealing, but then, to use this model in a predictive fashion, it’s important to understand how these coefficients vary across indications and over time as the therapeutic landscape changes. I believe addressing these questions requires a collaborative effort pooling a lot of data. The FDA would be well positioned to explore these issues, and they scratched the surface in 2009 [4], though much has changed since then. Perhaps with databases like Project Data Sphere [5] it would be possible today to better understand this issue. It’s something I continue to explore in my own research and I’d be happy to discuss further. Thank you for raising this interesting topic!

  1. Buyse, Marc, et al. “Surrogacy beyond prognosis: the importance of ‘trial-level’ surrogacy.” The Oncologist 27.4 (2022): 266-271.

  2. Stein, Andrew, et al. “Survival prediction in everolimus-treated patients with metastatic renal cell carcinoma incorporating tumor burden response in the RECORD-1 trial.” European urology 64.6 (2013): 994-1002.

  3. Mietlowski, William Leonard, et al. “Clinical importance of including new and nontarget lesion assessment of disease progression (PD) to predict overall survival (OS): Implications for randomized phase II study design.” (2012): 2543-2543. Poster available in the iamstein/Publications repository on GitHub (Mietlowski12_ASCOposter_TargNontargNew_OS.pdf).

  4. Wang, Y., et al. “Elucidation of relationship between tumor size and survival in non‐small‐cell lung cancer patients can aid early decision making in clinical drug development.” Clinical Pharmacology & Therapeutics 86.2 (2009): 167-174.

  5. Project Data Sphere https://www.projectdatasphere.org/

4 Likes

The Buyse et al article is a major step in the right direction. Notice, for example, how nicely they use graphs to express their assumptions about the data-generating processes connecting these endpoints. However, the field still needs to evolve toward deeper thinking about what these endpoints estimate and how they connect with the random treatment allocation procedure in RCTs. This can have tremendous implications in oncology and beyond. Various stakeholders are gradually beginning to recognize this, and there will be more discussions in the coming years.

Very nice work on the everolimus RCC cohort, and it is definitely related to the original post in this thread. Your team should keep building on these ideas!

2 Likes

Interesting discussion, and don’t forget those non-target lesions that are recorded qualitatively. I’m keen on trying out the multi-state modelling approach: a state for a new lesion, different states for the non-target lesions, and then tumour size from target lesions as a time-dependent covariate. It would be nice to just stick with the raw data as is and not process it in any way. In a lot of my studies death is usually a rare progression event, but clearly it can be a state in the multi-state model. Where I start getting heart palpitations is when we start correlating to overall survival…
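
As a sketch of the bookkeeping that precedes fitting such a multi-state model, here is how one might tabulate observed state-to-state transitions from longitudinal patient records. The state names and trajectories are invented for illustration; a real analysis would go on to fit transition-specific hazard models with tumour size as a time-dependent covariate.

```python
from collections import Counter

def transition_counts(trajectories):
    """Count observed state-to-state transitions across patient trajectories."""
    counts = Counter()
    for states in trajectories:
        for current, nxt in zip(states, states[1:]):
            if current != nxt:  # only count actual transitions, not state dwelling
                counts[(current, nxt)] += 1
    return counts

# Invented trajectories over scheduled assessments (death is absorbing):
patients = [
    ["on_study", "on_study", "new_lesion", "death"],
    ["on_study", "nontarget_progression", "death"],
    ["on_study", "on_study", "on_study"],  # censored without an event
]
print(transition_counts(patients))
```

These crude counts are just the starting point; the interval-censored timing of the imaging visits and the competing transitions are exactly what the multi-state machinery is then needed for.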

There is a plethora of post-progression treatments patients receive, and nobody knows how the disease you have shaped in a previous line with a new drug is going to react to post-progression treatments you know even less about. I recall a Genentech study of atezolizumab in second-line NSCLC where docetaxel was the comparator. They documented the post-progression treatments, and it was incredible what people got in the two arms. Interestingly, it was possible to get docetaxel twice, which I was never aware of. So I shy away from OS because of these issues.

2 Likes

Those who get palpitations when thinking about overall survival estimates truly understand the problem. Ignorance is bliss.

2 Likes

There is a lot to explore here, such as the location of lesions, and one thing I’ve wondered about: say you have a lesion in the liver. We collect liver enzymes longitudinally, so are they providing a signal as to how the liver lesions are doing between imaging visits?

I’m super keen to see your new methods paper on handling all the different imaging aspects.

2 Likes

Yup, although the liver metastases would have to be very extensive for liver enzymes to become elevated, at which point patients often do not have many more options and decision-making paths are constrained. But other non-imaging variables, measured longitudinally between images and/or concurrently with them, could plausibly influence outcome heterogeneity and thus be included as predictors along with the raw tumor burden data: 1) blood-based markers such as serum cancer markers (e.g., CA-125), circulating tumor cells, and circulating tumor DNA; 2) quality of life / patient-reported outcomes.

2 Likes

Building on Dr Msaouel’s response about utilities, a health economist’s perspective is that utilities should be defined according to the perspective of the analysis: not prospective/retrospective as defined in epidemiology, but the societal, health-system, patient, or other perspective, defined according to whose preferences the health outcomes reflect. Economic evaluation textbooks discuss this. Both Drummond et al. and Neumann et al. indicate that preferences from the general public are usually used in analyses, and they provide the contrasting rationales that authors have used to defend general-public versus patient-elicited values. Neumann et al. recommend values from the general public, while recognizing situations where patient preferences are appropriate.

The Global Burden of Disease (GBD) Study illustrates the logistical feasibility versus the theoretical challenges of measuring patient- or population-perspective utility, to the extent that disability and utility are similar concepts. Airoldi and colleagues show why disability and utility are not simply converse numbers, but that is less relevant to this point. Anand and Hanson challenged the process of using experts to define utilities for early versions of the GBD estimates, which led to the revised data-collection methods described by Salomon et al. for subsequent versions. I don’t know what the causal factors were behind the change, but I think decisions about which source to elicit utility data from should consider perspective, feasibility, biases, and study aim, recognizing that the societal perspective is still advocated where possible.

-Drummond MF, Sculpher MJ, Claxton K, Stoddart GL, Torrance GW. Methods for the economic evaluation of health care programmes. 4th ed. Oxford: Oxford University Press; 2015.
-Neumann PJ, Sanders GD, Russell LB, Siegel JE, Ganiats TG. Cost-effectiveness in health and medicine. 2nd ed. New York: Oxford University Press; 2017.
-Airoldi M. Gains in QALYs vs DALYs averted: the troubling implications of using residual life expectancy. London: London School of Economics; 2007. Report No.: 0853280010.
-Airoldi M, Morton A. Adjusting life for quality or disability: stylistic difference or substantial dispute? Health Econ. 2009;18(11):1237-47.
-Anand S, Hanson K. Disability-adjusted life years: a critical review. J Health Econ. 1997;16(6):685-702.
-Salomon JA, Vos T, Hogan DR, Gagnon M, Naghavi M, Mokdad A, et al. Common values in assessing health outcomes from disease and injury: disability weights measurement study for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2129-43.

2 Likes

I have a somewhat overlapping question regarding the prediction of pathological response (as per RECIST); it may also be related to @Pavlos_Msaouel’s thread here.

Suppose you demonstrate that a biomarker accurately predicts pathological response to cancer treatment in a prospective cohort study.

How do you know if this biomarker is prognostic or predictive?

In other words, how do you tell apart prognosis from (relative) treatment effect?

Is there any reason to see the prediction of pathological responses as “closer” to the prediction of treatment effects?

The resulting paradox is:

(a) if predicting response is mostly predicting (relative) treatment effect, then patients with a high probability of response should undergo the proposed therapy.

(b) if predicting response is mostly predicting prognosis, then patients with a low probability of response should undergo the proposed therapy. (worse prognosis → higher absolute effects)

Perhaps it’s just language, but I can’t help feeling (b) doesn’t make much sense. I think an ideal workflow would include testing an interaction term for relative efficacy in an RCT. In practice, it feels like biomarkers predictive of response found in observational studies are then tested in animal models to get a better idea about relative treatment effects – so the factor separating (a) and (b) would be biological understanding.
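
The tension between (a) and (b) is easiest to see with a toy calculation. Assuming, purely for illustration, a constant relative effect (risk ratio 0.8) across patients, the worse-prognosis patient gains more in absolute terms:

```python
def absolute_risk_reduction(baseline_risk, risk_ratio):
    """Absolute risk reduction under an assumed constant risk ratio."""
    return baseline_risk - baseline_risk * risk_ratio

# Hypothetical baseline risks of a bad outcome:
good_prognosis = absolute_risk_reduction(0.10, 0.8)  # 0.10 -> 0.08
poor_prognosis = absolute_risk_reduction(0.50, 0.8)  # 0.50 -> 0.40

print(round(good_prognosis, 3), round(poor_prognosis, 3))  # 0.02 0.1
```

This is the arithmetic behind “worse prognosis → higher absolute effects” in (b): the same relative effect translates into a fivefold larger absolute risk reduction for the poor-prognosis patient, provided the relative effect really is constant, which is exactly what an interaction test in an RCT would probe.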

2 Likes

Good question. Our recommendation here, motivated by decades of literature on the topic, is to flip the narrative and define “prognostic” and “predictive” effects based on underlying causal considerations. This is how we now frame our analyses. The purely “statistical” definition of multiplicative interactions on the linear scale as “predictive” is a red herring and often produces noise.

2 Likes

Thanks! That makes much more sense. I’m halfway through the Cancers paper (I was just reading your related EUO editorial) and might come back with thoughts and questions if you don’t mind. I particularly like the way you framed the relationship between biological knowledge and treatment effect transportability. My biased perspective is that this is one of the most important messages for improving the translation of omics/high-dimensional research into clinically useful information.

1 Like

Exactly. This is a major reason why we started framing things in these ways.

More than happy to discuss with anyone who has the fortitude to read that monster Cancers paper. There is another similarly long one coming up, hopefully after this grant-writing season ends. Never say never, but I do not plan to write such long manuscripts again for the foreseeable future.

1 Like

We should have insisted on accurate terminology when “prognostic” and “predictive” were first used in this context. “Predictive” should not be used to refer to effect modification, in my view. Its definition goes back much earlier in time. I like the term “differential treatment effect”.

A side note: we should not speak about “predictive” effects unless using data from a well-sized clinical trial. Observational data are seldom capable of providing an accurate average treatment effect, much less a differential one.

3 Likes

Completely agree. It’s a horrible terminology that we are unfortunately stuck with for now.

Indeed, interventional data are essential. However, in the story of the HER2 “predictive” biomarker we describe in section 3.4 of the Cancers review above, important information was also derived from the “observational” correlative data.

2 Likes

I love that you brought up Buyse’s work here.

If mathematicians can rework the very foundations of the tower of mathematics, I think oncologists can at least rectify unhelpful language of this sort.

1 Like

Along with curing cancer. A gargantuan task to put it mildly. I’ll thus give my colleagues a temporary pass on this terminology :wink:

1 Like

I work largely in paediatric cancer trials, where sample sizes are almost always constrained. Yet the outcomes used are very much the standard ones from most cancer trials: event-free survival times, binary tumour responses, and the abomination of “best response” over a treatment period. In other words, we’re discarding loads of information in a setting where we should be using it as efficiently as we can.

I’ve been going on about this to my colleagues for some time (with limited progress). I really like the idea of using unified longitudinal ordinal outcomes (and Bayesian methods, naturally), but my question is: what might such an outcome look like? I imagine death would be the worst state and alive and cancer-free the best, with others in between that might vary depending on the nature of the trial: for example, metastasis, recurrence (if the treatment is intended to cure), progression (though that’s vague and hard to define and detect), further treatment (e.g. surgery), or organ preservation in some cases.

I guess one downside is that outcomes would be much more trial-specific and wouldn’t be common across many trials. Though I’m not sure that commonality is actually a benefit, to be honest.

4 Likes

Focusing only on efficacy for simplicity (no toxicity/quality of life): death is the worst outcome, followed by different levels of tumor burden (as measured radiologically, before being converted to standard low-resolution endpoints such as response rate and progression-free survival) on an ordinal scale adjusted for baseline.

Simple, feasible, and uses information we already have available.
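
A toy sketch of what mapping serial assessments to such an ordinal scale could look like in code. The level names, their ordering, and the patient record below are assumptions for illustration, not a validated scale:

```python
# Illustrative ordinal efficacy levels; death is the worst, absorbing state.
ORDINAL_LEVELS = {
    "complete_response": 0,   # best state
    "partial_response": 1,
    "stable_disease": 2,
    "progression": 3,
    "death": 4,               # worst state
}

def visit_outcome(alive: bool, radiologic_status: str) -> int:
    """Map one visit record to the ordinal scale; death overrides imaging."""
    if not alive:
        return ORDINAL_LEVELS["death"]
    return ORDINAL_LEVELS[radiologic_status]

# One hypothetical patient's serial assessments:
records = [
    (True, "stable_disease"),
    (True, "partial_response"),
    (True, "progression"),
    (False, "progression"),   # died before the next scan
]
trajectory = [visit_outcome(alive, status) for alive, status in records]
print(trajectory)  # [2, 1, 3, 4]
```

The whole trajectory, not just a single collapsed summary like “best response”, would then feed a longitudinal ordinal (e.g. Bayesian proportional odds) model adjusted for baseline.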

1 Like