Exemplary QOL analyses that avoid change-from-baseline blunders?

karlamoPA · December 6, 2019, 4:10pm

This can be said for any endpoint (PFS, OS, response) right?
To clarify my advocacy:

PRO instruments standardized to compare fatigue, physical pain,… overall QoL across studies
(so the subjective experience is reported directly by the patient (apart from stressful consults),
not misinterpreted or under-reported by study doctors)
- in future: included in all studies with ePRO
  to improve safety (timely reporting of changes that require supportive or ER care)
  a study to be referenced later showed improved OS associated with this practice.
As a required consideration in studies using surrogates (PFS) as primary endpoint.
Especially relevant to RCT tests of continuous use and maintenance protocols
(Is potential gain in PFS (often modest) worth the additional toxicity experienced on a daily basis?)
Not typically to be used in isolation as the basis for approval.
Cutoffs for clinically significant differences in PRO scores.
… to inform regulatory and clinical decision-making.
Reporting of PROs should be standardized - done in a way that fosters patient understanding as a decision aid. Example: Do I want to pay for an expensive treatment (in some cases bankrupting my family) to possibly gain x months, when the treatment is unlikely to improve (or impairs) my QoL on a daily basis?

f2harrell · December 6, 2019, 4:14pm

Based on my understanding, all of those things apply, and none of them are satisfactorily addressed by computing change from baseline. Gain in Y needs to come from comparing Y in treatment A patients to Y in treatment B patients, adjusted for baseline to gain power/precision.
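A minimal simulation sketch of this point, using only the Python standard library. All numbers (sample size, baseline slope of 0.5, noise, true effect of 3 points) are assumptions chosen for illustration, not values from any trial. It compares two estimators of the same treatment effect: the between-arm difference in mean change from baseline, and the baseline-adjusted (ANCOVA) difference in follow-up Y.

```python
import random
import statistics

def simulate_trial(rng, n_per_arm=100, effect=3.0, slope=0.5, noise_sd=8.0):
    """Return [(baseline, follow_up)] for control (index 0) and treated (index 1)."""
    arms = []
    for treat in (0, 1):
        base = [rng.gauss(50, 10) for _ in range(n_per_arm)]
        follow = [20 + slope * b + effect * treat + rng.gauss(0, noise_sd) for b in base]
        arms.append((base, follow))
    return arms

def change_score_estimate(arms):
    """Difference between arms in mean change from baseline."""
    mean_change = [statistics.fmean(y - b for b, y in zip(base, follow))
                   for base, follow in arms]
    return mean_change[1] - mean_change[0]

def ancova_estimate(arms):
    """Treatment coefficient from least squares of Y on treatment and baseline."""
    sxy = sxx = 0.0
    means = []
    for base, follow in arms:
        mb, my = statistics.fmean(base), statistics.fmean(follow)
        means.append((mb, my))
        sxy += sum((b - mb) * (y - my) for b, y in zip(base, follow))
        sxx += sum((b - mb) ** 2 for b in base)
    slope_hat = sxy / sxx  # pooled within-arm slope of Y on baseline
    (mb0, my0), (mb1, my1) = means
    return (my1 - my0) - slope_hat * (mb1 - mb0)

def compare(n_sims=500, seed=1):
    """Repeat the hypothetical trial and summarize both estimators."""
    rng = random.Random(seed)
    change, ancova = [], []
    for _ in range(n_sims):
        arms = simulate_trial(rng)
        change.append(change_score_estimate(arms))
        ancova.append(ancova_estimate(arms))
    return {
        "change_mean": statistics.fmean(change), "change_sd": statistics.stdev(change),
        "ancova_mean": statistics.fmean(ancova), "ancova_sd": statistics.stdev(ancova),
    }

results = compare()
print(results)
```

Under these assumptions both estimators center on the true effect, but the ANCOVA estimate varies less from trial to trial, which is the power/precision gain referred to above. The sketch still assumes a linear model with Gaussian noise, so it does not address whether that model suits a QOL scale.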

Pavlos_Msaouel · December 7, 2019, 3:29am

@davidcnorrismd, I felt that the ARAMIS trial did a great job analyzing QOL outcomes.

f2harrell · December 7, 2019, 1:01pm

They did this:

The secondary end points were overall survival, time to pain progression (defined as either an increase of ≥2 points from baseline in the score assessed with the BPI-SF questionnaire or initiation of opioid treatment for cancer pain, whichever occurred first)

This is very problematic, although challenging to deal with optimally when the score is combined with other endpoints.

For quality-of-life variables, an analysis of covariance model was used to compare the time-adjusted area under the curve (AUC) between groups, with covariates for baseline scores and randomization stratification factors. The least-squares mean and 95% confidence interval was estimated for each group and for the difference between the groups.

The first part of this is good, if change from baseline was not computed. But to assume Gaussian residuals in a regular model for this kind of response variable is tenuous. The variable calls for an ordinal regression.
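Fitting an ordinal regression needs a modeling library, so the sketch below swaps in a simpler, dependency-free stand-in: the Wilcoxon-Mann-Whitney concordance probability, a rank-based comparison that uses only the ordering of the scores and makes no Gaussian assumption. It is not the ordinal regression itself and, unlike the model described above, it does not adjust for baseline or stratification factors. The scores are hypothetical.

```python
def concordance_probability(treated, control):
    """P(treated score > control score) + 0.5 * P(tie), over all patient pairs."""
    wins = ties = 0
    for t in treated:
        for c in control:
            if t > c:
                wins += 1
            elif t == c:
                ties += 1
    return (wins + 0.5 * ties) / (len(treated) * len(control))

# Hypothetical ordinal QOL scores (higher = better); not data from any trial.
treated = [3, 4, 4, 5, 2, 5, 3, 4]
control = [2, 3, 3, 4, 1, 2, 3, 4]

c_index = concordance_probability(treated, control)
print(f"Probability a treated patient scores higher (ties split): {c_index:.2f}")
```

A value of 0.5 means no tendency for either group to score higher. Because only the ranks enter the calculation, the result is unchanged by any order-preserving rescaling of the QOL instrument.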

karlamoPA · December 17, 2019, 12:10am

Copying from the protocol.

I wonder if anyone can comment on:

the 16 week assessment schedule for QoL input? (seems arbitrary? bias that may result?)
cutoff for significance: 10 points in FACT-P total score compared with baseline.
weakness/strength of using ePROs providing near real time reporting

6.3.3.6 Health-related quality of life
The mean of the screening and day 1 (pre-treatment) values will serve as baseline for QoL.

6.3.3.6.1 FACT-P
QoL will be assessed using a disease-specific FACT-P questionnaire completed by the patient
(Appendix 5a). FACT-P will be assessed at screening, day 1, week 16 and at the end-of-study
treatment visit during the study treatment period.

Patients will be defined as having total QoL deterioration, if they experience a decrease
of ≥ 10 points in FACT-P total score compared with baseline.

6.3.3.6.2 PCS subscale of FACT-P
QoL will be assessed using prostate cancer-specific subscale of the FACT-P questionnaire
(PCS subscale of FACT-P) completed by the patient (Appendix 5b).
PCS will be assessed every 16 weeks until the end of the follow-up period.

Patients will be defined as having QoL deterioration, if they experience a change of ≥ 3 points
in PCS compared with baseline.

6.3.3.6.3 EQ-5D
QoL will also be assessed using a generic EQ-5D questionnaire completed by the patient.
Mobility, self-care, usual activities, pain/discomfort, and anxiety/depression are each assessed
on 3-point categorical scales ranging from “no problem” to “severe problem” (Appendix 6)

EQ-5D will be assessed at screening, day 1 and every 16 weeks until the end of the follow-up
period.

Patients will be considered to have deterioration in overall QoL, if they experience a
deterioration of ≥ 0.06 points compared with baseline at 2 consecutive assessments.

davidcnorrismd · December 17, 2019, 1:43pm

Further to Frank’s criticism of the change-from-baseline flaw in this trial, we see the additional problem that interpretation of the trial relies on classification (disguised via the word “defined”) according to this change. This implicates the classification-vs-prediction distinction that I consider the single most revelatory thing I have learned from Frank. (It’s one of those things that becomes so much a part of your thinking that you need to go back in memory to the moment you first encountered it, to remember just how nontrivial it is!) Here is a tweet of his, underscoring the vital importance of this distinction for person-centered health care:

karlamoPA · December 17, 2019, 1:58pm

My layman’s understanding - feel free to correct me: risk prediction generally involves biomarkers or other characteristics that point to an increased probability of adverse or beneficial outcomes. Indeed this is valuable (the heart of personalized medicine) but also elusive, right? Identifying what predicts a bad outcome requires large correlative studies and prospective validation. When validated, such markers can be integral to eligibility for a given study - you can’t do the study without the assessment of the biomarker.

Here I take the definition of QoL deterioration as a classification of an outcome based on the net change in participant status - a way to measure good or bad results from taking the study drug.

f2harrell · December 17, 2019, 4:02pm

To the basic questions, using change from baseline is a disaster, then dichotomizing (“responder analysis”) that is a double disaster. Misleading, arbitrary, means different things to different patients, power loss, … Details are here.
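The power-loss part of this can be illustrated with a small simulation. This is a sketch under stated assumptions: a standardized continuous outcome, a true effect of 0.4 SD, 100 patients per arm, and an arbitrary "responder" cutoff of 1.0. It dichotomizes the raw outcome rather than a change score, so it isolates only the cost of dichotomizing. Both tests are simple large-sample z-tests.

```python
import random
import statistics
from math import sqrt

def z_for_means(a, b):
    """Large-sample z statistic for a difference in means."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def z_for_proportions(a, b, cutoff):
    """Large-sample z statistic for a difference in 'responder' proportions."""
    pa = sum(x >= cutoff for x in a) / len(a)
    pb = sum(x >= cutoff for x in b) / len(b)
    pooled = (pa * len(a) + pb * len(b)) / (len(a) + len(b))
    se = sqrt(pooled * (1 - pooled) * (1 / len(a) + 1 / len(b)))
    return (pa - pb) / se if se > 0 else 0.0

def estimate_power(n_sims=1000, n_per_arm=100, effect=0.4, cutoff=1.0, seed=2):
    """Fraction of simulated trials in which each analysis rejects at the 5% level."""
    rng = random.Random(seed)
    hits_cont = hits_dich = 0
    for _ in range(n_sims):
        treated = [rng.gauss(effect, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        hits_cont += abs(z_for_means(treated, control)) > 1.96
        hits_dich += abs(z_for_proportions(treated, control, cutoff)) > 1.96
    return hits_cont / n_sims, hits_dich / n_sims

power_continuous, power_dichotomized = estimate_power()
print(f"continuous: {power_continuous:.2f}  dichotomized: {power_dichotomized:.2f}")
```

With the same simulated patients, the dichotomized analysis detects the effect noticeably less often, and how much less depends on where the arbitrary cutoff is placed.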

karlamoPA · December 17, 2019, 11:58pm

I was calling attention to a definition of QoL deterioration used in the study cited by a member here as a good example of QoL / PROs used in a study.

My goal is to identify models for RCT that include PROs as adjuncts to the commonly used primary surrogate endpoints (PFS). This so the outcomes (only patients can report) such as fatigue and pain which determine how well people live can be captured more objectively and accurately and considered with greater confidence in regulatory assessments.

To be clear: I am not advancing a method of analysis for PROs.

Some investigators are asserting that gains in PFS from continuous use (and maintenance) protocols also improve QoL. Since PFS gains so often do not lead to improved OS, I feel this assertion needs to be tested.

Added 12.18: I’m thankful for the responses, but I don’t have the background to understand them adequately. The message seems to be:

QoL domains (PROs for pain and suffering) cannot be compared reliably enough, within a study comparing radiographic endpoints, to justify inclusion as a study endpoint. Study sizes sufficient for PFS comparisons are insufficient to compare the impact on QoL of the continuous treatment needed to maintain remission.

I haven’t learned yet if sampling methods may be helpful to the stated goal - if real time capture of QoL domains with ePRO instruments can mitigate the concerns cited. Unlike radiographic tumor response, pain and suffering may be attributed to social or financial factors which PRO instruments may not account for (although they can and should).

davidcnorrismd · December 20, 2019, 1:27pm

Toward the end of the “Details are here” link in Frank’s brief post above is this example of prognostic counseling that could well be based on QoL / PRO data of the kind you hope to collect, Karl:

What is an example of the most useful prognostication to convey to a patient? “Patients such as yourself who are at disability level 5 on our 10-point scale tend to be at disability level 2 after physical rehabilitation. Here are the likelihoods of all levels of disability for patients starting at level 5: (show a histogram with 10 bars).”

Notice how the conversation is rooted in (statisticians would say, “conditioned on”) the patient’s current situation. Also, see that the difference (-3) in disability levels never enters the discussion; the focus is placed on health states and their meaning. (Consider the modified Rankin scale as a scoring system with meaningfully defined health states, but for which score differences are meaningless.) Finally, uncertainty about the final outcome is conveyed with a histogram.
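To make the histogram idea concrete, here is a small sketch that tabulates the distribution of follow-up levels among patients who started at a given level. The (baseline, follow-up) pairs are hypothetical, and raw proportions are used only for illustration; with real data one would more likely estimate these probabilities from a model, as raw proportions become unstable when few patients share a starting level.

```python
from collections import Counter

def outcome_distribution(pairs, start_level, levels=range(1, 11)):
    """Proportion of patients at each follow-up level, given their baseline level."""
    followups = [follow for base, follow in pairs if base == start_level]
    if not followups:
        return {}
    counts = Counter(followups)
    return {level: counts.get(level, 0) / len(followups) for level in levels}

# Hypothetical (baseline level, follow-up level) pairs on a 10-point disability scale.
pairs = [(5, 2), (5, 2), (5, 3), (5, 1), (5, 2), (5, 4), (5, 3), (5, 2),
         (4, 2), (4, 1), (6, 3)]

dist = outcome_distribution(pairs, start_level=5)
for level, prob in dist.items():
    # One text bar per level, so all 10 bars of the histogram are shown.
    print(f"level {level:2d}: {'#' * round(prob * 20)} {prob:.2f}")
```

The output is conditioned on the starting level, and no difference score is ever computed: the patient sees the likely health states, not a change.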

I wonder if starting from a regulatory standpoint has obstructed your view of the issue, in something like the way that regarding drug dependence as a law-enforcement (as opposed to health-care) problem does. Why not start with the desired prognostic counseling script, and work backward to the trial designs that will support that script?

karlamoPA · December 20, 2019, 5:24pm

sorry, do not know what you mean here: desired prognostic counseling script?

I desire independent (having no COIs, evidence-based) regulators to include the totality of effects when deciding if a treatment based on a surrogate meets the standard for marketing approval - provides poor, good, excellent evidence that I will live longer or better than with some other treatment. I don’t want this left to doctors based on poorly informed opinions and conflict of interest (ASP portion of full price of drug and who-knows-what other relationship with drug companies). I’m for raising standards for surrogate-based approvals, which often do not improve how long people live and which add financial stress to the side effects of the treatments. Leaving this to MD interpretation invites poorly designed studies, errors, a return to medicine by sales pitch – a return to the Wild West.

Assumption: issues with the fidelity of change from baseline and regression to the mean (which I poorly understand) will be balanced out by random allocation to study arms even if individual signals are suspect. Marginal differences in changes from baseline can be interpreted as not significant.

davidcnorrismd · January 6, 2020, 5:16pm

I’ve come across this FDA approval, which employed several of the objectionable methods including change-from-baseline and percent change. The comparison nevertheless seems persuasive, in light of opposite signs of changes. Presumably, more efficient use of the data would have enabled a smaller trial?

Deisseroth A, Kaminskas E, Grillo J, et al. U.S. Food and Drug Administration approval: ruxolitinib for the treatment of patients with intermediate and high-risk myelofibrosis. Clin Cancer Res. 2012;18(12):3212-3217. doi:10.1158/1078-0432.CCR-12-0653 PMID 22544377

f2harrell · January 6, 2020, 7:21pm

Yes, that approach loses power in addition to losing interpretation. FDA has been far too lenient in the use of change from baseline.

karlamoPA · January 9, 2020, 2:43pm

Thank you, David. Based on a sampling of studies including QoL PROs in the Results published to ClinicalTrials.gov, the change from baseline approach seems the standard method for randomized controlled trials.

karlamoPA · January 9, 2020, 4:30pm

There may be a need to categorize QoL PRO objectives in order to develop the most appropriate study design. In this study the PROs are clearly related to the disease status and serve as a surrogate for response to treatment.

For systemic cancers areas of presentation vary widely and therefore the level of pain (if any pain exists yet from the disease) cannot be organ-specific for that QoL PRO.

Might organ-specific PRO-centered studies need to pre-specify the discomfort level as an eligibility criterion – such as for bone pain in cancers that cause that symptom?

My area of concern is based on personal experience with systemic indolent lymphoma, which does not typically lend itself to organ-specific PRO outcomes and can be asymptomatic before and after induction therapy. Here a controversial standard approach (widely adopted but not yet showing a QoL or survival benefit) is maintenance Rituxan after Rituxan-based chemo. Another controversial approach is lenalidomide as maintenance - again, without evidence that patients live longer or better. Here the PRO question is: is the long-term use of an agent that extends the duration of the tumor response worth the side effects of the drug? The disease does not necessarily cause symptoms at relapse – it often does not – which is why watchful waiting is another standard way to manage the disease. So I anticipate that a purposeful QoL PRO assessment would have to include broad domains such as levels of fatigue, pain, anxiety. I appreciate that such a study might require a prohibitively large sample size, and therefore the findings could not be a primary study question unless the study were very large. As secondary endpoints, however, there may be signals sufficient to guide regulatory decisions – such as: is the gain in duration of response sufficient (as a surrogate not proven to extend survival) to grant accelerated or full approval? (e.g. modest PFS gain, with signals of impairment of QoL for the 1 or 2 year course of treatment) The findings would also guide clinical decisions in usual care should it win approval.

PS I love the design of this platform!!

f2harrell · January 9, 2020, 6:59pm

Categorization hurts instead of helps.

davidcnorrismd · January 10, 2020, 10:11pm

I believe Karl refers to categorizing the aims to be achieved by PROs in trials (or the characteristics of patients’ experiences targeted), not categorizing the measures themselves. For example, Karl contrasts organ-specific symptoms of some cancers as against the more diffuse symptomatology that might characterize the experience of (e.g.) an indolent lymphoma.

karlamoPA · March 10, 2022, 12:27pm

So our Citizen Petition has been submitted to the FDA on need and rationale for
supplementary comparison of Quality of Life-related patient reported outcome (QoL-PROs)* for surrogate endpoints based on tumor assessments.

full text: http://www.lymphomation.org/Citizen%20Petition-QoL-feb2022-fin.pdf

We waited for Congress to approve a commissioner and avoided advancing methods (thanks to guidance provided here) for capturing, comparing, or reporting QoL-PROs.

Opening statement:

To amend the Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics Guidance for the Industry to require or strongly urge supplementary comparison of Quality of Life-related patient reported outcome (QoL-PROs) for the following surrogate endpoints in randomized controlled clinical trials:*

Endpoints Based on Tumor Assessments

Disease-Free Survival (and Event-Free Survival) | Objective Response Rate
Complete Response | Time to Progression and Progression-Free Survival (PFS) | Time to Treatment Failure

What I hope to see is a universal set of QoL-PRO in key domains applied across all studies and reported in a way that promotes physician and patient understanding.
We anticipate this change in industry registration trials would provide a good many benefits (see text of petition) and would be an aid to regulatory decision making in close calls:

3 scenarios:
Major improvement in time to relapse with modest impairment in QoL.
(Approval could still be justified)

Modest improvement in time to relapse with impairment of QoL
(Longer followup justified)

Modest improvement in tumor response with improvement in QoL
(Approval could still be justified)

Suggestions on models, instruments, analysis, and reporting in plain language most welcome!

f2harrell · March 10, 2022, 12:38pm

I’m glad you are pushing this. A key analytical challenge is that you can’t really separate QOL from mortality, since mortality blocks observation of QOL. So I don’t recommend separation of clinical and QOL outcomes but rather a single comprehensive ordinal analysis. Ordinal longitudinal analysis as motivated here provides clear estimands. For example one can compute as a function of time and treatment the probability that a patient has QOL level x or worse, where “or worse” includes a worse QOL score, recurrence, or death. From the same model one can estimate P(recurrence or death by time t), P(death by time t), P(recurrence but alive at time t). Extensive material related to this may be found at https://hbiostat.org/proj/covid19.
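A rough sketch of what such estimands look like, using empirical proportions on hypothetical data rather than a fitted longitudinal ordinal model. The state ordering, the record layout, and the helper name `prob_state_or_worse` are all assumptions made for illustration. Death is treated as an absorbing worst state that is carried forward to later visits, so patients who die stay in the denominator instead of dropping out of the QOL comparison.

```python
# Ordered best to worst; the index in this list is the ordinal level.
STATES = ["good QOL", "fair QOL", "poor QOL", "recurrence", "death"]

def prob_state_or_worse(records, treatment, time, level):
    """Empirical P(state >= level) for one treatment arm at one visit."""
    states = [s for trt, t, s in records if trt == treatment and t == time]
    return sum(s >= level for s in states) / len(states)

# Hypothetical records: (treatment, visit, state index); four patients per arm.
records = [
    ("A", 1, 0), ("A", 1, 0), ("A", 1, 1), ("A", 1, 2),
    ("A", 2, 0), ("A", 2, 1), ("A", 2, 3), ("A", 2, 4),
    ("B", 1, 0), ("B", 1, 1), ("B", 1, 2), ("B", 1, 3),
    ("B", 2, 1), ("B", 2, 2), ("B", 2, 4), ("B", 2, 4),
]

poor = STATES.index("poor QOL")
recurrence = STATES.index("recurrence")
death = STATES.index("death")

for arm in ("A", "B"):
    print(arm,
          "P(poor QOL or worse at visit 2) =", prob_state_or_worse(records, arm, 2, poor),
          "| P(recurrence or death by visit 2) =", prob_state_or_worse(records, arm, 2, recurrence),
          "| P(death by visit 2) =", prob_state_or_worse(records, arm, 2, death))
```

Every quantity comes from the same ordered scale, so QOL, recurrence, and death are compared together rather than in separate analyses. A model-based version would additionally adjust for baseline and borrow information across visits.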

karlamoPA · March 10, 2022, 8:38pm

Thank you, Dr. Harrell! (Pardon my typo!) If the FDA acts on this (no expectations) I assume that they will convene experts and advocates to select the appropriate instrument(s) and analytics. My part is done!

Having said that I am surprised that mortality would be a problematic issue in trials that use surrogate endpoints as the primary endpoint - my impression being that surrogates are used because differences in survival are not anticipated during the course of the study and the defined followup.