Quality of remdesivir trials

The paper used an invalid test of proportional odds. This was demonstrated by my former PhD student Bercedis Peterson when she showed that the score test of PO is terribly anti-conservative. SAS implemented this test even though it was known to be inappropriate at the time. This was in 1990.

I’m glad you referred to @Stephen about PO possibly not mattering anyway. I showed here that it really doesn’t matter until one wants to estimate probabilities of individual event components.

Paul, I’m having trouble getting my head around the feeling that the ordinal scale was not clinically interpretable. Can you please be specific about how it’s not interpretable?

I wonder if the point about clinical interpretation relates back to the following quote from the JAMA paper about unknown clinical importance for the statistically significant data:

Meaning: Hospitalized patients with moderate COVID-19 randomized to a 5-day course of remdesivir had a statistically significantly better clinical status compared with those randomized to standard care at 11 days after initiation of treatment, but the difference was of uncertain clinical importance.

Source: https://jamanetwork.com/journals/jama/fullarticle/2769871#:~:text=Importance%20Remdesivir%20demonstrated%20clinical%20benefit,with%20moderate%20disease%20is%20unknown.

I always found that quote to be really interesting. I think it raises the following questions: are we trying to get statistically significant data, and/or are we trying to improve clinical outcomes?

I see no justification in the paper for that statement at all so am still awaiting particulars. But here is an example where the simplest outcome is difficult to interpret clinically: Suppose that a therapeutic target is to get patients to “respond” to treatment using some arbitrary dichotomization that clinicians pretend to understand. Suppose that for treatment A 0.34 of patients respond and for treatment B 0.42 of patients respond. Is the difference between 0.34 and 0.42 clinically significant?
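
To see why that question has no automatic answer, it can help to put the same two proportions on several common scales. The sketch below is only illustrative arithmetic on the 0.34 and 0.42 from the example; the variable names are made up, and none of the resulting numbers says by itself whether the difference matters clinically.

```python
# Response proportions taken from the example above
p_a = 0.34  # treatment A
p_b = 0.42  # treatment B

risk_difference = p_b - p_a                          # absolute difference
risk_ratio = p_b / p_a                               # relative scale
odds_ratio = (p_b / (1 - p_b)) / (p_a / (1 - p_a))   # odds scale
patients_per_extra_response = 1 / risk_difference    # "number needed to treat"

print(f"risk difference:             {risk_difference:.2f}")
print(f"risk ratio:                  {risk_ratio:.2f}")
print(f"odds ratio:                  {odds_ratio:.2f}")
print(f"patients per extra response: {patients_per_extra_response:.1f}")
```

Each scale tells a slightly different story about the same data, and all of them inherit the arbitrariness of the "response" cutoff.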

I can’t help but be pragmatic, i.e., if the clinicians are saying in a review of remdesivir trials (“Efficacy of Remdesivir in COVID-19”) that it’s difficult to translate the scale “into a clinically meaningful statement for patients, clinicians, and policy makers” then I regret the wasted cost on a trial that had little chance of producing a result that could be readily accepted/communicated and instead incites debate (e.g., statnews). When you have these multi-component scales it’s a hard sell, i.e., what is driving the result? I saw Vinay Prasad on Twitter rebuke PFS (tweet) and I just feel, in an analogous way, these measures are susceptible to scepticism. In an old Statistics in Medicine paper (I can’t remember the author) they refer to an RCT as a gladiatorial contest with a single victor. Composite measures will tend to push us in this unfortunate direction with fixation on a single p-value: “did the drug win?” That’s my worry.

Edit: this is a more relevant paper where clinical people regret these ordinal scales in COVID studies: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7265861/: “Our analysis shows that most trials testing new treatment options for SARS-CoV-2 include a surrogate measure which may or may not predict clinical benefit.” etc.

I enjoyed reading it and should reread it, but isn’t it ironic that you quote Senn, because he has been critical of that measure: “Random sampling does not take place in clinical trials and overlap measures make most sense in a sampling context” (Senn 2011)?

I note that the McCreary & Angus editorial you cite contains a major misunderstanding of ordinal outcome scales. Such scales simply do not assume equal severity spacing across levels of the scale.

I’m not seeing how that context relates to ours.

These are very physician-centric views of the world IMHO, as if what matters to patients is not relevant. It also assumes that a worsening condition is not sufficiently predictive of a tendency to have clinical events. On the first point we know from the international survey of thousands of patients that they place great weight on shortness of breath.

The progression-free survival issue is a good one to bring up, but there are two differences: (1) we have a lot of data showing how reducing progression does not lead to improved survival, for many cancers/treatments; and (2) in a chronic disease requiring long-duration expensive treatment and long-term follow-up, patients have different utility functions. In many cases patients have elected to sacrifice quality of life voluntarily and have put their emphasis on life extension.

To keep the good discussion going, here is a way of stating it that is a slight exaggeration: You would rather spend the time and resources to do a 7,000 patient COVID-19 clinical trial on a mortality endpoint than do a 700 patient trial on a full-spectrum ordinal endpoint. You would wait for 7,000 patients to jettison ineffective treatments instead of stopping early for futility with fewer than 700 patients with a multi-level ordinal endpoint just because it takes a bit more time to interpret.

And what is so hard to interpret about an ordinal outcome? Simplifying to 5 outcome levels (at home with no shortness of breath, at home with significant shortness of breath, hospitalized, invasive ventilation, death), the interpretation of the trial can be stated these ways simultaneously:

  • The estimated probability that a randomly chosen patient given treatment B has a better clinical outcome than a randomly chosen patient on treatment A is 0.7
  • The estimated probabilities for treatments A and B of the patient having significant shortness of breath or worse are x.xx and x.xx
  • The probabilities of the patient needing hospitalization or worse are x.xx and x.xx
  • The probabilities of needing invasive ventilation or dying are x.xx and x.xx
  • The probabilities of death are x.xx and x.xx
  • The Bayesian posterior probability that treatment B affects mortality differently than it affects shortness of breath, hospitalization, or need for a ventilator is x.xx
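
As a rough illustration of where the numbers in the first five bullets could come from, here is a minimal sketch under a proportional odds assumption. Everything numeric in it is hypothetical: the outcome distribution for treatment A, the odds ratio, and the helper names are made up for illustration, and ties in the first bullet are handled by counting them as one half, which is one convention among several. The Bayesian posterior probability in the last bullet needs a fitted model and is not covered.

```python
# Hypothetical inputs: the control probabilities and the odds ratio are made up.
LEVELS = [
    "home, no shortness of breath",
    "home, significant shortness of breath",
    "hospitalized",
    "invasive ventilation",
    "death",
]  # ordered best to worst
P_A = [0.40, 0.25, 0.20, 0.10, 0.05]  # assumed outcome distribution on treatment A
ODDS_RATIO = 0.5  # assumed odds ratio for "this level or worse", B vs A


def exceedance(probs):
    """Return P(Y >= level j) for j = 1..K-1, i.e. 'level j or worse'."""
    return [sum(probs[j:]) for j in range(1, len(probs))]


def shift_by_odds_ratio(exceed, odds_ratio):
    """Apply one common odds ratio to every 'or worse' probability (PO assumption)."""
    shifted = []
    for p in exceed:
        odds = odds_ratio * p / (1 - p)
        shifted.append(odds / (1 + odds))
    return shifted


def cell_probabilities(exceed):
    """Convert 'or worse' probabilities back to per-level probabilities."""
    bounds = [1.0] + list(exceed) + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]


def prob_b_better(p_a, p_b):
    """P(B patient ends at a better level than A patient), ties counted as 1/2."""
    better = sum(p_b[i] * sum(p_a[i + 1:]) for i in range(len(p_b)))
    ties = sum(a * b for a, b in zip(p_a, p_b))
    return better + 0.5 * ties


EXCEED_A = exceedance(P_A)
EXCEED_B = shift_by_odds_ratio(EXCEED_A, ODDS_RATIO)
P_B = cell_probabilities(EXCEED_B)

print(f"P(B better than A): {prob_b_better(P_A, P_B):.2f}")
for name, a, b in zip(LEVELS[1:], EXCEED_A, EXCEED_B):
    print(f"P({name} or worse): A = {a:.2f}, B = {b:.2f}")
```

In a real analysis the per-level probabilities would come from the fitted ordinal model rather than from an assumed control distribution and a single plugged-in odds ratio.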

Maybe I’m not reading it properly [throughout I am reading concordance prob = IEP (the term Senn uses)]. When you show “there is an almost one-to-one relationship between the odds ratio (anti-log of) and the concordance probability” in the presence of non-PO, should we be reassured by this because we are familiar with and at ease with the concordance prob? Although Senn’s SBR paper is more ambivalent than his original response to Acion’s Stat Med paper, it still gives pause. For the JAMA example above you’d recommend reporting the OR?

Yes, and I’d report the concordance probability P(B > A), which is a simple function of the OR even when proportional odds doesn’t hold - see here.
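
For readers who want to see what "a simple function of the OR" might look like in practice, the sketch below assumes the empirical approximation c ≈ OR^0.66 / (1 + OR^0.66). Both the functional form and the exponent 0.66 are assumptions here and should be checked against the linked post; the function name is made up.

```python
def concordance_from_odds_ratio(odds_ratio, exponent=0.66):
    """Approximate P(B > A) from an odds ratio.

    The default exponent is an assumed empirical constant, not a derived one.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    t = odds_ratio ** exponent
    return t / (1 + t)


for value in (0.5, 1.0, 1.5, 2.0):
    c = concordance_from_odds_ratio(value)
    print(f"OR = {value:.1f} -> approximate P(B > A) = {c:.3f}")
```

An OR of 1 maps to 0.5 by construction, and an OR and its reciprocal map to probabilities that sum to 1.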

I do not think this has been very rigorously evaluated in most cases. Very often what happens in oncology trials is that a “statistically significant” progression-free survival benefit is found whereas “p > 0.05” for overall survival benefit (keep in mind that, among other things, the overall survival endpoint naturally has a lower number of events). This is mistakenly interpreted as “improving progression-free survival but not overall survival”.
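
The role of the event count can be made concrete with a rough power calculation. The sketch below uses the Schoenfeld approximation for a two-sided log-rank test with 1:1 allocation; the event counts and the hazard ratio are hypothetical and chosen only to show how the same true effect can give "p < 0.05" for one endpoint and "p > 0.05" for the other.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(events, hazard_ratio, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld, 1:1 allocation).

    Uses var(log HR) ~= 4 / events and ignores the negligible opposite tail.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    return normal.cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_alpha)


# Hypothetical trial: same true hazard ratio for both endpoints,
# but far fewer deaths than progression events at the time of analysis.
HAZARD_RATIO = 0.7
power_pfs = logrank_power(events=300, hazard_ratio=HAZARD_RATIO)
power_os = logrank_power(events=120, hazard_ratio=HAZARD_RATIO)

print(f"power with 300 PFS events: {power_pfs:.2f}")
print(f"power with 120 OS events:  {power_os:.2f}")
```

With these made-up inputs the overall survival comparison has roughly a coin-flip chance of reaching significance even though the assumed effect is identical, which is why a non-significant result there is not evidence of no survival benefit.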

That is the predictable outcome because the trials typically don’t have Bayesian or frequentist power for a mortality comparison. But to me the bigger issue is that only a minority of p < 0.05 for PFS translates into evidence for a mortality benefit in a later, larger trial (or one with longer follow-up).

I wish it was that simple. This paper by Kristine Broglio and Don Berry gives some context from a statistical perspective and shows that in malignancies with long post-progression survival the overall survival endpoint may be misleading compared with progression-free survival. For clinical context on this finding see here.

My expertise is renal medullary carcinoma (RMC), a highly lethal (and unrecognized) kidney cancer that predominantly afflicts young individuals of African descent. If left untreated, RMC will kill patients within 3 months. When the first-line therapy for this disease was established a few years ago, the lengthening of progression-free survival naturally increased the overall survival to a median of 13 months. As we are developing second-line (and later line) regimens for this disease, the improvement in progression-free survival almost perfectly corresponds to an increase in overall survival. Aggressive cancers with few options like RMC tend to behave like this.

But even for the most common kidney cancer (clear cell renal cell carcinoma) we have summarized in Table 1 of this article the progression-free survival and overall survival results that have led to all the FDA approvals to date for kidney cancer (all FDA approvals for kidney cancer systemic therapies to date have been based on clear cell renal cell carcinoma). There is not a single instance where the progression-free survival showed benefit and the overall survival (with either longer follow-up or in a subsequent trial) did not.

Disease progression in oncology is an ordinal outcome (in solid tumors it comprises complete response, partial response, stable disease, progressive disease, and death) and indeed has parallels with the COVID-19 ordinal modeling. At least some of the arguments for and against COVID-19 ordinal outcomes are very similar to the arguments for and against using disease progression as an oncology outcome. In most oncology trials, progression is used as a time-to-event outcome (or erroneously dichotomized into “response” vs “no response”), but see here for an example: our paper jointly modeling progression as an ordinal and time-to-event outcome in a phase I/II design setting.

Note that I am a harsh critic of the RECIST criteria most commonly used to define progression in solid malignancies, and the trial designs I am currently involved with use utility functions to take into account broader considerations like patient preferences. It does feel odd that I have to defend disease progression (and the RECIST criteria behind it) but my point is that the topic is substantially more complex than what is claimed in some corners of the Twitterverse.

Very true. Most oncology trials are highly confounded by what happens after progression, typically the start of a different therapy, including one given in another clinical trial. In addition, cancer patients vary widely in when they and their doctor decide to switch from therapeutic to palliative care.

I made some summary tables based on the efficacy data included in the summary document (https://www.accessdata.fda.gov/drugsatfda_docs/nda/2020/214787Orig1s000Sumr.pdf).

For some reason, I cannot seem to replace the one table that needed updating in my previous post and/or post an updated table. Thus, please accept my apologies for the omission of the two National Library of Medicine (i.e., clinicaltrials.gov) trial identifiers in the clinical trials data table that I posted. The omitted identifiers are as follows:

*Trial GS-US-540-5773 is NCT04292899; and

*Trial GS-US-540-5774 is NCT04292730.

Some of the other trials/studies referenced in the summary document were not found in clinicaltrials.gov.

I think the following ties into the consideration of patient preferences that you’ve raised.

If feasible, I’d like to see HRQoL compared with PFS in studies comparing cancer treatments (often having toxic effects) that are given continuously. In this mode of administration the toxicities (including financial) are experienced daily for extended periods of time, which can offset the benefit of stabilizing the disease (based on imaging events).

I recognize that HRQoL comparisons would be secondary endpoints to assist in making judgment calls about the clinical significance in PFS gains. In short, it would help in at least some cases to answer the basic question: is the imaging gain worth the pain? I don’t see how one can estimate the answer without QoL comparisons over the course of treatment - again in treatments that are given until progression or unacceptable toxicity.

While patient-reported outcomes are inherently subjective, it seems feasible to me that comparing changes in QoL in a randomized trial can give an objective comparison of how well patients in group A live while on treatment compared to those in group B.

In a setting where there may be multiple reasonable treatment approaches, the HRQoL piece is needed to inform patient preferences.

I completely agree that HRQoL is key and should be respected more in oncology clinical decisions. I think that the best way to incorporate HRQoL is by using utility functions that take into account efficacy outcomes (such as PFS or survival), adverse events, and HRQoL metrics. All these metrics have crucial subjective components that can only be resolved by taking patient preferences/utilities into account. I discuss in my lecture here why even when mainly focusing on a supposedly hard outcome such as overall survival, clinical decisions can be completely opposite depending on patient preferences.

Trialdesign.org has now started incorporating utility-based designs such as BOIN12 and UBOIN with more to come. Incorporating HRQoL will be a key next step.

Longitudinal ordinal outcomes approximate this and are perhaps easier to develop. See the first link here.

Yes, I am a big fan of ordinal outcomes in general, largely because I have been on a steady diet of @f2harrell literature for the past few years. I still think that utility-based designs can have certain distinct advantages for oncology purposes that may actually simplify things while facilitating more correct decisions. Stay tuned for more on the topic.

I wonder if you’ve seen this report and if you can share your take on the study methods and the reporting of the compared changes in HRQoL?

Quality of Life Effect of the Anti-CCR4 Monoclonal Antibody Mogamulizumab Versus Vorinostat in Patients With Cutaneous T-cell Lymphoma

https://www.clinical-lymphoma-myeloma-leukemia.com/article/S2152-2650(20)30511-5/pdf

I applaud the effort, although looking at this study my inner @f2harrell voice would point out issues with the dichotomization of variables (e.g., age < 65 vs >65), variable selection based on p < 0.1 on univariate analysis (that is a big no-no), and using changes from baseline.
