Quality of remdesivir trials

For some reason, I cannot seem to replace the one table that needed updating in my previous post and/or post an updated table. Thus, please accept my apologies for the omission of the two National Library of Medicine (i.e., clinicaltrials.gov) trial identifiers in the clinical trials data table that I posted. The omitted identifiers are as follows:

*Trial GS-US-540-5773 is NCT04292899; and

*Trial GS-US-540-5774 is NCT04292730.

Some of the other trials/studies referenced in the summary document were not found in clinicaltrials.gov.


I think the following ties into the consideration of patient preferences that you’ve raised.

If feasible, I’d like to see HRQoL compared with PFS in studies comparing cancer treatments (which often have toxic effects) that are given continuously. With this mode of administration, the toxicities (including financial toxicity) are experienced daily for extended periods, which can offset the benefit of stabilizing the disease (as judged by imaging).

I recognize that HRQoL comparisons would be secondary endpoints, used to help make judgment calls about the clinical significance of PFS gains. In short, in at least some cases they would help answer the basic question: is the imaging gain worth the pain? I don’t see how one can estimate the answer without QoL comparisons over the course of treatment, again in treatments that are given until progression or unacceptable toxicity.

While patient-reported outcomes are inherently subjective, it seems feasible to me that comparing changes in QoL within a randomized trial can give an objective comparison of how well patients in group A live while on treatment compared with group B.

In a setting where there may be multiple reasonable treatment approaches, the HRQoL piece is needed to inform patient preferences.

I completely agree that HRQoL is key and should be respected more in oncology clinical decisions. I think that the best way to incorporate HRQoL is by using utility functions that take into account efficacy outcomes (such as PFS or survival), adverse events, and HRQoL metrics. All these metrics have crucial subjective components that can only be resolved by taking patient preferences/utilities into account. I discuss in my lecture here why even when mainly focusing on a supposedly hard outcome such as overall survival, clinical decisions can be completely opposite depending on patient preferences.
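To make the point about preferences concrete, here is a toy sketch (not the model from the lecture) of how an expected-utility calculation can reverse a treatment choice. Two hypothetical treatments have the same outcome probabilities for everyone; two hypothetical patients weigh response and severe toxicity differently. Each treatment's expected utility is the probability-weighted sum of that patient's utilities for the joint outcomes. All numbers are invented for illustration.

```python
"""Toy expected-utility comparison of two treatments for two patients.

All probabilities and utilities are hypothetical. They only illustrate
that the preferred treatment can flip when patient utilities change,
even though the outcome probabilities are identical for both patients.
"""

# Hypothetical joint outcome probabilities: (efficacy outcome, toxicity outcome).
OUTCOME_PROBS = {
    "aggressive": {  # P(response) = 0.60, P(severe AE) = 0.60
        ("response", "severe_ae"): 0.35,
        ("response", "no_severe_ae"): 0.25,
        ("no_response", "severe_ae"): 0.25,
        ("no_response", "no_severe_ae"): 0.15,
    },
    "gentle": {  # P(response) = 0.40, P(severe AE) = 0.10
        ("response", "severe_ae"): 0.05,
        ("response", "no_severe_ae"): 0.35,
        ("no_response", "severe_ae"): 0.05,
        ("no_response", "no_severe_ae"): 0.55,
    },
}

# Hypothetical elicited utilities on a 0 (worst) to 100 (best) scale.
PATIENT_UTILITIES = {
    "values_response": {
        ("response", "severe_ae"): 80,
        ("response", "no_severe_ae"): 100,
        ("no_response", "severe_ae"): 0,
        ("no_response", "no_severe_ae"): 20,
    },
    "values_tolerability": {
        ("response", "severe_ae"): 30,
        ("response", "no_severe_ae"): 100,
        ("no_response", "severe_ae"): 0,
        ("no_response", "no_severe_ae"): 60,
    },
}


def expected_utility(probs, utilities):
    """Sum of P(outcome) * utility(outcome) over all joint outcomes."""
    return sum(p * utilities[outcome] for outcome, p in probs.items())


def best_treatment(utilities):
    """Return (name of treatment with highest expected utility, all scores)."""
    scores = {name: expected_utility(probs, utilities)
              for name, probs in OUTCOME_PROBS.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for patient, utils in PATIENT_UTILITIES.items():
        choice, scores = best_treatment(utils)
        print(patient, choice, {k: round(v, 1) for k, v in scores.items()})
```

With these made-up inputs the response-focused patient scores the aggressive option higher (56 vs 50), while the tolerability-focused patient scores the gentle option higher (44.5 vs 69.5), even though nothing about the treatments changed.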

Trialdesign.org has now started incorporating utility-based designs such as BOIN12 and UBOIN with more to come. Incorporating HRQoL will be a key next step.

Longitudinal ordinal outcomes approximate this and are perhaps easier to develop. See the first link here.

Yes, I am a big fan of ordinal outcomes in general, largely because I have been on a steady diet of @f2harrell literature for the past few years. I still think that utility-based designs can have certain distinct advantages for oncology purposes that may actually simplify things while facilitating more correct decisions. Stay tuned for more on the topic.


I wonder if you’ve seen this report, and whether you can share your take on its study methods and on how it reports the comparison of HRQoL changes?

Quality of Life Effect of the Anti-CCR4 Monoclonal Antibody Mogamulizumab Versus Vorinostat in Patients With Cutaneous T-cell


I applaud the effort, although looking at this study my inner @f2harrell voice would point out issues with the dichotomization of variables (e.g., age < 65 vs >65), variable selection based on p < 0.1 on univariate analysis (that is a big no-no), and using changes from baseline.


Appreciated. Would it be feasible to give a broad-brush outline of how you’d design the study differently to address those issues?

No prob. To address these three issues:

  1. I would model the association between continuous variables such as age and the outcome using a smooth non-linear approach such as restricted cubic splines. Here is a very nice paper by @drjgauthier showing the value of this using examples from the hematologic malignancy world.

  2. Instead of p-values, I would use contextual knowledge to select the variables to be included in my model, with the goal of reducing the heterogeneity of the outcomes and thus increasing precision, given that this is a randomized controlled trial. We discuss here the use of directed acyclic graphs as a way to facilitate the representation and critique of such contextual knowledge in oncology.

  3. Instead of changes from baseline, I would focus on the actual ordinal scores after treatment initiation and adjust for the baseline scores as a covariate. A simple but efficient approach in the context of an RCT would be to focus on a specific time point of interest and perform a semiparametric ANCOVA using a proportional odds model (the orm function in the rms package), whereby the patient-reported outcome is treated as an ordinal dependent variable with no binning. A smooth relationship between the baseline and landmark time point patient-reported outcomes can be assumed using restricted cubic splines (as discussed above), and Wald statistics can be used to gauge the relative importance of each predictor, including treatment group, for the patient-reported outcome. We used such a simple approach for patient-reported outcomes in our TEMPA RCT here.
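Item 1 can be made concrete with a small sketch of a restricted (natural) cubic spline basis in the truncated-power form. Given knots t_1 < ... < t_k, it returns k - 1 columns, [x, s_1(x), ..., s_{k-2}(x)], that would enter a regression model in place of a dichotomized age. The fitted curve is cubic between knots and constrained to be linear beyond the outer knots. Dividing the nonlinear terms by (t_k - t_1)^2 is one common normalization; it is an assumption here and only rescales the coefficients, not the fitted curve. In practice knots are usually placed at fixed quantiles of x (Harrell's usual choice for four knots is the 5th, 35th, 65th and 95th percentiles); the age knots below are hypothetical.

```python
"""Restricted (natural) cubic spline basis in truncated-power form.

A minimal sketch: given strictly increasing knots t_1..t_k, return the
k - 1 basis values [x, s_1(x), ..., s_{k-2}(x)] for a single x.
"""


def _pos_cube(u):
    """Truncated power function (u)_+^3."""
    return u ** 3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x (linear beyond outer knots)."""
    k = len(knots)
    if k < 3:
        raise ValueError("need at least 3 knots")
    if any(a >= b for a, b in zip(knots, knots[1:])):
        raise ValueError("knots must be strictly increasing")
    t_last, t_penult = knots[-1], knots[-2]
    # Assumed normalization so nonlinear terms are on a scale comparable to x.
    scale = (t_last - knots[0]) ** 2
    row = [float(x)]
    for t_j in knots[: k - 2]:
        term = (
            _pos_cube(x - t_j)
            - _pos_cube(x - t_penult) * (t_last - t_j) / (t_last - t_penult)
            + _pos_cube(x - t_last) * (t_penult - t_j) / (t_last - t_penult)
        )
        row.append(term / scale)
    return row


if __name__ == "__main__":
    knots = [45.0, 55.0, 65.0, 75.0]  # hypothetical knots for age in years
    for age in (40, 60, 64, 66, 80):
        print(age, [round(v, 3) for v in rcs_basis(age, knots)])
```

Note that ages 64 and 66 get nearly identical basis values, whereas a cut at 65 would put them in different groups.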

While there are ways to model such outcomes over time instead of at a landmark time point, as a practicing trialist I nowadays prefer to focus on very specific time points of interest to minimize the burden on patients; among other things, that burden can cause a lot of missing data. Completing these patient-reported outcome instruments can be very cumbersome for patients and also a major strain on research and clinical personnel, particularly when so many other parts of the trial protocol also need to be completed on the same day. Focusing on high-yield time points and patient-reported tools can mitigate this.


Sincere thanks. These are not easy concepts for a lay advocate to understand and advocate for. The articles were interesting and readable, and the approaches feel right. Is there a budding or flowering consensus in the field on comparing QoL effects?

*restricted cubic splines;

*directed acyclic graphs; and

*ordinal scores after treatment initiation, with covariate adjustment for baseline scores.


I think there are various efforts to produce guidance on the topic. The most recent one I saw was this one, which is a pretty good review and highlights, for example, the need to adjust for baseline covariates. The one part I really disagree with is that, in cases where a PRO is evaluated at a specific time point, they recommend linear regression over ANCOVA. As mentioned above, in parallel-group randomized clinical trials, ANCOVA on the PRO score adjusted for the baseline score is a highly reliable way to answer the critical question of whether two patients who started with the same PRO score will end with the same PRO score.
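The proportional odds ANCOVA mentioned above answers that question directly. Below is a sketch of the model structure only (not a fitting routine; in practice the model would be fit with software such as orm in rms). Written in the form logit P(Y ≥ y_j | X) = α_j + Xβ, the model implies that for two patients with the same baseline score, the odds of reaching or exceeding any cutoff of the PRO differ by the same factor exp(β_treat). The intercepts and coefficients are hypothetical, and the single linear baseline term stands in for the spline one would normally use.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def po_category_probs(alphas, lp):
    """P(Y = y_1), ..., P(Y = y_K) under a proportional odds model.

    Model form: logit P(Y >= y_j | X) = alphas[j - 2] + lp for j = 2..K,
    where lp is the linear predictor X @ beta. Intercepts must be strictly
    decreasing so that P(Y >= y_j) decreases in j.
    """
    if any(a <= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("intercepts must be strictly decreasing")
    exceed = [1.0] + [expit(a + lp) for a in alphas] + [0.0]  # P(Y >= y_j)
    return [exceed[j] - exceed[j + 1] for j in range(len(exceed) - 1)]


def cumulative_odds_ratios(p_ref, p_new):
    """Odds of Y >= y_j under p_new divided by the same odds under p_ref, j = 2..K."""
    ratios = []
    for j in range(1, len(p_ref)):
        g_ref, g_new = sum(p_ref[j:]), sum(p_new[j:])
        ratios.append((g_new / (1 - g_new)) / (g_ref / (1 - g_ref)))
    return ratios


# Hypothetical values for a 5-level PRO (0-4); not estimates from any trial.
ALPHAS = [2.0, 0.5, -1.0, -2.5]
BETA_TREAT = 0.7
BETA_BASELINE = 0.4  # a spline in the baseline score could replace this term


def linear_predictor(treated, baseline_score):
    """Hypothetical linear predictor: treatment indicator plus baseline score."""
    return BETA_TREAT * treated + BETA_BASELINE * baseline_score


if __name__ == "__main__":
    control = po_category_probs(ALPHAS, linear_predictor(0, baseline_score=2))
    active = po_category_probs(ALPHAS, linear_predictor(1, baseline_score=2))
    print([round(p, 3) for p in control])
    print([round(p, 3) for p in active])
    # Same baseline, different arm: every cumulative OR equals exp(BETA_TREAT).
    print([round(r, 4) for r in cumulative_odds_ratios(control, active)])
```

No spacing between PRO levels is assumed anywhere: only the order of the categories enters the model.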


How can a paper that purports to provide statistical guidance on analysis of patient reported outcomes nowhere mention the proportional odds model?


Yup. I’ve had this discussion many times and part of the preference for linear (or even logistic) models by all stakeholders (including statisticians) is that they are less familiar/comfortable with ordinal models, including the assumptions behind them.

This takes us back to the original point of this discussion, i.e., that clinical researchers are hesitant to use ordinal outcomes in COVID-19 trials. I am actually actively using the example of COVID-19 trials focusing on ordinal outcomes as justification to start applying such models more in oncology.

Those are good points. But regarding the paper, they are discussing analysis of outcomes that are already ordinal.


I agree but the counterargument from their side is that these PRO scores are sums of ordinal scales that can be treated as interval. An underlying assumption here is that the ordinal vs interval debate in this situation is trivial. I think that treating PRO scores as interval causes more trouble than it is worth but those who advocate for linear models would beg to differ.

It would be nice to see some of the distributions, to check for absence of floor effects, ceiling effects, bimodality, heavy tails, etc. On the other hand why take a risk by still using parametric methods when the semiparametric methods are so powerful?
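A minimal sketch of the kind of distribution check described here, for a bounded PRO score: the share of respondents at the scale floor and ceiling, plus a crude text histogram for eyeballing bimodality or heavy tails. The 15% flag threshold is an assumed rule of thumb that is often cited for floor and ceiling effects, not a universal standard, and the scores are hypothetical.

```python
from collections import Counter


def floor_ceiling_summary(scores, scale_min, scale_max, threshold=0.15):
    """Share of respondents at the scale floor/ceiling, flagged above threshold.

    threshold=0.15 is an assumed rule of thumb, not a universal standard.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    if any(s < scale_min or s > scale_max for s in scores):
        raise ValueError("score outside the declared scale range")
    floor = sum(1 for s in scores if s == scale_min) / n
    ceiling = sum(1 for s in scores if s == scale_max) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_flag": floor > threshold,
        "ceiling_flag": ceiling > threshold,
        "distinct_values": len(set(scores)),
    }


def text_histogram(scores, width=40):
    """Crude text histogram, one row per observed value."""
    counts = Counter(scores)
    top = max(counts.values())
    rows = []
    for value in sorted(counts):
        bar = "#" * max(1, round(width * counts[value] / top))
        rows.append(f"{value:>6} | {bar} ({counts[value]})")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical week-12 scores on a 0-100 PRO scale (not real trial data).
    week12 = [0, 0, 0, 0, 10, 20, 30, 40, 50, 100]
    print(floor_ceiling_summary(week12, scale_min=0, scale_max=100))
    print(text_histogram(week12))
```

A large floor share like the one in this toy example is exactly the situation in which a semiparametric ordinal model is safer than a linear model that treats the score as interval and roughly normal.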


In the JAMA paper you tweeted (the hydroxychloroquine RCT), it says they used simulations for power. Have you used simulations to see how floor effects, bimodality, non-proportional odds, etc. affect power? And why simulations, i.e., is it not possible to just use these equations: Size and power estimation for the Wilcoxon–Mann–Whitney test for ordered categorical data?

I think a better place to seek that is one of the papers that studied what goes wrong when you use linear models to analyze ordinal data.
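On the analytic route raised in the question: for a proportional odds (Wilcoxon-type) comparison with 1:1 allocation, Whitehead's (1993) formula for ordered categorical data gives an approximate total sample size N = 12 (z_{1-α/2} + z_{1-β})^2 / ((log OR)^2 (1 - Σ p̄_j^3)), where p̄_j is the average of the two arms' probabilities of category j. The sketch below implements that formula; it assumes proportional odds holds exactly and relies on large-sample approximations, which is one reason simulations remain useful when floor effects, bimodality, or non-proportional odds are the concern. The control-arm distribution in the usage example is hypothetical.

```python
import math
from statistics import NormalDist


def shifted_probs(control_probs, odds_ratio):
    """Category probabilities for the comparison arm under proportional odds.

    The odds of being at or below each category are divided by odds_ratio,
    so odds_ratio > 1 shifts the comparison arm toward higher categories.
    """
    out, cum_c, prev_t = [], 0.0, 0.0
    for p in control_probs[:-1]:
        cum_c += p
        cum_t = cum_c / (cum_c + odds_ratio * (1.0 - cum_c))
        out.append(cum_t - prev_t)
        prev_t = cum_t
    out.append(1.0 - prev_t)
    return out


def whitehead_total_n(control_probs, odds_ratio, alpha=0.05, power=0.80):
    """Approximate total N (1:1 allocation, two-sided test), Whitehead (1993)."""
    if any(p <= 0 for p in control_probs) or abs(sum(control_probs) - 1.0) > 1e-9:
        raise ValueError("control_probs must be positive and sum to 1")
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("odds_ratio must be positive and different from 1")
    treated = shifted_probs(control_probs, odds_ratio)
    pbar = [(c + t) / 2 for c, t in zip(control_probs, treated)]
    z = NormalDist().inv_cdf
    numerator = 12 * (z(1 - alpha / 2) + z(power)) ** 2
    denominator = math.log(odds_ratio) ** 2 * (1 - sum(p ** 3 for p in pbar))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical control-arm distribution over a 5-level ordinal outcome.
    control = [0.10, 0.20, 0.30, 0.25, 0.15]
    for odds_ratio in (1.5, 2.0, 2.5):
        print(odds_ratio, math.ceil(whitehead_total_n(control, odds_ratio)))
```

The 1 - Σ p̄_j^3 term also shows why binning hurts: the fewer (or more unevenly filled) the categories, the larger Σ p̄_j^3 and the larger the required N.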

“I agree but the counterargument from their side is that these PRO scores are sums of ordinal scales that can be treated as interval.”

This is a common assumption in psychometrics, but I’ve never been able to find any rigorous mathematical argument in favor of it.

On the contrary, much like the literature on “significance” testing vs. p values as surprise measures, there is a debate within social science on whether it is permissible to treat ordinal data as interval.

Representational measurement theory would advise analysis that takes into account only the order properties when spacing cannot be verified as equal at all points of the scale.
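A toy illustration of that point, with hypothetical data on a 1 to 5 ordinal PRO item: if the spacing between categories cannot be verified, any order-preserving relabeling is an equally valid coding. Under such a relabeling the difference in means between two groups can change sign, whereas the concordance probability (the probability that a random patient in one group scores higher than one in the other, ties counted as one half, which is the quantity behind the Wilcoxon–Mann–Whitney test) cannot change, because it uses only order.

```python
"""Order-only vs interval-scale summaries under a monotone relabeling.

Hypothetical toy data on a 1-5 ordinal PRO item. The relabeling keeps the
order of the categories but changes their spacing.
"""
from statistics import fmean


def concordance(a, b):
    """P(random score from b > random score from a) + 0.5 * P(tie)."""
    wins = 0.0
    for y_b in b:
        for y_a in a:
            if y_b > y_a:
                wins += 1.0
            elif y_b == y_a:
                wins += 0.5
    return wins / (len(a) * len(b))


group_a = [1, 1, 5]
group_b = [2, 2, 2]
# Same order of categories, different (equally unverifiable) spacing.
relabel = {1: 1.0, 2: 2.0, 3: 2.2, 4: 2.4, 5: 2.5}
relabeled_a = [relabel[y] for y in group_a]
relabeled_b = [relabel[y] for y in group_b]

if __name__ == "__main__":
    print("mean diff, original coding: ", fmean(group_b) - fmean(group_a))
    print("mean diff, relabeled coding:", fmean(relabeled_b) - fmean(relabeled_a))
    print("concordance, both codings:  ",
          concordance(group_a, group_b), concordance(relabeled_a, relabeled_b))
```

Here group B looks worse on the original coding (mean difference about -0.33) and better on the relabeled one (+0.5), while the concordance stays at 2/3 under both.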

Joel Michell (a mathematical psychologist) has written numerous papers on this issue and considers psychometrics a “pathological science” because of it.

Michell J. Normal Science, Pathological Science and Psychometrics. Theory & Psychology. 2000;10(5):639-667. doi:10.1177/0959354300105004

The consequences of treating these ordinal scales as interval have been explored in this paper:

After reading this, I’ve come to the unfortunate conclusion that parametric analysis of PRO measures at the individual study level makes any synthesis or meta-analysis (without access to individual data) unreliable.

If researchers would simply use ordinal methods at the individual study level, then meta-analysis could be a useful tool. The methods recommended by @f2harrell seem correct if the goal is both to learn at the individual study level and to aggregate studies via meta-analysis.


It sounds as if medicine’s mantra of “continuous learning” somehow needs to be brought to statistics.