Quality of remdesivir trials

I wonder if you’ve seen this report, and whether you can share your take on the study methods and the reporting of the compared changes in HRQoL?

Quality of Life Effect of the Anti-CCR4
Monoclonal Antibody Mogamulizumab Versus
Vorinostat in Patients With Cutaneous T-cell


I applaud the effort, although looking at this study my inner @f2harrell voice would point out issues with the dichotomization of variables (e.g., age < 65 vs >65), variable selection based on p < 0.1 on univariate analysis (that is a big no-no), and using changes from baseline.


Appreciated. Is it feasible to give a broad brush outline on how you’d design the study differently - address those issues?

No prob. To address these three issues:

  1. I would model the association between continuous variables such as age and the outcome using a smooth non-linear approach such as restricted cubic splines. Here is a very nice paper by @drjgauthier showing the value of this using examples from the hematologic malignancy world.

  2. Instead of p-values, I would use contextual knowledge to select the variables to be included in my model with the goal to reduce the heterogeneity of the outcomes and thus increase precision given that this is a randomized controlled trial. We discuss here the use of directed acyclic graphs as a way to facilitate the representation and critique of such contextual knowledge in oncology.

  3. Instead of changes from baseline, I would focus on the actual ordinal scores after treatment initiation and covariate adjust for the baseline scores. A simple but efficient approach in the context of an RCT would be to focus on a specific time point of interest and perform semiparametric ANCOVA using a proportional odds model (orm function in the rms package), whereby the patient-reported outcome is treated as an ordinal dependent variable with no binning. A smooth relationship can be assumed between the baseline and landmark time point patient-reported outcomes using restricted cubic splines (as discussed above), and Wald statistics can be used to determine the relative importance of treatment group with regard to the patient-reported outcome. We used such a simple approach for patient-reported outcomes in our TEMPA RCT here.
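
To make point 1 concrete, here is a minimal Python sketch of a restricted cubic spline basis in its truncated power form. It is not the rms implementation; the function name `rcs_basis`, the knot locations in `AGE_KNOTS`, and the scaling by the squared knot range are assumptions chosen for illustration. The idea it shows is that a continuous variable such as age enters the model as a few smooth terms that are constrained to be linear beyond the outer knots, rather than as a single cut at 65.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated power form) for one value x.

    For k knots this returns k - 1 terms: x itself plus k - 2 nonlinear terms.
    The nonlinear terms are divided by (last knot - first knot)^2, an assumed
    normalization that keeps them on roughly the same scale as x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos_cube(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2
    gap = t[-1] - t[-2]
    terms = [float(x)]
    for j in range(k - 2):
        term = (
            pos_cube(x - t[j])
            - pos_cube(x - t[-2]) * (t[-1] - t[j]) / gap
            + pos_cube(x - t[-1]) * (t[-2] - t[j]) / gap
        )
        terms.append(term / scale)
    return terms


# Hypothetical knot locations for age in years (illustrative only).
AGE_KNOTS = [40, 55, 65, 75]

for age in (35, 50, 64, 66, 80):
    print(age, [round(v, 4) for v in rcs_basis(age, AGE_KNOTS)])
```

Ages 64 and 66 get nearly identical basis values here, whereas a split at 65 would place them in different groups. The nonlinear terms are exactly zero below the first knot and change linearly above the last one.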

While there are ways to model such outcomes over time instead of at a landmark time point, as a practicing trialist I do like nowadays to focus on very specific time points of interest to minimize burden to patients which, among other things, can cause lots of missing data. Answering these patient-reported outcomes can be very cumbersome to patients and also a major strain on research and clinical personnel, particularly when there are so many other parts of the trial protocol that also need to be completed on the same day. Focusing on high-yield time points and patient-reported tools can mitigate this.


Sincere thanks. These are not easy concepts for a lay advocate to understand and advocate for. The articles were interesting and readable … The approaches feel right. Is there a budding or flowering consensus in the field on comparing QoL effects?

cubic splines,
acyclic graphs,
ordinal scores after treatment initiation with covariate adjustments for baseline scores.


I think there are various efforts to produce some guidance on the topic. The most recent one I saw was this one, which is a pretty good review and highlights for example the need to adjust for baseline covariates. The one part I really disagree with is that in cases where a PRO is evaluated at a specific time point they recommend linear regression over ANCOVA. As mentioned above, in parallel group randomized clinical trials, ANCOVA on the PRO score adjusted for baseline score is a highly reliable way to answer the critical question whether two patients who started with the same PRO score will end with the same PRO score.
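
For readers less familiar with the distinction, the relationship between ANCOVA and a change-from-baseline analysis can be written out in the simple linear case. The symbols are assumptions introduced here: Y_0 is the baseline PRO score, Y_1 the score at the landmark time point, and T the treatment indicator.

```latex
\begin{aligned}
\text{ANCOVA:}\quad
  & Y_{1} = \beta_0 + \beta_1 Y_{0} + \beta_2 T + \varepsilon \\
\text{Change from baseline:}\quad
  & Y_{1} - Y_{0} = \gamma_0 + \gamma_2 T + \varepsilon'
  \;\Longleftrightarrow\;
  Y_{1} = \gamma_0 + 1 \cdot Y_{0} + \gamma_2 T + \varepsilon'
\end{aligned}
```

Written this way, the change-score analysis is the ANCOVA model with the baseline slope forced to equal 1, whereas ANCOVA estimates that slope from the data. The semiparametric version discussed earlier in the thread keeps the same structure but replaces the linear model with a proportional odds model and lets the baseline score enter through a spline.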


How can a paper that purports to provide statistical guidance on analysis of patient reported outcomes nowhere mention the proportional odds model?


Yup. I’ve had this discussion many times and part of the preference for linear (or even logistic) models by all stakeholders (including statisticians) is that they are less familiar/comfortable with ordinal models, including the assumptions behind them.
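
Since the assumptions behind ordinal models come up here, the proportional odds model can be stated briefly. The notation is an assumption for illustration: Y is the ordinal PRO with ordered levels y_1 < ... < y_k, T is the treatment indicator, Y_0 is the baseline score, and f is a smooth function such as a restricted cubic spline.

```latex
\Pr(Y \ge y_j \mid T, Y_0)
  = \frac{1}{1 + \exp\!\bigl[-\bigl(\alpha_j + \beta_T\, T + f(Y_0)\bigr)\bigr]},
  \qquad j = 2, \dots, k
```

Each cutoff j has its own intercept, but the treatment coefficient is shared across cutoffs. The proportional odds assumption is therefore that the treatment odds ratio, exp(β_T), is the same whichever cutoff is used to split the scale. No spacing between the levels of Y is assumed, only their order.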

This takes us back to the original point of this discussion, i.e., that clinical researchers are hesitant to use ordinal outcomes in COVID-19 trials. I am actually actively using the example of COVID-19 trials focusing on ordinal outcomes as justification to start applying such models more in oncology.

Those are good points. But regarding the paper, they are discussing analysis of outcomes that are already ordinal.


I agree but the counterargument from their side is that these PRO scores are sums of ordinal scales that can be treated as interval. An underlying assumption here is that the ordinal vs interval debate in this situation is trivial. I think that treating PRO scores as interval causes more trouble than it is worth but those who advocate for linear models would beg to differ.

It would be nice to see some of the distributions, to check for absence of floor effects, ceiling effects, bimodality, heavy tails, etc. On the other hand why take a risk by still using parametric methods when the semiparametric methods are so powerful?


In the JAMA paper you tweeted (hydroxychloroquine RCT) it says they used simulations for power. Have you used simulations to see how floor effects, bimodality, non-PO, etc. affect power? And why simulations, i.e., is it not possible to just use these equations?: Size and power estimation for the Wilcoxon–Mann–Whitney test for ordered categorical data
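
A simulation of the kind asked about here could be sketched as follows. This is not the method used in the JAMA paper. It is a minimal stdlib-only Python illustration in which the category probabilities, the odds ratio of 2, and the sample size of 100 per arm are all made-up assumptions. It generates ordinal outcomes under proportional odds and estimates the power of the Wilcoxon-Mann-Whitney test (normal approximation with tie correction) for a spread-out outcome versus one with a strong ceiling effect.

```python
import math
import random


def shift_by_odds_ratio(control_probs, odds_ratio):
    """Category probabilities for the treated arm under proportional odds.

    The odds of Y >= j are multiplied by odds_ratio at every cutoff j.
    """
    k = len(control_probs)
    exceed = [1.0]  # P(Y >= lowest category) is 1
    for j in range(1, k):
        p = sum(control_probs[j:])
        odds = odds_ratio * p / (1.0 - p)
        exceed.append(odds / (1.0 + odds))
    exceed.append(0.0)
    return [exceed[j] - exceed[j + 1] for j in range(k)]


def wmw_z(a, b):
    """Wilcoxon-Mann-Whitney z statistic using midranks and a tie correction."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = {}
    for v in a + b:
        counts[v] = counts.get(v, 0) + 1
    midrank, start = {}, 0
    for v in sorted(counts):
        midrank[v] = start + (counts[v] + 1) / 2.0
        start += counts[v]
    rank_sum = sum(midrank[v] for v in a)
    expected = n1 * (n + 1) / 2.0
    tie_term = sum(c ** 3 - c for c in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if variance <= 0:
        return 0.0  # every observation tied: no evidence either way
    return (rank_sum - expected) / math.sqrt(variance)


def simulated_power(control_probs, odds_ratio, n_per_arm, n_sims=2000, seed=1):
    """Fraction of simulated trials with two-sided p < 0.05."""
    rng = random.Random(seed)
    levels = list(range(len(control_probs)))
    treated_probs = shift_by_odds_ratio(control_probs, odds_ratio)
    rejections = 0
    for _ in range(n_sims):
        control = rng.choices(levels, weights=control_probs, k=n_per_arm)
        treated = rng.choices(levels, weights=treated_probs, k=n_per_arm)
        if abs(wmw_z(treated, control)) > 1.959964:
            rejections += 1
    return rejections / n_sims


# Hypothetical control-arm distributions over five ordered categories.
scenarios = {
    "spread out": [0.2, 0.2, 0.2, 0.2, 0.2],
    "ceiling effect": [0.05, 0.05, 0.05, 0.05, 0.8],
}
for name, probs in scenarios.items():
    print(name, simulated_power(probs, odds_ratio=2.0, n_per_arm=100))
```

One practical reason to simulate is flexibility: changing the control distribution, or replacing `shift_by_odds_ratio` with a generator that violates proportional odds, takes a few lines, whereas a closed-form formula has to be re-derived or re-checked for each departure from its assumptions.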

I think a better place to seek that is one of the papers that studied what goes wrong when you use linear models to analyze ordinal data.

I agree but the counterargument from their side is that these PRO scores are sums of ordinal scales that can be treated as interval.

This is a common assumption in psychometrics, but I’ve never been able to find any rigorous mathematical argument in favor of it.

On the contrary, much like the literature on “significance” testing vs. p values as surprise measures, there is a debate within social science on whether it is permissible to treat ordinal data as interval.

Representational measurement theory would advise analysis that takes into account only the order properties when spacing cannot be verified as equal at all points of the scale.
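
A small made-up example shows why this matters. In the Python sketch below, the two groups, the four-category scale, and both numeric codings are invented for illustration. Both codings preserve the order of the categories and differ only in spacing, yet the sign of the mean difference flips between them, while an order-only summary (the concordance probability that underlies the Wilcoxon test) does not change.

```python
def mean(values):
    return sum(values) / len(values)


def concordance(a, b):
    """P(A > B) + 0.5 * P(A = B) over all cross-group pairs; uses order only."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))


# Made-up ordinal responses (categories 1 to 4) for two groups.
group_a = [2, 2, 2, 2]
group_b = [1, 1, 1, 4]

# Two codings that preserve order but differ in spacing.
equal_spacing = {1: 1, 2: 2, 3: 3, 4: 4}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 10}

results = {}
for name, coding in (("equal", equal_spacing), ("stretched", stretched_top)):
    a = [coding[v] for v in group_a]
    b = [coding[v] for v in group_b]
    results[name] = (mean(a) - mean(b), concordance(a, b))
    print(name, "mean difference:", results[name][0],
          "concordance:", results[name][1])
```

With equal spacing, group A has the higher mean (difference 0.25). With the top category stretched, group B does (difference -1.25). The concordance is 0.75 under both codings.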

Joel Michell (a mathematical psychologist) has written numerous papers on this issue and considers psychometrics a “pathological science” because of it.

Michell J. Normal Science, Pathological Science and Psychometrics. Theory & Psychology. 2000;10(5):639-667. doi:10.1177/0959354300105004

The consequences of treating these ordinal scales as interval have been explored in this paper:

After reading this, I’ve come to the unfortunate conclusion that parametric analysis of PRO measures at the individual study level makes any synthesis or meta-analysis (without access to individual data) unreliable.

If researchers would simply use ordinal methods at the individual study level, then meta-analysis could be a useful tool. The methods recommended by @f2harrell seem correct if the goal is to learn at both the individual level, and to aggregate studies via meta-analysis.


It sounds as if Medicine’s mantra of “continuous learning” needs to be brought somehow to Statistics.




The discussion is quite interesting.


They don’t know what ‘multivariate’ means. Also, I wonder why they would do a meta-analysis; I hope that doesn’t become some new standard.

Individual patient meta-analysis may have a role, but I’d rather label it as “efficient analysis of studies with available individual level data, accounting for study heterogeneity”.

These indeed are very interesting outputs. However, I haven’t found any recent RCT reporting such statements. Would you please have any reference?