Randomized non-comparative trials: an oxymoron?

Randomized non-comparative trials (RNCTs) are becoming increasingly popular, particularly in oncology, and are being published in prominent clinical journals. The idea is to randomize patients between two or more treatment arms and then not compare the arms with each other, but instead compare each arm with historical controls or prespecified values. Essentially, each treatment group in an RNCT is analyzed as its own single-arm trial. Convenience sampling is used, just as in standard comparative randomized trials, so there is no additional control of the sampling process to support group-specific uncertainty estimates (e.g., confidence intervals for survival outcomes in each treatment group).
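To make the design concrete, here is a minimal sketch (Python, with entirely made-up benchmark and response numbers) of the kind of analysis each RNCT arm typically receives: a one-sample test of the observed response rate against a prespecified benchmark, repeated arm by arm, with no between-arm comparison.

```python
# Minimal sketch (made-up numbers): each randomized arm is analysed as a
# single-arm study against a prespecified benchmark response rate.
from scipy.stats import binomtest

benchmark = 0.20             # hypothetical "historical" response rate
arms = {"A": (18, 60),       # (responders, patients) per arm -- invented numbers
        "B": (15, 60)}

for arm, (responders, n) in arms.items():
    result = binomtest(responders, n, p=benchmark, alternative="greater")
    ci = result.proportion_ci(confidence_level=0.95)
    print(f"Arm {arm}: observed rate {responders / n:.2f}, "
          f"95% CI ({ci.low:.2f}, {ci.high:.2f}), p = {result.pvalue:.3f}")

# Note what is *not* here: no comparison of arm A with arm B, which is the
# only estimand the randomization actually supports.
```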

It is also unclear (at least to me) what purpose the random treatment assignment serves. In standard randomized trials, random treatment assignment underpins the uncertainty estimates for comparative between-group estimands. By their very nature, however, RNCTs do not prespecify any comparisons and often avoid showing, e.g., hazard ratios and their 95% CIs, although authors and editors may be tempted to show them if they are “positive” (i.e., “statistically significant”).

We recently performed a scoping review of RNCTs and found that the design is indeed being used increasingly, with half of RNCTs reporting comparative between-group results, accompanied by significance testing in almost one-third of trials. Although we tried to steelman the design, it is hard to see what advantage is gained by random treatment assignment if the treatments are never compared. And if each arm is going to be analyzed as a single-arm study, why not instead increase the sample size of the arm of interest and/or control the sampling procedure to make sure the sample is comparable to, e.g., the historical controls that will be used?

The best argument I can think of for RNCTs is that, if there is equipoise, it may make ethical sense to randomize the treatment assignment rather than expose the patient to the therapy of their (or their physician’s) preference. This argument does not feel right to me. I would appreciate thoughts from the members of this forum: is there a place for RNCT designs in practice, or are they based on a fundamental conceptual error and should be strongly discouraged?

6 Likes

Is there a place for RNCT designs in practice? As I understand it, they seem intended for signal searching.

For efficacy trials, I can’t see a justification. Possibly the RNCT design could be useful for an indication with no effective treatment? But here too a control can be used, and in my view should be used: best supportive care, with quality of life as the primary endpoint for comparison.

1 Like

I am so glad to see us alerted to the increasing use of this disastrous design. One symptom of bad thinking is that most examples do not even use covariate adjustment to account for simple drift in age or disease severity when comparing to historical data. To me the only cogent way to deal with this is to explicitly model the bias due to non-concurrency and non-randomization, as done here.

3 Likes

Is there some canonical set of methodological references on RNCTs, Pavlos? Or is this more an accidental development? Who are the statisticians supporting these designs?

1 Like

A bit of shameless self-promotion: I wrote about this issue here

4 Likes

To answer the question: I think this type of “trial” is an absolutely terrible idea. It’s like people have forgotten what the whole point of randomisation is.

3 Likes

Good questions. The closest we could find within the canonical methodology literature is this paper, but it is not typically cited in the actual RNCT papers. Unless we are missing something, this appears to be mostly an accidental development that began quietly with the publication of small RNCTs in specialized oncology journals and is now expanding aggressively, both within oncology and beyond.

We realized we had to alert the methodology community to this new meme after an RNCT was published in the Lancet without stating anywhere in the title or abstract that it is a non-comparative trial. The only way to know is to look deep in the methods section of the full text, or to notice in the abstract that the treatment and placebo groups are never actually compared with each other. This hit particularly hard because one can tell there was immense effort to conduct and complete this trial, which was dedicated to a particularly rare cancer (metastatic phaeochromocytomas and paragangliomas) that represents an unmet need. It is possible that those involved would have chosen a different design had they known that the “randomization” in RNCTs does not confer the evidence-based-medicine aura of standard RCTs.

I have never directly interacted with the statisticians who promote RNCTs, but I have heard from trainees participating in oncology trial-design workshops that biostatisticians recommend RNCTs in scenarios where there are not enough resources (e.g., funding or patient numbers) to meaningfully power a comparative RCT. In that situation, RNCTs are offered as a way to have our (random) cake and eat it too.

Really glad others like @simongates have also noticed this problem. We first tried to publish our paper in clinical journals and were thoroughly rejected by their statistical editors, mainly due to low interest in showcasing this issue. We then submitted to the Journal of Clinical Epidemiology, where the peer reviewers and editor appeared surprised that RNCTs exist, which was a relief. There does appear to be a disconnect between statisticians immersed in methodology and at least some of the statisticians serving on the editorial boards of clinical journals; the two groups do not always overlap.

3 Likes

I find it incredible that this sort of rubbish is being published in “top” journals like the Lancet and JCO, which make grand claims such as being “the single most credible, authoritative resource for disseminating significant clinical oncology research.”

I think that, in oncology at least, part of the root of this practice may be that people are very used to seeing single-arm trials. These are often used in oncology, sometimes for good reasons, but are often over-interpreted. There’s a very limited amount you can say about comparative efficacy from a single-arm trial (without getting into the complicated business of making appropriate comparisons with historical controls, which is rarely done), but that doesn’t stop people doing it. This really struck me when I started working in oncology a few years ago.

I had a look on the Lancet and JCO websites but couldn’t see any information about their statistical editors. Does anyone know if this is available anywhere? (I had a fun exchange with the statistical editors of NEJM a few years ago about their requirement for p-values in baseline tables of RCTs, which they have now changed, several years later.)

4 Likes

There is a general theme here that bothers me no end: statisticians often appear to prioritize helpfulness over principles.

2 Likes

MSKCC biostatistician Alexia Iasonos is listed as JCO’s Deputy Editor. See, however, this post; good luck!

1 Like

In fact, motivated in part by your blogging about those NEJM exchanges, we reviewed here the prevalence and implications of this “table 1 fallacy” in oncology RCTs.

1 Like

On the Editorial Board tab on that page, there’s a whole list of biostatistics board members. I only recognise one or two names though.

Edit to add the link again:
https://ascopubs.org/jco/about/editorial-roster

2 Likes

Among those names, I recognized Boris Freidlin. He often co-authors methodologic papers with Edward L. Korn. Their work strikes me as thoughtful and substantive.

2 Likes

Useful discussion and resources, thanks.

Sadly, this is quite common when trialists with warped priorities don’t get what they want out of comparisons with a control. Core outcome sets (e.g., https://www.comet-initiative.org/) really need to be accompanied by core methods for analysing and reporting each core outcome. Without them, trialists can appear to tick all the right boxes without respecting the underlying principles (see also: ITT; randomisation without allocation concealment; easily identifiable placebos; …).

The link below is to one of a number of systematic reviews of the same topic (SRs often outnumber RCTs for treatments struggling to find a disease) that commits the same sin. There are only two RCTs but dozens of single-cohort studies. With no evidence of any benefit from the RCTs, the authors analyse the intervention arms of the RCTs as if they were single cohorts, combine them with all the other single cohorts, and marvel at the tight confidence intervals around a high response rate (never mind that placebo achieves a similar response rate in the RCTs).

I’ve been learning from, and helping to debunk, this kind of nonsense for 30 years now. Without wishing to succumb to nihilism, I’ve concluded that this is just one of the many battles that will have to be refought, over and over again, in an endless war on which we will all need to keep spending masses of Brandolini time.

It’s depressing. But also a great source of teaching examples. More added to my files now. Thanks again.

3 Likes

Replying to myself because I remembered another, related example, in this case one where the trialists really were trying to do the right thing but failed because they didn’t understand why (or how) we randomise:

Multifaceted shared care intervention for late life depression in residential care: randomised controlled trial

The intervention was things like posters and training for clinicians and carers to reduce depression amongst the elderly residents of a care facility. The authors, correctly, realised that the intervention could not be delivered to individuals, meaning that they had to do a cluster trial.

But they only had one cluster. So they randomised all the residents to have their data collected either before or after the intervention was implemented…

The result was a before/after study with half the available sample size at each time point and no ability to do a paired analysis. Analysing independent groups rather than using each person as their own control can reduce the effective sample size dramatically, potentially by an order of magnitude or more when within-person correlation is high.
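For a rough sense of the size of that loss, here is an illustrative back-of-the-envelope calculation (the within-person correlation ρ is assumed, not taken from the trial):

```latex
% Illustrative only: N residents in total, outcome SD \sigma, within-person correlation \rho (assumed).
% (a) Independent groups, N/2 measured before and N/2 after:
\operatorname{Var}\!\left(\bar{Y}_{\text{after}}-\bar{Y}_{\text{before}}\right)
   = \frac{\sigma^{2}}{N/2}+\frac{\sigma^{2}}{N/2}=\frac{4\sigma^{2}}{N}
% (b) Paired design, all N residents measured before and after:
\operatorname{Var}\!\left(\bar{D}\right)=\frac{2\sigma^{2}(1-\rho)}{N}
% Relative efficiency of (b) over (a):
\frac{4\sigma^{2}/N}{2\sigma^{2}(1-\rho)/N}=\frac{2}{1-\rho},
\qquad \text{e.g. } \rho=0.9 \Rightarrow 20\text{-fold},\quad \rho=0.95 \Rightarrow 40\text{-fold.}
```

So the loss only reaches “orders of magnitude” for very high within-person correlation, but a 20- to 40-fold loss of information is entirely plausible for repeated measures on the same people.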

To add insult to injury, the before and after periods were consecutive six-month periods, introducing confounding by season for a condition that is strongly seasonal.

These authors were absolutely trying to do the right thing. They just didn’t have the expertise to recognise what the right thing was. They knew the label on the box they had to tick but not what that label means.

So, a lot of it is about education (and a better supply of professionally qualified statisticians).

But a lot of it is about intentional deceit, for career and/or shareholder benefit. Which is why this war is endless. And good teaching so important.

5 Likes

Analysing and discussing response rates from single-arm studies is common practice, but it seems hugely problematic. The patients aren’t a random sample (so what response rate is being estimated?), and through patient selection you can get pretty much any response rate you want.
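A toy simulation makes the point (all numbers invented): the same drug, given to differently selected patient groups, produces whatever single-arm “response rate” the selection dictates.

```python
# Toy simulation (assumed numbers): a drug with a fixed effect, "tested" in
# single-arm studies that enrol patients with different prognostic profiles,
# yields very different observed response rates.
import numpy as np

rng = np.random.default_rng(0)
drug_effect = 0.10  # assumed absolute increase in response probability

for label, baseline_rate in [("unselected", 0.15),
                             ("good-prognosis only", 0.40),
                             ("heavily selected", 0.60)]:
    p = min(baseline_rate + drug_effect, 1.0)
    responses = rng.binomial(1, p, size=80)   # 80 patients per "study"
    print(f"{label:22s} observed response rate: {responses.mean():.2f}")

# The observed rate mostly reflects who was enrolled, not what the drug did.
```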

2 Likes

There’s often a big “response” to placebo, for all sorts of reasons, but one of them is that people with relapsing-remitting conditions tend to seek out new treatment options when their symptoms are at their worst. They’re going to feel better soon regardless, but it creates a powerful illusion. I think this is why treatments like homeopathy and acupuncture have so many users who are genuinely convinced of their effectiveness. It’s similar to the illusory association created by the timing of the MMR vaccine and the age at which autism becomes diagnosable.
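Here is a toy illustration of that regression-to-the-mean mechanism (all numbers invented): patients whose symptoms fluctuate around a stable personal mean “start a new treatment” only when they cross a severity threshold, and their follow-up scores improve with zero treatment effect.

```python
# Toy illustration (assumed numbers) of regression to the mean: only the
# currently-worst patients seek a new treatment, and they improve anyway.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
personal_mean = rng.normal(50, 10, n)          # stable long-run severity
today = personal_mean + rng.normal(0, 15, n)   # severity on the day they seek care
later = personal_mean + rng.normal(0, 15, n)   # severity some weeks later

seek_treatment = today > 70                    # only the currently-worst start a new option
print("mean score at 'treatment start':", today[seek_treatment].mean().round(1))
print("mean score at follow-up:        ", later[seek_treatment].mean().round(1))
# Follow-up scores are markedly lower despite zero treatment effect.
```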

1 Like