I’d like to better understand the original motive for such a stupid design. Perhaps it was the perception that it lowered the bar, making it easier to find a treatment benefit.
Peter Thall suspects it may be due to power calculations. The practice of estimating study power is firmly ingrained in clinical trial design and is closely associated with randomization. However, power calculations frequently yield a sample size that is impractical to achieve. RNCTs specifically evade the need for comparative power calculations while allowing the totemic use of randomization with small sample sizes (e.g., N = 90 patients in the above breast cancer RNCT).
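To see why comparative power calculations can push trials past what many groups can enroll, here is a minimal Python sketch of the textbook normal-approximation sample size for comparing two response rates. The 40% vs 60% response rates, two-sided α = 0.05 and 80% power are illustrative assumptions, not values taken from the breast cancer RNCT.

```python
import math
from statistics import NormalDist

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided comparison of two proportions
    (normal approximation, equal allocation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2  # pooled rate under the null
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)

n = n_per_arm(0.40, 0.60)
print(n, 2 * n)  # 97 per arm, 194 in total
```

Under these assumptions a properly powered comparison needs roughly twice the N = 90 total mentioned above, which is the kind of gap the power-calculation explanation points to.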
Of course, there are plenty of available designs that can be used for comparative RCTs with small sample sizes. Here is an example of a two-group Bayesian phase 2 comparative RCT design that had good properties with a total sample size of N = 90, even though we broke down each treatment arm into three prognostic biomarker subgroups.
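As a generic illustration of what a comparative Bayesian analysis of two response rates can look like (this is not the linked design, which also splits each arm into biomarker subgroups), here is a minimal Python sketch that estimates the posterior probability that the experimental arm's response rate exceeds the control arm's. The uniform Beta(1, 1) priors, the 45-per-arm split of N = 90, the response counts and the 0.90 decision threshold are all hypothetical.

```python
import random

def prob_treatment_better(resp_t, n_t, resp_c, n_c, draws=50_000, seed=1):
    """Monte Carlo estimate of P(p_T > p_C | data) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate update: Beta(1 + responses, 1 + non-responses) for each arm
        p_t = rng.betavariate(1 + resp_t, 1 + n_t - resp_t)
        p_c = rng.betavariate(1 + resp_c, 1 + n_c - resp_c)
        wins += p_t > p_c
    return wins / draws

# Hypothetical data: 45 patients per arm (N = 90 total)
prob = prob_treatment_better(resp_t=27, n_t=45, resp_c=18, n_c=45)
print(f"P(treatment better | data) = {prob:.3f}")
# Hypothetical decision threshold, fixed before the trial starts
print("promising" if prob > 0.90 else "not promising")
```

The point of the sketch is that the two randomized arms are compared directly, so the randomization is actually used in the inference.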
Thanks, that makes sense. So we have:
bad news → coverup → find a dishonest unethical design hiding the bad news
Yup, although I really do not think that the clinicians (and even many of the statisticians) involved in RNCTs realize there is a problem, so they are not knowingly being dishonest or unethical about it. I anticipate that many of them would immediately change practice once they see the issue with this design.
We provide indirect evidence here that the fundamental problem behind RNCTs is not commonly understood: more than half of phase 3 RCTs in oncology present treatment group-specific inferences.
Why are only randomized designs considered when balanced designs can maximize information in small samples, which is the entire rationale for these RNCTs to begin with? Without getting into a deep methodological discussion, why aren’t more researchers aware of how to maximize information from small samples via procedures like minimization?
I posted a few references in this thread:
A good place to start would be this Douglas Altman article:
The CONSORT statement from 2010 (of which Douglas Altman was one of the authors) explicitly mentions minimization in a number of places, where it is considered “equivalent” to an RCT.
Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG, for the CONSORT Group. CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869. PMID: 20332511.
Box 2 (page 9) of the 2010 CONSORT Explanation and Elaboration document states the following:
Nevertheless, in general, trials that use minimisation are considered methodologically equivalent to randomised trials, even when a random element is not incorporated.
A quick glance at the 2025 statement suggests this advice has been removed, for some inexplicable reason. There is also a discussion of the role of randomization that should have made these designs impossible to publish in any scientific journal.
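For readers who have not seen minimization, here is a minimal Python sketch of its simplest (Taves-style) form: each new patient goes to the arm that currently has fewer patients sharing that patient's prognostic factor levels, with ties broken at random. The `p_preferred` parameter (an optional random element), the factor names and the simulated cohort are illustrative assumptions, not taken from any specific trial or software package.

```python
import random
from collections import defaultdict

def minimize(patients, arms=("A", "B"), p_preferred=1.0, seed=0):
    """Assign patients sequentially by minimization on marginal totals.

    patients: list of dicts mapping prognostic factor -> level.
    p_preferred: probability of taking the imbalance-minimizing arm
                 (1.0 = deterministic apart from tie-breaking).
    """
    rng = random.Random(seed)
    # counts[arm][(factor, level)] = patients already in that arm with that level
    counts = {arm: defaultdict(int) for arm in arms}
    assignments = []
    for patient in patients:
        # Score: how many patients sharing this patient's levels each arm already has
        scores = {arm: sum(counts[arm][(f, lvl)] for f, lvl in patient.items())
                  for arm in arms}
        best = min(scores.values())
        preferred = [arm for arm in arms if scores[arm] == best]
        if len(preferred) == len(arms):
            arm = rng.choice(arms)  # all arms tied: pick at random
        elif rng.random() < p_preferred:
            arm = rng.choice(preferred)
        else:
            arm = rng.choice([a for a in arms if a not in preferred])
        for f, lvl in patient.items():
            counts[arm][(f, lvl)] += 1
        assignments.append(arm)
    return assignments

def max_imbalance(cohort, assigned):
    """Largest |n_A - n_B| across all factor levels."""
    diff = defaultdict(int)
    for patient, arm in zip(cohort, assigned):
        for f, lvl in patient.items():
            diff[(f, lvl)] += 1 if arm == "A" else -1
    return max(abs(d) for d in diff.values())

# Hypothetical cohort: 90 patients, two prognostic factors
gen = random.Random(42)
cohort = [{"biomarker": gen.choice("xyz"), "stage": gen.choice(["early", "late"])}
          for _ in range(90)]
coin = random.Random(7)
randomized = [coin.choice("AB") for _ in cohort]  # simple randomization, for contrast
print("minimization:", max_imbalance(cohort, minimize(cohort)))
print("simple randomization:", max_imbalance(cohort, randomized))
```

Setting `p_preferred` below 1.0 (for example 0.8) reintroduces a random element while still steering toward balance.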
You are preaching to the choir. There are literally patients alive today because of the Jaynesian belief I adopted a decade ago that we can, under certain conditions and if we work hard enough, do better than randomization in small samples. Notice that nothing in this thread or in our published commentaries on RNCTs excludes such options.
While it is debatable when and how much to prioritize balance over randomization (and there are strong arguments against minimization), RNCTs have very little in their favor regardless, once randomization is used in a trial design.
I appreciate your commitment to putting rigorous scientific thinking into practice. My major concern with this discussion is that it never tells researchers whose budgets permit only a small sample how to make the most of their resources.
Why do we find statisticians advising these medical researchers on RNCTs, when the way to accomplish this goal (maximizing information from very small samples where randomization isn’t possible) has been known for over 50 years, and improved methods are on the shelves of libraries (or on the hard drives of websites)?
Is there a good, quick reference for this idea of Jaynes’s?
See the related Jaynes quote we mention here, taken from here: “Whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought.”
Patients talk to each other, and this particular article is a direct response to requests from patients with the chromophobe subtype of kidney cancer to replicate what was done for renal medullary carcinoma (an even rarer and more aggressive subtype). Indeed, efforts are now under way to develop biology-driven therapies and trials for chromophobe kidney cancer.
Been itching to include this quote in an article for far too long.
The Journal of Clinical Oncology (JCO) just published our letter to the editor discussing the inferential limitations of an RNCT recently published there. Importantly, the JCO RNCT authors provided a reply that cites an actual methodology review article promoting the use of RNCTs. The review comes from a statistical consulting group based in Europe, where many RNCTs indeed originate.
Here is an excerpt of their methodological rationale:
we often propose an intermediate solution between a single-arm and a comparative design: a non-comparative, randomized design. Once again, patients will be randomized between a control treatment with a historical ORR of 40% or the new treatment, whose ORR of interest is considered to be 60%. In this case, however, there is no plan to formally compare the two arms, and control is used simply for ‘calibration’ of results. In this case, the sample size for the trial is not determined by the expected difference between arms, but simply on the expected results for the experimental arm, as in the single-arm design, with the advantage that the control arm will be used to assess the reliability of the historical data (i.e. of the null hypothesis) used in the design.
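The single-arm sizing described in the excerpt above can be sketched as an exact binomial search: find the smallest number of patients n and response cutoff r such that "r or more responses" is unlikely if the true ORR is the historical 40% but likely if it is 60%. The one-sided α = 0.05 and 80% power are assumptions; the excerpt does not state the review's actual parameters. Note that the randomized control arm plays no role in this calculation.

```python
from math import comb

def upper_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

def single_arm_design(p0=0.40, p1=0.60, alpha=0.05, power=0.80, n_max=200):
    """Smallest n (and cutoff r) such that 'declare promising if >= r responses'
    has type I error <= alpha at p0 and power >= power at p1 (exact binomial)."""
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            if upper_tail(r, n, p0) <= alpha:
                # Smallest cutoff controlling type I error also maximizes power
                if upper_tail(r, n, p1) >= power:
                    return n, r
                break
    return None

n, r = single_arm_design()
print(f"experimental arm: n = {n}, declare promising if >= {r} responses")
```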
I am trying internally to steelman the above quote but can’t make sense of it. It claims that RNCTs use the control group to somehow “calibrate” the sampling procedure and make comparisons of the treatment group with historical controls more valid. But I have yet to see an RNCT that formally uses any such “statistical calibration”. The protocol of the JCO RNCT we commented on was independently published in 2018; it has two experimental arms (each compared with historical controls) and one control group, randomized in a 2:2:1 ratio, but nothing in it explains the purpose of the control group or of randomization. Does anyone here see something I am missing?
My cynicism leads me to believe this is an attempt to give this type of study more credence than it deserves, simply because it will have “randomized” in the abstract. Have any regulators offered opinions on this type of design? At best, it seems no more credible than a traditional historically controlled trial.
Good question. I am also curious whether regulators have provided any input to date. The methodology review advocating for RNCTs is pretty revealing because the authors and their group are well published in oncology statistics (see, e.g., here, here, and here) and they are neither crazy nor amateurs. For example, this article of theirs was relatively ahead of its time. And when I read it, I thought that this commentary they coincidentally wrote in response to one of our articles was pretty insightful.
This may explain at least in part how RNCTs became so popular in oncology. Respected statistical groups have actually been actively promoting them. I still find RNCTs completely ludicrous and have a hard time discerning any advantages in their favor.
The authors’ “calibration” remark is an example of ill-specified methods and fuzzy thinking. Calibration in this context means anything you want it to mean.
Yup, here is another quote from the review article:
In a non-comparative, randomized design, the reliability of the historical ORR can be ascertained by contrasting it with the observed ORR in the control arm, and taking these two values into account when looking at promising, disappointing or outstanding results.
There is no formal statistical procedure. You just look at the values and do… something with them. It doesn’t even specify what exactly we should look for. Point estimates? Uncertainty intervals? Something else? Nor is there an explanation of how exactly this is more efficient than, for example:
- Running truly comparative RCTs from the outset and then updating or combining the data. Prespecify initially lenient futility criteria; if those are passed, keep enrolling, or publish the trial and then lobby for support for larger future RCTs.
- Running a larger single-arm study. Why would we randomly split our sample into treatment and control groups (the latter sometimes a placebo) if we are not going to compare them? It just makes no sense. We could use formal statistical methodology to benchmark the trial against historical controls, match patients to them, or otherwise adjust for differences, with all the caveats this entails, of course. Anything would be better than a vague instruction to look at historical vs randomized controls.
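For contrast with "just look at the values", here is one minimal, prespecified version of the check the review seems to describe: a 95% Wilson score interval for the control-arm ORR, compared with the assumed historical ORR of 40%. The control arm of 20 patients with 6 responses is hypothetical, and the Wilson interval is only one reasonable choice among several.

```python
import math
from statistics import NormalDist

def wilson_interval(responses, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = responses / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

historical_orr = 0.40  # assumed historical control ORR (from the review excerpt)
lo, hi = wilson_interval(responses=6, n=20)  # hypothetical control arm: 6/20 responses
print(f"control ORR 95% CI: {lo:.2f} to {hi:.2f}")
print("historical ORR inside CI:", lo <= historical_orr <= hi)
```

With 20 control patients the interval runs from roughly 0.15 to 0.52, so a wide range of historical rates would "pass"; a control arm of this size says very little about the reliability of the historical data.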
You’ve documented stupidity on the part of advocates of this insane approach. I’d like to see what medical ethicists have to say about the design, and what the informed consent documents say. I suspect that the advocates’ language is not consistent with intents stated in informed consent docs. I also suspect the design is unethical. See this for what some ethicist colleagues said about another situation where patient data were being wasted.
I suggest vibe randomization as a term for this stuff. Instead of AI slop, it yields statistical slop.
This is all fascinating. The message that I take from it is that we can all get it very wrong sometimes. So probably a good idea to be open to criticism and prepared to learn.
Indeed. One of the tenets of ethical study design involving human participants is that the study can advance knowledge: the risk to participants is offset by the knowledge gained.
