Stability of results from small RCTs

> Randomization does not guarantee exchangeability. The very definition of random implies imbalances in prognostic factors could happen, even if they are more and more rare as the sample size increases.

This very important point is carefully elaborated in the following:

So that leads me to the following question:

Are small randomized trials worth conducting?

In a previous post @f2harrell wrote:

> Most statisticians I’ve spoken with who frequently collaborate with investigators in doing frequentist power calculations believe that the process is nothing more than a game or is voodoo. Minimal clinically important effect sizes are typically re-thought until budget constraints are met,…

For the purpose of this discussion, I’ll define a “small” experiment as one with N < 200 per arm. I’ve seen this magic number in a few simulation studies comparing randomization to minimization, so I will use it as a starting point for eliciting how much value (if any) randomization brings to the experiment being planned.

One of the big problems with “small” RCTs is random confounding. From a practical POV, 200 per treatment arm appears to be the minimum needed to silence everyone but the most vocal skeptics, who point out:

> Critics of RCTs will argue that because there’s also always the possibility of there being an imbalance of known or unknown covariates between groups, RCTs cannot make proper causal inferences, especially small RCTs that are “unable to distribute confounders effectively.”

Simulations suggest that small RCTs are much more prone to this problem.
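
To make “much more prone” concrete, here is a minimal simulation sketch. The setup (a single binary prognostic factor with 30% prevalence, and a difference of more than 10 percentage points counted as an “imbalance”) is my own illustrative choice, not taken from the studies alluded to above.

```python
import numpy as np

rng = np.random.default_rng(42)

def imbalance_rate(n_per_arm, prevalence=0.30, threshold=0.10, n_sims=20_000):
    """Fraction of simulated two-arm trials in which the between-arm difference
    in the prevalence of one binary prognostic factor exceeds `threshold`."""
    # Under simple randomization the factor is distributed independently in each arm.
    arm_a = rng.binomial(n_per_arm, prevalence, size=n_sims) / n_per_arm
    arm_b = rng.binomial(n_per_arm, prevalence, size=n_sims) / n_per_arm
    return np.mean(np.abs(arm_a - arm_b) > threshold)

for n in (25, 50, 100, 200, 400):
    print(f"N = {n:>3} per arm: P(imbalance > 10 points) ≈ {imbalance_rate(n):.2f}")
```

Under these assumed numbers, the chance of a sizeable imbalance on even a single measured factor is substantial below roughly 100–200 per arm and falls away as N grows; with many prognostic factors, the chance that at least one of them is badly imbalanced is correspondingly higher.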

There is also a credibility issue with small RCTs: the AHRQ suggests there are times when “…it may be appropriate to focus on the one or the few “best” trials rather than combining them with the rest of the evidence.”

AHRQ (2018) Quantitative Synthesis: An Update – Sections 1.3 - 1.5.

From a pure decision theory point of view, randomization is not needed for correct inference. To be more precise: For any randomized experimental design, there exists a nonrandom design that will also provide the correct answer (more efficiently).

The proof sketch involves maximizing a particular criterion. In this case, the “optimal” experiment is one that gives the most information, with the smallest sample size.
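
As a purely illustrative example of that claim, the sketch below compares the sampling variability of a difference-in-means estimate under (a) simple randomization and (b) a deterministic, covariate-balanced allocation. The outcome model, effect size, and the matched-pair alternation scheme are all my own assumptions for this toy example, not a design taken from the cited literature.

```python
import numpy as np

rng = np.random.default_rng(1)

def assign_deterministic(x):
    """Pair subjects on adjacent sorted covariate values and treat one member
    per pair, alternating which side is treated so neither arm systematically
    receives the lower covariate values."""
    order = np.argsort(x)
    pairs = order.reshape(-1, 2)          # adjacent pairs after sorting on x
    treat = np.zeros(len(x), dtype=int)
    treat[pairs[0::2, 0]] = 1             # even-numbered pairs: lower-x member treated
    treat[pairs[1::2, 1]] = 1             # odd-numbered pairs: higher-x member treated
    return treat

def one_trial(n_per_arm=25, true_effect=0.5, beta=2.0, deterministic=False):
    """Difference-in-means estimate under an assumed linear outcome model:
    y = true_effect * treat + beta * x + noise."""
    n = 2 * n_per_arm
    x = rng.normal(size=n)                # single prognostic covariate
    if deterministic:
        treat = assign_deterministic(x)
    else:
        treat = rng.permutation(np.repeat([0, 1], n_per_arm))  # simple randomization
    y = true_effect * treat + beta * x + rng.normal(size=n)
    return y[treat == 1].mean() - y[treat == 0].mean()

for det, label in ((False, "simple randomization"), (True, "deterministic balance")):
    est = [one_trial(deterministic=det) for _ in range(5000)]
    print(f"{label:>22}: mean = {np.mean(est):.3f}, SD = {np.std(est):.3f}")
```

In this toy setting the deterministic allocation removes the noise contributed by chance covariate imbalance, so the estimate has markedly smaller spread for the same N; that is the sense in which a nonrandom design can be more efficient.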

If you think about how statistics are applied in industry (and in health care quality), the key point of any “quality” initiative (e.g., Six Sigma) is to minimize variance. In finance, the key to maximizing the compounded growth rate is also to minimize variance; indeed, in finance risk is defined as the variance of returns. Maximizing information is synonymous with minimizing risk (variance) in this context.

So why do we maximize variance in the case of experiments by insisting on randomization?

How relevant is this to practice? I’m not sure. The claim is one of existence and does not provide any guidance on how to derive the design of such an experiment in a particular case.

I accept that, from a normative POV, causal inference can be made without randomization. I also accept that in certain contexts (i.e., research with human subjects) randomized designs are often easier to derive and conduct. In the case of N-of-1 designs, randomization seems to be the only way to proceed. But I don’t think non-randomized designs are impossible in human subjects research; given certain budgetary limitations, they might be the best way to proceed. Even the CONSORT guidelines recognize minimization as a valid alternative to randomization.

CONSORT (2010) Explanation and Elaboration

> Nevertheless, in general, trials that use minimization are considered methodologically equivalent to randomized trials, even when a random element is not incorporated. [See Box 2].

The late Douglas Altman has also written on minimized designs:

> But stratified randomization using several variables is not effective in small trials. The only widely acceptable alternative approach is minimisation,2,3 a method of ensuring excellent balance between groups for several prognostic factors, even in small samples.

Altman, D. (2005) Treatment Allocation by Minimization
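
For readers who haven’t seen minimization in action, here is a bare-bones sketch of a deterministic minimization rule for two arms, in the spirit of the method Taves and Altman describe. The factor names, the marginal-imbalance score, and the tie-breaking rule are my own simplifications, not production allocation code.

```python
from collections import defaultdict

class Minimizer:
    """Deterministic minimization for a two-arm trial.

    Each new subject is allocated to whichever arm yields the smaller total
    imbalance, summed over the marginal counts of that subject's prognostic
    factor levels. Ties go to arm 'A' here; a real trial would typically
    break ties at random or with a biased coin.
    """

    def __init__(self, factors):
        self.factors = factors                        # e.g. ["sex", "age_group"]
        # counts[arm][(factor, level)] -> number already allocated
        self.counts = {"A": defaultdict(int), "B": defaultdict(int)}

    def _imbalance_if(self, arm, profile):
        """Total marginal imbalance if `profile` were added to `arm`."""
        other = "B" if arm == "A" else "A"
        total = 0
        for f in self.factors:
            key = (f, profile[f])
            total += abs((self.counts[arm][key] + 1) - self.counts[other][key])
        return total

    def allocate(self, profile):
        arm = min(("A", "B"), key=lambda a: self._imbalance_if(a, profile))
        for f in self.factors:
            self.counts[arm][(f, profile[f])] += 1
        return arm

# Hypothetical usage with two prognostic factors
m = Minimizer(["sex", "age_group"])
subjects = [
    {"sex": "F", "age_group": "<65"},
    {"sex": "M", "age_group": "<65"},
    {"sex": "F", "age_group": "65+"},
    {"sex": "F", "age_group": "<65"},
]
for s in subjects:
    print(s, "->", m.allocate(s))
```

A production implementation would also handle more than two arms, weighting of factors, and a probabilistic element (e.g., a biased coin), which this sketch deliberately omits.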

There is a trade-off to be made between the value of the information and the cost of the experiment. This appears to be difficult to do within the frequentist philosophy, whose influence has all but silenced discussion of nonrandom, but valid, research designs.

For research areas where small samples are the only ones economically obtainable, a Bayesian approach to experimentation is critical to maximizing the information obtained and to reaching a defensible decision.

There exist algorithms that explicitly minimize the difference between treatment and control groups. I raise these points because my field (allied health) is never going to see sample sizes approaching those of even a small drug trial, and if any research we produce is going to be dismissed simply because it is “small sample” (as the AHRQ guidance suggests), that presents a big problem for a number of professions.

Too much informal frequentist reasoning has influenced what is perceived to be “rigorous.” The insistence on randomization for rigor has gone from a (very) useful heuristic to a mistaken normative criterion.

I suspect that last statement is controversial, but I’d very much appreciate scholarly discussion and debate about it.

References (far from exhaustive):
Taves, D. (2010) The use of minimization in clinical trials
Treasure, T; MacRae, K (1998) Minimization: The Platinum Standard for Trials?
Lachin, J; Matts, J; Wei, LJ (1988) Randomization in clinical trials: Conclusions and recommendations

For a contrary view:
Senn (2008) Why I Hate Minimization
