Decades of underpowered RCTs

One of the most frustrating problems with medical science is the construction of the incomplete bridge. No other industry could build an incomplete bridge and then simply say, “one problem with our bridge is that we did not have enough steel to complete it”.

This is ubiquitous, and it always ends with a call for another trial, leaving the clinician with no answer.

The important question which could have been answered by this RCT remains unanswered. Why can’t this be fixed?

https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2835888

1 Like

Yes, it’s a problem. The traditional way of thinking about trials is that they should be independent and not incorporate information that we already have.

Scott Berry talks about this issue (among others) here

See also this datamethods thread.

Scott makes the point that when a 1000-patient trial is suggestive but inconclusive (“fails to show significance”), the usual reaction is to do a new 2000-patient trial rather than add perhaps a few hundred patients to the existing evidence. At best this is a waste of resources.
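To make that concrete, here is a minimal beta-binomial sketch of the “add a few hundred patients” idea. All counts and response rates are hypothetical (none are taken from the thread), and flat Beta(1, 1) priors are assumed for each arm; the point is only that the existing 1000 patients serve as the prior for the next few hundred rather than being thrown away.

```python
# Hypothetical numbers throughout; flat Beta(1, 1) priors per arm.
import numpy as np

rng = np.random.default_rng(1)

def prob_treatment_better(ev_t, n_t, ev_c, n_c, draws=200_000):
    """Posterior Pr(response rate on treatment > control) via Monte Carlo."""
    p_t = rng.beta(1 + ev_t, 1 + n_t - ev_t, draws)
    p_c = rng.beta(1 + ev_c, 1 + n_c - ev_c, draws)
    return (p_t > p_c).mean()

# A 1000-patient trial (500 per arm): 60% vs 55% responders -- suggestive, not conclusive.
print("after the original trial:",
      prob_treatment_better(300, 500, 275, 500))

# Add 300 more patients (150 per arm) with similar observed rates and pool them
# with the existing data, instead of launching a fresh 2000-patient trial.
print("after adding 300 patients:",
      prob_treatment_better(300 + 90, 500 + 150, 275 + 82, 500 + 150))
```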

4 Likes

Totally agree, and also it is unclear exactly when it is justified to do a second trial. I’d rather see one really good sequential trial (with N as a random variable) than the mess we have now. A suggested Bayesian strategy that meets multiple clinical goals is here.
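For readers who have not seen such a design, here is a toy simulation of a Bayesian group-sequential trial in which N is a random variable: accrual continues in cohorts until the posterior probability of benefit crosses an efficacy or futility threshold, or a maximum N is reached. The response rates, cohort size, and thresholds are made up for illustration and are not those of the linked strategy.

```python
# Toy Bayesian group-sequential design; all parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(7)

def run_trial(p_t=0.60, p_c=0.50, cohort=50, max_n=1000,
              eff=0.99, fut=0.10, draws=20_000):
    """Accrue in per-arm cohorts until Pr(p_t > p_c | data) crosses a threshold."""
    ev_t = ev_c = n = 0
    while n < max_n:
        n += cohort
        ev_t += rng.binomial(cohort, p_t)
        ev_c += rng.binomial(cohort, p_c)
        post_t = rng.beta(1 + ev_t, 1 + n - ev_t, draws)
        post_c = rng.beta(1 + ev_c, 1 + n - ev_c, draws)
        pr = (post_t > post_c).mean()
        if pr > eff:
            return n, "stopped for efficacy"
        if pr < fut:
            return n, "stopped for futility"
    return n, "reached maximum N"

ns = np.array([run_trial()[0] for _ in range(200)])
print("final N per arm: median", int(np.median(ns)), "range", ns.min(), "-", ns.max())
```

The final sample size varies from run to run, which is exactly the “N as a random variable” behaviour described above.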

3 Likes

Hi,

I have a number of comments, on this general issue and on the study being referenced specifically.

First, in a general context, underpowered studies are problematic not only because, in a frequentist setting, failure to reject the null leaves the conclusion ostensibly indefinite, but also because, when the null is rejected, there is a reasonable probability of overestimating the treatment effect, leading to overly optimistic expectations.
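A quick simulation makes the overestimation point concrete. The numbers below (a true standardized effect of 0.25 and 40 subjects per arm, giving roughly 20% power) are assumptions chosen for illustration, not taken from any of the papers cited in this thread; the point is that, among the replicates that happen to reach p < 0.05, the estimated effect substantially exaggerates the true one.

```python
# Winner's-curse illustration with made-up parameters (true effect 0.25, n = 40 per arm).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

true_effect = 0.25      # assumed standardized mean difference
n_per_arm = 40          # deliberately underpowered (~20% power at alpha = 0.05)
reps = 20_000

x = rng.normal(true_effect, 1, (reps, n_per_arm))   # treatment arm
y = rng.normal(0.0, 1, (reps, n_per_arm))           # control arm
_, p = stats.ttest_ind(x, y, axis=1)
est = x.mean(axis=1) - y.mean(axis=1)

sig = p < 0.05
print(f"power: {sig.mean():.2f}")
print(f"mean estimate, all replicates:        {est.mean():.2f}")
print(f"mean estimate, 'significant' subset:  {est[sig].mean():.2f}  (true effect {true_effect})")
```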

In the latter case, there was a 2013 article in Nature Reviews Neuroscience by Button et al. that discusses this:

Power failure: why small sample size undermines the reliability of neuroscience

To quote from the opening paragraph in the above paper:

It has been claimed and demonstrated that many (and possibly most) of the conclusions drawn from biomedical research are probably false. A central cause for this important problem is that researchers must publish in order to succeed, and publishing is a highly competitive enterprise, with certain kinds of findings more likely to be published than others. Research that produces novel results, statistically significant results (that is, typically p < 0.05) and seemingly ‘clean’ results is more likely to be published. As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true (that is, non-null) effect.

Of course, there is Doug Altman’s original 1994 BMJ editorial:

The scandal of poor medical research

and Richard Smith’s 2014 follow-up:

Medical research—still a scandal

On the particular study being referenced here:

I am curious as to why they considered this a phase II study, given the prior research that was available and that the study design was more along the lines of a phase III confirmatory design.

I do not really see anything in the study design that I would label as phase II: the target sample size is larger than that of typical two-arm phase II designs, there is no dose-finding component, and they used a two-sided hypothesis where a one-sided hypothesis would likely be more common for phase II, which would increase the effective power and yield a smaller sample size.
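As a back-of-the-envelope check on the one-sided versus two-sided point, under the usual normal approximation the required sample size per arm scales with (z_alpha + z_beta)^2, so the relative saving does not depend on the effect size. This is a generic sketch, not the study's own calculation.

```python
# Generic one-sided vs two-sided comparison at alpha = 0.05, power = 0.8.
from scipy.stats import norm

z_beta = norm.ppf(0.80)            # power 0.8
z_two = norm.ppf(1 - 0.05 / 2)     # two-sided alpha 0.05
z_one = norm.ppf(1 - 0.05)         # one-sided alpha 0.05
ratio = (z_one + z_beta) ** 2 / (z_two + z_beta) ** 2
print(f"one-sided n is about {ratio:.0%} of the two-sided n")   # roughly 79%
```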

The only parameter I see that is suggestive of a phase II design is the lower a priori power of 0.8.

Notwithstanding adaptive study designs, a single phase II study is intended to provide sufficient evidence to enable the decision by relevant parties to proceed or not to proceed with a larger phase III, confirmatory study. In essence, there is an a priori expectation of possibly needing more research given the findings of the phase II study.

I do not see that context here: their inference (and the associated commentary) that a larger, multi-site study is needed appears to be the result of their failure to meet their target enrollment rather than of a priori intent.

To their credit, they did perform a priori power/sample size estimates, although, as noted, they had problems fulfilling their target enrollment, which can be a common challenge. They enrolled only 81 of the target 150 subjects, and the anticipated 20% attrition rate turned out to be 26% in practice.

Further, the treatment effect observed in the truncated cohort was smaller than the effect hypothesized for even their smallest target sample of 84 subjects (106 subjects in total, allowing for 20% attrition): a 22% absolute reduction was hypothesized versus the 13.5% observed.

The associated commentary would seem to suggest that even the observed effect is still clinically meaningful, although I would defer to clinical subject matter experts on that.

As we know, the premise of just needing more subjects (e.g. the notion of “trending towards significance”), which presumes that the observed results would hold unchanged in a larger cohort, is highly problematic.
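To put rough numbers on that last point: if the observed 13.5% absolute reduction, rather than the hypothesized 22%, were the true effect, the additional subjects needed would not be a modest top-up; the required sample size is roughly three times larger. The control event rate below is an assumption for illustration only, since the study's actual rate is not quoted in this thread.

```python
# Sample size needed for the hypothesized vs the observed absolute reduction,
# assuming (for illustration only) a 50% control event rate.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

control_rate = 0.50                        # assumed, not from the study
analysis = NormalIndPower()
for reduction in (0.22, 0.135):            # hypothesized vs observed absolute reduction
    h = proportion_effectsize(control_rate, control_rate - reduction)
    n = analysis.solve_power(effect_size=h, alpha=0.05, power=0.80,
                             alternative="two-sided")
    print(f"absolute reduction {reduction:.3f}: about {n:.0f} subjects per arm")
```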

So this particular study was, to my eyes, problematic on a number of fronts, and one wonders, perhaps in hindsight, and presuming that they knew they had enrollment problems well in advance of stopping the study, what other options were considered (e.g. expanding the number of study sites) to try to meet the original enrollment goals.

That all being said, I agree with the premise of this post that there are problems generally speaking, many of them, as Button et al. noted, driven by the need to publish.

4 Likes

“…not incorporate information that we already have.”
What would your reference be for this? I don’t see this in clinical trials or in any books on DOE.

Some of it, in my experience, is put forth as justified by the study team, IRB, and FDA on the grounds of the feasibility of accrual to a very large trial, furthered by the hope that the signal for meaningful benefit will be large.

This is quite an optimistic approach to study design, which could be improved by allowing early stopping if the effect really is what the study leaders dreamed of, while continuing if all that is emerging is clinical significance.

1 Like

There’s a lot of that - what we might call feasibility studies! Clinical trials are fairly often terminated for lack of efficacy, which is the responsibility of the data monitoring committees. Another rationale for continuing can be that some participants have responses deemed clinically meaningful for indications with unmet needs. To be clear, I’m not advocating for any position here, just passing on what I’ve observed as an advocate (in response to an important issue raised).

Agreed. On a related note: for the large number of studies that went to completion and had p > 0.05, I estimate that they could have stopped early for futility at less than 1/3 of the final sample size.
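As a hedged illustration of that kind of estimate (my own simulation with made-up parameters, not the analysis behind the 1/3 figure): simulate trials with no true effect, and at one-third of the planned sample compute conditional power under the current trend. A large fraction of the trials that would end with p > 0.05 already look futile at that interim look.

```python
# Conditional-power futility check at n/3; all parameters are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

n_final, reps = 300, 10_000              # planned subjects per arm, simulated trials
n_interim = n_final // 3                 # interim look at one-third of the planned sample
x = rng.normal(0, 1, (reps, n_final))    # treatment arm, true effect = 0
y = rng.normal(0, 1, (reps, n_final))    # control arm

_, p_final = stats.ttest_ind(x, y, axis=1)     # what the final analysis would have shown
nonsig = p_final > 0.05

# Interim estimate and conditional power assuming the observed trend continues
diff_i = x[:, :n_interim].mean(axis=1) - y[:, :n_interim].mean(axis=1)
proj_z = diff_i / np.sqrt(2 / n_final)         # projected final z if the trend holds exactly
sd_proj = np.sqrt((n_final - n_interim) / n_final)
cond_power = 1 - stats.norm.cdf((1.96 - proj_z) / sd_proj)

futile = cond_power < 0.10
print("share of p > 0.05 trials already futile at n/3:",
      round(futile[nonsig].mean(), 2))
```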

1 Like

An important question that could be directed to the leadership of the CIRB (the NCI centralized IRB).

A late-arriving thought (common for me nowadays): when the primary endpoint is time to progression or PFS, it can take many months, even years, after full accrual to compare outcomes for some indications (such as the indolent lymphomas).

It is probably better to evaluate this concern on a trial-by-trial basis.