Hi,
I have a number of comments, both on this general issue and on the study being referenced specifically.
First, in a general context: underpowered studies are problematic not only because, in a frequentist setting, a failure to reject the null leaves the conclusion ostensibly indefinite, but also because, when the null is rejected, there is a reasonable probability of overestimating the treatment effect, leading to overly optimistic expectations.
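To make that second concern concrete, here is a minimal simulation sketch; the true effect size, per-arm sample size, and plain two-sample t-test are all hypothetical choices on my part, picked only to stand in for an underpowered two-arm comparison:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical underpowered two-arm comparison: true effect = 0.2 SD, n = 25 per arm.
true_effect, n_per_arm, n_sims = 0.2, 25, 20_000
significant_estimates = []

for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    t_stat, p_value = stats.ttest_ind(treated, control)
    if p_value < 0.05:
        significant_estimates.append(treated.mean() - control.mean())

print(f"Power (share of simulations with p < 0.05): {len(significant_estimates) / n_sims:.2f}")
print(f"Average estimated effect among the 'significant' runs: "
      f"{np.mean(significant_estimates):.2f} vs true effect {true_effect}")
# In this hypothetical setting, the conditional average is typically several
# times the true effect, i.e. the 'significant' results overstate it.
```

Conditioning on statistical significance filters out the runs where the noisy estimate happened to land near (or below) the true effect, which is where the overestimation comes from.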
Regarding the latter issue, there is a 2013 article in Nature Reviews Neuroscience by Button et al that discusses this:
Power failure: why small sample size undermines the reliability of neuroscience
To quote from the opening paragraph in the above paper:
It has been claimed and demonstrated that many (and possibly most) of the conclusions drawn from biomedical research are probably false. A central cause for this important problem is that researchers must publish in order to succeed, and publishing is a highly competitive enterprise, with certain kinds of findings more likely to be published than others. Research that produces novel results, statistically significant results (that is, typically p < 0.05) and seemingly ‘clean’ results is more likely to be published. As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true (that is, non-null) effect.
Of course, there is Doug Altman’s original 1994 BMJ editorial:
The scandal of poor medical research
and Richard Smith’s 2014 follow up:
Medical research—still a scandal
On the particular study being referenced here:
I am curious why they considered this a phase II study, given the prior research that was available and given that the study design was more along the lines of a phase III confirmatory design.
I do not really see anything in the study design that I would label as phase II: the target sample size is above that of typical two-arm phase II designs, there is no dose-finding component, and they used a two-sided hypothesis where a one-sided hypothesis would likely be more common for a phase II study, which would increase the effective power and yield a smaller required sample size.
The only parameter I see that is suggestive of a phase II design is the lower a priori power of 0.8.
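As an aside on those two design parameters, here is a quick sketch of how the sidedness of the test and the target power drive the required sample size in a two-proportion comparison, using standard statsmodels utilities; the event rates are entirely hypothetical and only meant to illustrate the relative differences:

```python
# Illustrative only: the event rates below are hypothetical, not from the study.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.50, 0.30)   # Cohen's h for a hypothetical 20-point reduction
analysis = NormalIndPower()

for alternative in ("two-sided", "larger"):  # "larger" = one-sided
    for power in (0.8, 0.9):
        n_per_arm = analysis.solve_power(effect_size=effect, alpha=0.05,
                                         power=power, alternative=alternative)
        print(f"{alternative:>9}, power {power}: ~{n_per_arm:.0f} per arm")
```

That relative saving from a one-sided test or a lower power target is the kind of concession I would associate with a phase II design.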
Notwithstanding adaptive study designs, a single phase II study is intended to provide sufficient evidence for the relevant parties to decide whether or not to proceed with a larger, confirmatory phase III study. In essence, there is an a priori expectation that more research may be needed, depending on the findings of the phase II study.
I do not see that context here, in that their inference (and that of the associated commentary) that a larger, multi-site study is needed appears to be the result of their failure to meet their target enrollment, not of a priori intent.
To their credit, they did perform a priori power/sample size estimates, albeit, as noted, they had problems fulfilling their target enrollment, which can be a common challenge. They enrolled only 81 of the target 150 subjects, and the anticipated attrition rate was 20% versus the 26% actually observed.
Further, the treatment effect observed in the truncated cohort was lower than the effect hypothesized for even their smallest target sample of 84 subjects (106 subjects in total at 20% attrition): a 13.5% absolute reduction observed versus the 22% hypothesized.
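For a rough sense of what that gap implies, here is a sketch of the evaluable and enrolled sample sizes needed under each effect size; the control event rate is an assumption on my part (only the absolute reductions are given), so the specific numbers are illustrative rather than a reconstruction of their calculation:

```python
# Rough illustration: the control event rate is assumed, not taken from the study.
from math import ceil
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control = 0.50                      # hypothetical control event rate
analysis = NormalIndPower()

for reduction in (0.22, 0.135):       # hypothesized vs observed absolute reduction
    effect = proportion_effectsize(p_control, p_control - reduction)
    n_per_arm = analysis.solve_power(effect_size=effect, alpha=0.05,
                                     power=0.8, alternative="two-sided")
    evaluable = 2 * ceil(n_per_arm)
    enrolled = ceil(evaluable / 0.8)  # inflate for the planned 20% attrition
    print(f"{reduction:.1%} absolute reduction: ~{evaluable} evaluable, ~{enrolled} enrolled")
```

Whatever the actual baseline rate, the required enrollment grows rapidly as the assumed effect shrinks, which underscores how far short a truncated cohort of 81 falls if the true effect is closer to the observed 13.5%.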
The associated commentary would seem to suggest that even the observed effect is still clinically meaningful, albeit I would defer to clinical subject matter experts on that.
As we know, the premise that more subjects alone are needed (e.g. the notion of “trending towards significance”), which presumes that the observed results would hold unchanged in a larger cohort, is highly problematic.
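To spell out why: the extrapolation holds the observed rates fixed and only lets n grow, as in the sketch below (again with hypothetical event rates), so the apparent gain in significance comes entirely from treating a noisy point estimate as if it were the true effect:

```python
# Hypothetical illustration of the "just add more subjects" extrapolation.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

p_control, p_treated = 0.50, 0.365     # assumed rates giving a 13.5% absolute difference

for n_per_arm in (30, 60, 120):        # truncated vs progressively larger cohorts
    events = np.round([p_treated * n_per_arm, p_control * n_per_arm]).astype(int)
    stat, p = proportions_ztest(events, [n_per_arm, n_per_arm])
    print(f"n={n_per_arm} per arm, same observed rates: p = {p:.3f}")

# The p-value shrinks as n grows only because the observed rates are held fixed,
# i.e. the extrapolation assumes the noisy point estimate is the true effect,
# which is precisely the problematic assumption noted above.
```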
So this particular study was, to my eyes, problematic on a number of fronts, and one wonders, perhaps in hindsight and presuming that they knew they had enrollment problems well in advance of stopping the study, what other options were considered (e.g. expanding the number of study sites) to try to meet the original enrollment goals.
That all being said, I agree with the premise of this post that there are problems generally speaking, many of them, as Button et al noted, driven by the need to publish.