Is it acceptable to run a phase 2 RCT with a one-sided significance threshold of p < 0.1? I always thought that a one-sided threshold of p < 0.10 (a 10% type I error rate) was a meaningful departure from the conventional two-sided α = 0.05. One-sided testing assumes a priori that the experimental arm cannot be inferior to the control, and under the null hypothesis, one in ten trials would yield a positive result by chance alone. Curious to hear from the experts.
Using a generous significance threshold is defensible if the trial’s aim is just to establish whether the new treatment could be beneficial (the classical Phase 2 aim). But if that is really the aim, then you probably don’t want to recruit 172 patients (they don’t say how long that took, but I suspect it was quite a long time). Given the large difference that was found, they could have stopped the trial much earlier with sufficient evidence that the intervention was promising.
So I’m not sure whether the aims changed a bit or were always more than just “phase 2”. It’s good that they randomised though, and that the result was so positive. I do worry a bit about labelling a trial as “phase 2” and thereby justifying a generous threshold for regarding it as “positive,” but then actually running a more definitive trial.
Best solution is surely to get away from p-value thresholds completely…
Agree with Simon.
I would like to add that in an indication with severe mortality, like AML, the cost of a Type II error is huge.
The worst-case scenario isn’t advancing a mediocre drug to Phase 3 (the Phase 3 trial will eventually catch that). The worst-case scenario is a Type II error (false negative): abandoning a potentially life-saving drug because our statistical hurdles were too strict for a small sample size.
If we demand a strict alpha = 0.025 (one-sided) in a Phase 2 trial, we either:
Tank our statistical power, making it very likely that we miss active drugs.
Inflate our sample size so much that the Phase 2 trial costs as much and takes as long as a Phase 3 trial.
Relaxing the threshold to 10% can be viewed as an intentional trade-off. Sponsors are usually happy to accept a 1 in 10 chance of a false positive (in phase 2) to ensure we maintain 80-90% power to detect a true signal.
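To put rough numbers on that trade-off, here is a minimal sketch using the normal approximation for a two-arm comparison of means with equal allocation. The standardized effect size of 0.5 is purely an assumption for illustration, not a value from the trial discussed, and the function names `n_per_arm` and `power_at_n` are made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def n_per_arm(alpha_one_sided, power, effect_size):
    """Approximate patients per arm for a one-sided two-sample z-test.

    effect_size is the standardized difference (difference / SD),
    an assumed input, not something taken from the trial above.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def power_at_n(alpha_one_sided, n, effect_size):
    """Approximate power of the same test with n patients per arm."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    return STD_NORMAL.cdf(effect_size * sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    d = 0.5  # hypothetical standardized effect size
    for alpha in (0.025, 0.10):
        print(alpha, n_per_arm(alpha, 0.80, d))
    # Power if we keep the smaller sample but insist on the strict alpha
    print(round(power_at_n(0.025, n_per_arm(0.10, 0.80, d), d), 2))
```

Under these assumptions, 80% power needs about 63 patients per arm at one-sided alpha 0.025 but only about 37 at alpha 0.10. Keeping the 37 per arm while insisting on 0.025 drops power to a bit under 60%, which is the "tank our power or inflate our sample size" choice described above.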
As an aside - I am often curious about how many phase 2 trials actually end up, after further trials, with a clinically useful intervention. It probably depends on the field. This affects how we could interpret the p-values. For example, if 50% of phase 2 trials are of a new drug that is really better than the current drug (we just don’t know which 50%), then for alpha 0.1 and power 0.8 (i.e. beta 0.2), say, out of 100 trials, 5 would be FP, 45 TN, 40 TP, 10 FN - so a positive result would have a 40/45 probability of being a TP and a negative result a 45/55 probability of being a TN (though maybe we could make better estimates if we use the actual p-value post-hoc). I guess we don’t have the data needed to interpret this way (also I guess 50% is very optimistic - in one field I know the current rate is 0%!).
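The arithmetic above is easy to redo for other prior rates. A small sketch follows; the function name `predictive_values` and the 10% prior in the second call are illustrative assumptions, not figures from any field.

```python
def predictive_values(prior, alpha, power):
    """Return (PPV, NPV) for a trial result.

    prior: assumed fraction of phase 2 trials testing a truly better drug
    alpha: type I error rate
    power: 1 - beta, probability of detecting a truly better drug
    """
    tp = prior * power              # truly better, trial positive
    fn = prior * (1 - power)        # truly better, trial negative
    fp = (1 - prior) * alpha        # not better, trial positive
    tn = (1 - prior) * (1 - alpha)  # not better, trial negative
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # The 50% example from the post: 40/45 and 45/55
    print(predictive_values(0.5, 0.1, 0.8))
    # A more pessimistic, purely hypothetical prior of 10%
    print(predictive_values(0.1, 0.1, 0.8))
```

With the 50% prior this reproduces 40/45 (about 0.89) and 45/55 (about 0.82). With a 10% prior, the chance that a positive result is a true positive falls to 8/17, a little under one half, so the answer depends heavily on a prior we mostly don’t know.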
Yes, I must have been thinking about a similar but wrong phrasing. But the full correct statement is that one in ten trials would yield a positive result if the treatment is ignorable, just to avoid using the slightly more nebulous “chance alone”.
I was thinking that changing the ‘w’ to a ‘c’ nearly accomplishes this — effectively eliding the proviso here. Thus, we would read: one in ten trials could [if the treatment were ignorable] yield a positive result by chance alone, and the brackets would fill themselves in automagically.