Statisticians, clinical trialists, and drug regulators frequently claim that they want to control the probability of a type I error, and they go on to say that this equates to the probability of a false positive result. This thinking is oversimplified, and it makes me wonder whether a type I error is an error in the usual sense of the word. For example, a researcher may go through the following thought process.
I want to limit the number of misleading findings over the long run of repeated experiments like mine. I set \alpha=0.05 so that only \frac{1}{20} of the time will the result be "positive" when the truth is really "negative".
Note the subtlety in the word result. A result may be something like an estimate of a mean difference, or a statistic or p-value for testing a zero mean difference. \alpha deals with such results: it refers to the fraction of repeated experiments in which an assertion of a positive effect is made. But what most researchers really want is given by the following:
I want to limit the chance that the treatment doesn't really work when I assert that it works (has a positive effect).
This alludes to a judgment or decision error: the treatment is truly ineffective when you assert that it is effective.
When I think of error I think of an error in judgment at the point at which the truth is revealed, e.g., one decides to get a painful biopsy and the pathology result is "benign lesion". So to me a false positive means that an assertion of true positivity was made but that the revealed truth is not positive. The probability of making such an error is the probability that the assertion is false, i.e., the probability that the true condition is negative, e.g., that a treatment effect is zero.
Suppose one is interested in comparing treatment B with treatment A, and the unknown difference between true treatment means is \Delta. Let H_0 be the null hypothesis that \Delta=0. Then the type I error \alpha of a frequentist test of H_0 using a nominal p-value cutoff of 0.05 is
P(test statistic > critical value | \Delta=0) = P(assert \Delta \neq 0 | \Delta=0) = P(p < 0.05 | \Delta=0)
The conditioning event follows the vertical bar | and can be read as "if \Delta=0". \alpha is 0.05 only if the p-values are accurate and the statistical test is a single pre-specified test (e.g., sequential testing was not done).
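This conditional interpretation is easy to check by simulation. Below is a minimal sketch in Python, assuming hypothetical two-arm normal data, arbitrary sample sizes, and a two-sample t-test (none of these particulars come from the discussion above); it shows that when \Delta=0 the long-run fraction of experiments yielding p < 0.05 is approximately 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 20_000, 50  # number of simulated trials, patients per arm

# Repeated experiments in which the null is exactly true:
# both arms are drawn from the same normal distribution, so Delta = 0.
a = rng.normal(0.0, 1.0, size=(n_sims, n))
b = rng.normal(0.0, 1.0, size=(n_sims, n))
p = stats.ttest_ind(a, b, axis=1).pvalue

# Long-run fraction of "positive" assertions when Delta = 0;
# this estimates the type I error and should be close to 0.05.
print((p < 0.05).mean())
```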
On the other hand, the false positive probability is
P(\Delta = 0 | assert \Delta \neq 0) = P(\Delta = 0 | p < 0.05)
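Obtaining this quantity requires a prior probability that \Delta = 0, as is seen by expanding it with Bayes' rule:

P(\Delta = 0 | p < 0.05) = \frac{P(p < 0.05 | \Delta = 0) P(\Delta = 0)}{P(p < 0.05 | \Delta = 0) P(\Delta = 0) + P(p < 0.05 | \Delta \neq 0) P(\Delta \neq 0)}

Here P(p < 0.05 | \Delta \neq 0) is the power of the test averaged over the prior distribution of nonzero effects.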
This false positive probability can be arbitrarily different from the type I error, and because it involves the prior P(\Delta = 0) it can only be obtained from a Bayesian argument.
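To see how far apart the two probabilities can be, here is a sketch extending the simulation above with a made-up mixture prior: the treatment is assumed truly ineffective 90% of the time and otherwise to have a standardized effect of 0.5. All of these numbers are illustrative assumptions, chosen only to show that P(\Delta = 0 | p < 0.05) need not be anywhere near 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n = 50_000, 50
prior_null = 0.9   # assumed prior P(Delta = 0); purely illustrative
effect = 0.5       # assumed standardized effect when the treatment works

# Draw the truth for each experiment, then the data given that truth.
truly_null = rng.random(n_sims) < prior_null
delta = np.where(truly_null, 0.0, effect)
a = rng.normal(0.0, 1.0, size=(n_sims, n))
b = rng.normal(delta[:, None], 1.0, size=(n_sims, n))
p = stats.ttest_ind(a, b, axis=1).pvalue

positive = p < 0.05  # experiments in which Delta != 0 was asserted
# Among positive assertions, the fraction for which Delta was really 0:
# a Monte Carlo estimate of P(Delta = 0 | p < 0.05).
print(truly_null[positive].mean())
```

Under these made-up assumptions roughly 0.4 of all "positive" assertions come from a truly ineffective treatment, even though the type I error of each individual test is exactly 0.05.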
My conclusion is that even though many researchers claim to want control of type I error, what they are getting from the type I error probability \alpha is not what they really wanted in the first place. Thus controlling type I error was never the most relevant goal. Type I error is a false assertion probability, not the probability that the treatment doesn't work.
As a side note, treatments can cause harm and not just fail to benefit patients, and the framing of H_0 does not formally allow for the possibility of harm. A Bayesian posterior probability, on the other hand, handles this directly: taking \Delta > 0 to mean benefit, P(\Delta \leq 0) = 1 - P(\Delta > 0) = P(treatment has no effect or actually harms patients). This seems to be a more relevant probability than P(H_0 is true).
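To make that concrete, here is a minimal sketch of computing such a posterior probability under a conjugate normal-normal model; the prior standard deviation, observed mean difference, and standard error are all made-up illustrative values, not quantities from the text:

```python
import numpy as np
from scipy import stats

# Conjugate normal-normal model: Delta ~ N(prior_mean, prior_sd^2) and the
# observed mean difference delta_hat ~ N(Delta, se^2). All numbers made up.
prior_mean, prior_sd = 0.0, 2.0   # skeptical prior centered at "no effect"
delta_hat, se = 1.2, 0.5          # observed difference and its standard error

# Standard precision-weighted posterior update
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + delta_hat / se**2)

# Posterior P(Delta <= 0): the probability of no benefit or of harm,
# taking Delta > 0 to mean the treatment benefits patients.
p_no_benefit_or_harm = stats.norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))
print(p_no_benefit_or_harm)
```

With a skeptical prior centered at zero, P(\Delta \leq 0) directly answers "what is the chance the treatment does not benefit patients?", with no reference to hypothetical repetitions of the experiment.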