What is type I error?



Good questions, Drew. Come to think of it, I don't like multiplicity corrections in general, or anything that derives from p-values. And is "false discovery rate" even the correct terminology? I'm forgetting my statistical history now; I can't remember whether it attempts to estimate the proportion of true non-null effects or the proportion of non-null assertions. At any rate, it's not really a rate but rather a proportion or probability.
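For what it's worth, my recollection (worth checking against Benjamini & Hochberg, 1995) is that the FDR is defined as the expectation of a proportion, FDR = E[V / max(R, 1)], where V is the number of true nulls asserted non-null and R is the total number of assertions. So the denominator counts non-null assertions, and "rate" is indeed a misnomer for what is an expected proportion.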

Regulator's regret is an interesting term that regulators have for too long assumed means type I error. But in fact, and apropos of the original post above, it is really the condition of approving a drug that doesn't work (there is also the opposite regret of missing a good drug). The probability of regulator's regret is the probability, given approval, that the treatment has no effect or a harmful effect, so it's not type I error.
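To put the contrast in symbols (my notation, anticipating the discussion further down): type I error conditions on an assumed state of nature, α = P(assert Δ ≠ 0 | Δ = 0), whereas the probability of regulator's regret conditions on the action, P(Δ ≤ 0 | drug approved). The two condition in opposite directions, so one cannot stand in for the other.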

For large-scale problems my biggest concern with FDR is that it doesn’t actually work. It lulls researchers into a false sense of security, makes them miss real effects, and fails to recognize that the feature selection method being used has no chance of finding the “right” features.


This paper could be helpful:

One of the very few that does not mix up Fisher's significance tests (where there is no type-I error defined!) and Neyman's acceptance tests (where you necessarily have two different types of errors [I _and_ II]).

Perezgonzalez made a nice statement in a note there:

“As H0 is always true (i.e., it shows the theoretical random distribution of frequencies under certain parameters), it cannot, at the same time, be false nor falsifiable a posteriori. Basically, if at any point you say that H0 is false, then you are also invalidating the whole test and its results. Furthermore, because H0 is always true, it cannot be proved, either.”

I always argue that H0 is always false, because in reality any infinitesimally small deviation from H0 means that H0 is false. But this is actually not the point here. The test assumes H0; it is never referring to something real. The p-value remains a statistic of the data assuming H0. We thus cannot wrongly reject H0. We may or may not reject H0, and there is no “correct-or-false” property associated with this. From the perspective of the model (which assumes H0), a rejection is “false” by definition, and from the perspective of reality, a rejection is “correct” by definition (except, maybe, for some carefully constructed or theoretical cases).


Frank, thanks for starting this great discussion. As a physician, I wonder: would it be fair to say that your point is similar to the following diagnostic problem?

A diagnostic test is often performed on a person suspected of having the disease, rather than on a random person. So imagine a neurologist who typically orders an MRI to confirm a serious autoimmune brain disease that is already suggested by appropriate signs, symptoms, and a positive CT scan. Previous research has demonstrated that one of every twenty disease-free patients will have a false positive result. Despite this 5% probability, no experienced neurologist would expect 1 in 20 of the patients for whom she orders an MRI to have a false positive result (although admittedly, the neurologist might not be able to articulate it that explicitly). Her patients are far too disease-suspect for that. A positive MRI hardly has any relevance in this situation; it is the occasional negative MRI that she is looking for here. Again, it might not be easy for an experienced clinician to articulate it this way, but in my experience they “know” it nonetheless.

Is my comparison (more or less) correct?


I'm not sure, but I think so. I find discussions of “1 in 20” and false positives perpetually confusing, though. I stick to actionable posterior probabilities that are predictive in nature, as required by decision makers. So my suggestion is to acquire a well-calibrated risk prediction tool for that clinical scenario and to use the estimated risks it produces, without labeling anything negative or positive.


Thanks, Frank! My main intention was to create a verbal, clinical analogy so that I could better explain your issue with alpha to my physician colleagues. Sorry that my intention wasn't clear.

Of course it would be better to discuss it in terms of proper probabilities. However, trying to transform an entire clinical department into a Bayesian reasoning machine is (at least for me) not going to succeed overnight. Verbal analogies help prime our clinical brains for future, truly Bayesian reasoning.

I'll try to sharpen my analogy a bit more, but it is somewhat reassuring that you did not immediately shoot a hole in it.


That does not need any Bayesian interpretation. It is only a matter of the reference population. The statement that the MRI gives “1 in 20 false positives” refers to the population of people not having the disease. But when, in clinical practice, MRIs are done only on people already showing several signs of the disease, the reference population is one with a considerable proportion of diseased people. Running the test on such a population will give you fewer than “1 in 20 false positives”. This can all be explained with a purely frequentist interpretation of probability.
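A quick simulation makes the frequency argument concrete. This is a sketch with made-up numbers: the prevalences and the 90% sensitivity are assumptions; only the 1-in-20 false positive rate among the non-diseased comes from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000         # patients scanned
sensitivity = 0.90  # assumed for illustration
specificity = 0.95  # the "1 in 20 false positives" among the non-diseased

for prevalence in (0.01, 0.60):  # random sample vs. disease-suspect referrals
    diseased = rng.random(n) < prevalence
    positive = np.where(diseased,
                        rng.random(n) < sensitivity,   # true positives
                        rng.random(n) > specificity)   # false positives
    false_positive = positive & ~diseased
    print(f"prevalence {prevalence:.0%}: "
          f"false positives per scan = {false_positive.mean():.3f}")
```

At 60% prevalence only about 2% of scans come out false positive, simply because fewer of the scanned patients are disease-free; no priors or posteriors are needed for that statement.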


Yes, but the difference between the two reference populations is part of Bayes's theorem: a false positive MRI in non-diseased people is the likelihood P(test = 1 | D = 0); a false positive MRI in people suspected of disease refers to the posterior probability P(D = 0 | test = 1) in patients with a high prior probability due to signs, symptoms, and CT findings.
Hence I was referring to Bayesian reasoning, which is very compatible with frequentist stats. I was not referring to Bayesian stats.
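To connect the two notations, here is the same point run straight through Bayes's theorem. Again a sketch: the priors and the 90% sensitivity are assumed for illustration; only the 1-in-20 false positive rate P(test = 1 | D = 0) = 0.05 comes from the example above.

```python
def p_no_disease_given_positive(prior_d, sensitivity, specificity):
    """P(D=0 | test=1) via Bayes's theorem."""
    p_pos_d1 = sensitivity      # P(test=1 | D=1)
    p_pos_d0 = 1 - specificity  # P(test=1 | D=0), likelihood of a false positive
    p_pos = p_pos_d1 * prior_d + p_pos_d0 * (1 - prior_d)  # total probability of a positive
    return p_pos_d0 * (1 - prior_d) / p_pos

# Disease-suspect patient (high prior) vs. randomly sampled person (low prior):
print(p_no_disease_given_positive(prior_d=0.60, sensitivity=0.90, specificity=0.95))  # ~0.036
print(p_no_disease_given_positive(prior_d=0.01, sensitivity=0.90, specificity=0.95))  # ~0.846
```

The likelihood P(test = 1 | D = 0) is 0.05 in both cases; it is the posterior P(D = 0 | test = 1) that swings from about 0.85 to about 0.04 as the prior changes.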


Yes, that's Bayes' theorem. But applying Bayes' theorem (to go from one population to another) is not Bayesian reasoning. It's still about frequencies of events, not about the probability of a particular subject being diseased.

I think in practice most clinicians are Bayesians “by heart”, as they think in terms of “what is the probability that this patient has the disease (given this and that)?” rather than “what is the probability of a (randomly sampled) patient having the disease (given this and that)?” (the latter being a frequentist-like question). As long as the probability statements refer to samples from a population, the philosophical background remains frequentist. Only when probabilities are assigned to features that cannot be subject to sampling is the thinking Bayesian.


On the other hand, the false positive probability is

P(Δ=0 | assert Δ≠0)

This false positive probability is arbitrarily different from type I error, and can only be obtained from a Bayesian argument.
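To see how far apart the two quantities can be, here is a quick simulation; all numbers are assumptions for illustration (80% of candidate treatments truly null, an effect of 0.5 SD when real, 50 patients per arm, α = 0.05):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials, n = 10_000, 50    # simulated trials; patients per arm
p_null, effect = 0.80, 0.5  # assumed fraction of true nulls; effect size when real

is_null = rng.random(n_trials) < p_null
delta = np.where(is_null, 0.0, effect)

asserted = np.empty(n_trials, dtype=bool)
for i in range(n_trials):
    a = rng.normal(0.0, 1.0, n)       # control arm
    b = rng.normal(delta[i], 1.0, n)  # treatment arm
    asserted[i] = stats.ttest_ind(a, b).pvalue < 0.05

print("type I error  P(assert | Delta=0):", round(asserted[is_null].mean(), 3))               # ~0.05
print("false positive probability  P(Delta=0 | assert):", round(is_null[asserted].mean(), 3))  # ~0.22
```

The first quantity stays at the nominal 0.05 by construction; the second depends entirely on the assumed proportion of true nulls, so the two can be pushed arbitrarily far apart. And computing the second at all requires putting probability on Δ = 0, i.e., a Bayesian argument.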

How is the condition “assert Δ≠0” expressed in a Bayesian argument? Simply by reporting a non-zero summary statistic of the posterior distribution of Δ? For example, would reporting that Δ≠0 because the mean, median, or mode of the posterior of Δ is non-zero constitute an assertion that Δ≠0?


Excellent question. Most people believe that the state of knowledge/ignorance is continuous, so they don't place any special meaning on Δ = 0 in the prior distribution. When this is the case, the Bayesian posterior probability that Δ = 0 is zero, so P(Δ ≠ 0) = 1 and no data are needed. So instead of this, the majority of Bayesians compute P(Δ > 0) as the degree of belief in efficacy in the right direction, and one minus that as the probability of ineffectiveness or harm. In other words, Bayesians avoid point null hypotheses. Clinical relevance involves P(Δ > ε) for some clinically relevant minimal efficacy ε.

The beauty of probabilistic thinking is that until we go all the way and define a utility function to optimize in a formal decision analytic framework, we don’t need errors or assertions; we can just use the language of uncertainty to make our best statements, e.g., “Treatment B probably (0.96) lowers blood pressure when compared to treatment A, given the prior distribution for efficacy of …”
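As a minimal sketch of how such a probability statement might be computed, suppose the posterior for Δ (the blood pressure difference) is approximately normal; the mean and SD below are made-up numbers chosen to land near the 0.96 quoted above:

```python
from scipy import stats

# Assumed approximately-normal posterior for Delta (illustrative numbers only)
posterior = stats.norm(loc=3.0, scale=1.7)  # mean 3 mmHg, SD 1.7 mmHg
epsilon = 2.0  # assumed minimal clinically relevant efficacy, in mmHg

p_right_direction = 1 - posterior.cdf(0.0)          # P(Delta > 0)
p_ineffective_or_harm = posterior.cdf(0.0)          # P(Delta <= 0)
p_clinically_relevant = 1 - posterior.cdf(epsilon)  # P(Delta > epsilon)

print(f"P(Delta > 0)       = {p_right_direction:.2f}")       # ~0.96
print(f"P(Delta <= 0)      = {p_ineffective_or_harm:.2f}")   # ~0.04
print(f"P(Delta > epsilon) = {p_clinically_relevant:.2f}")   # ~0.72
```

No assertion or error is declared anywhere; the three posterior probabilities are themselves the statements.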


Thanks! That is a much more satisfying form of inference.