Language for communicating frequentist results about treatment effects


Sander, what language would you prefer to use in research settings where the primary purpose of the analysis is to make a decision based on a comparison of observed p to a justified α?

In much of cell biology (basic biomedical but far from clinical), researchers state alpha up front but then do not seem to actually use it as a hard decision rule. Comparing contemporary papers with those from before 2000 (and especially before 1990), the things that seem to guide decision making are no different today than 30 years ago. The difference is that in today’s papers results are more often quantified in a way that supports a t-test or ANOVA (like the results of various blots), and there are 100s of reported n.s., *, **, and *** (and even ****) per paper, compared to no asterisks and no t-tests (or maybe 1 or 2) 30 years ago. Even today, the “decisions” that the researchers are making do not seem to be guided much by the statistics. Basically, if it looks like an effect (based on the plot of the data), and the effect is in the direction predicted by their working model of how the system works, they probe a little deeper with different experiments.

So even though these researchers state an alpha, in a sense, the practice is a bit more like Fisher than N-P (since they are not using p-values as hard decision rules) but instead of reporting p-values, they’ve quart- or quint-chotomized the p-value into n.s., *, **, ***, and ****.
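For concreteness, that quantization can be sketched as a simple mapping (the thresholds assumed here are the common 0.05/0.01/0.001/0.0001 cutoffs used by packages such as GraphPad Prism; individual journals may differ):

```python
# Conventional asterisk coding of a continuous p-value into five coarse
# bins, illustrating the "quint-chotomization" described above.
def asterisk_code(p: float) -> str:
    """Map a p-value to the usual n.s./*/**/***/**** labels."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."

for p in [0.20, 0.04, 0.004, 0.0004, 0.00004]:
    print(p, asterisk_code(p))
```

The point of the mapping is also its problem: every p-value between 0.01 and 0.05, say, is reported identically, so the reader cannot recover p from the asterisks.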

Back to the original question - most of this is irrelevant in cell biology because a typical paper reports > 100 p-values or sets of asterisks and for the most part, they are simply reported and not explicitly interpreted. So how could the reporting in cell biology papers be improved?

  1. drop the initial statement of alpha
  2. report p-value and not asterisk
  3. drop all uses of the word “significant”


Bayesian-frequentist mapping is only available in specific situations, and even when the mapping is simple I don’t like to think this way because frequentists do not like the Bayesian interpretation.

Frequentist methods cannot handle this in general, because hierarchical models do not cover the full spectrum of models. Compare for example with copula multivariate models for multiple outcomes of different types, where the types cannot be connected by a common random effects distribution without at least having scaling issues.

I think our fundamental disagreement, which will not be fixable no matter how long the discussion, is that you feel you are in a position to question any choice of prior distribution, and you are indirectly stating that a prior must be “right”. I only believe that a prior must be agreed upon, not “right”, and I believe that even a prior that is a bit off to one reviewer will most often provide a posterior distribution that is defensible and certainly is more actionable.

Your point about Bayesian analysis depending on prior + model is of course correct. But Bayes in total makes fewer assumptions than frequentist. That’s because frequentist requires a whole other step to translate evidence about data to optimal decisions about parameters. This requires many assumptions about sample space that are more difficult to consider than changing a Bayesian prior. But let’s stick to optimizing frequentist language here for now.



I’d walk this back even further. Whether the prior is “agreed upon” is far less important than that it constitutes objective knowledge which is therefore criticizable. That is, an explicit prior enriches and enlarges the critical, scientific discussion we can have together. I suspect that one reason this conversation is not moving toward resolution is that it has been framed through a question about alternative ways to make statements that have precisely the opposite character: statements about statistical ‘results’ that are meant to close off critical discourse!



This is brilliant and extremely useful. Just one suggestion: in the negative case, you could make the correct statement more general by expanding the sentence
“More data will be needed.”
to
“More data, more accurate data, or more relevant data will be needed.”

  • accurate data to emphasise measurement precision
  • relevant data to emphasise the link between data and the hypothesis being tested


David, I agree with that for the majority of cases, but not all. I think that evidence that is intended to convince a skeptic can validly use a skeptical prior that incorporates no prior information. This is essentially Spiegelhalter’s approach. I do want to heartily endorse the criticizable part.

For sure - and I think that claiming ‘statistical significance’ or lack thereof in a medical paper is a great way to shut down thinking!



Zad: In that situation how about reporting the decision rule with α in the methods section, then the P-value and decision in the results, as recommended (for example) by Lehmann in the mid-20th century bible of NP theory, Testing Statistical Hypotheses? Many authors in that (my grandparents!) generation used the term “significance” to describe p (Fisher, Cox) and others (like Lehmann) used it to describe α, so presciently enough Neyman when I knew him avoided using the word for either. I avoid it, Frank avoids it, and anyone can: p is the P-value, α is the alpha-level, and the decision is reported as “rejected by our criterion” or “not rejected by our criterion” (please, not “accepted”!). Even in that decision situation, Lehmann, Cox and many others advised reporting p precisely, so that a reader could apply their own α or see how sensitive the decision was to α.

As for “uncertainty interval”, sorry but no, because:

  1. that is already used by some Bayesians to describe posterior intervals, and rightly so because
  2. it’s yet another misuse of a word: “uncertainty” is a subjective observer state, whereas the P-value and whether it is above or below α (as shown by the interval for multiple points) is just a computational fact, regardless of anyone’s uncertainty; in fact
  3. no interval estimate I see in my field captures anywhere near the uncertainty warranted (e.g., warranted by calibration against a credible model) - they leave out tons of unknown parameters. That means those “left out” parameters have been set to default (usually null) values invisibly to the user and consumer. (In my work there is typically around 9 unknown measurement-error parameters per association; only on rare occasions do even a few get modeled.)

At best a compatibility (“confidence”) interval is just a summary of uncertainty due to random error alone, which is to say the conditional uncertainty left after assuming the model used to compute the interval is correct (or wrong only in harmless ways). But that’s an assumption I would never believe or rely on in my work, nor in any example from Gelman’s area.

My conclusion is that using “uncertainty” in place of “confidence” is just substituting a bad choice for a horrific one.



So I’m assuming from your response that you’d be fine with describing them as compatibility intervals?



Yes, until somebody convinces me that term is misleading for nontechnical readers relative to the ordinary English meaning of “compatibility” (compared to the extremely misleading evocations of “confidence” and “uncertainty”).



We need some sort of ASA guidelines on the language for reporting/interpreting statistical results.



The “specific situations” cover every single problem I encounter in med and health controversies. Every linear, logistic and Cox regression I see has at least a partial-Bayes analog. Conversely every single Bayesian analysis is directly mappable to a random-parameter frequency model, which can be used to calibrate any proposed posterior computation via simulation from that model.
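A minimal sketch of that calibration idea, under assumed toy settings (a normal-normal conjugate model with known variances, not anyone’s actual analysis): treat the prior as a frequency distribution for the parameter, simulate (parameter, data) pairs from that random-parameter model, and check that nominal 95% posterior intervals cover the parameter about 95% of the time.

```python
# Toy calibration of a posterior computation by simulation from the
# corresponding random-parameter frequency model (normal-normal example).
import math
import random

random.seed(1)
tau, sigma, n_sims = 1.0, 1.0, 20_000  # prior SD, data SD, replications
covered = 0
for _ in range(n_sims):
    theta = random.gauss(0.0, tau)   # parameter drawn from the prior
    y = random.gauss(theta, sigma)   # one observation given theta
    # conjugate posterior: N(post_mean, post_var), prior mean 0
    post_var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
    post_mean = post_var * y / sigma**2
    post_sd = math.sqrt(post_var)
    lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
    covered += (lo <= theta <= hi)

coverage = covered / n_sims
print(round(coverage, 3))  # should land near the nominal 0.95
```

When the posterior computation is correct for the model, the simulated coverage matches the nominal level; a miscoded or misapplied posterior shows up as a coverage discrepancy.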

My impression is that, for some purely psychosocial reasons, extremists on both sides ignore or forget these mappings, which again apply to all the situations I think anyone here (struggling with mere terminology) encounters in real research (not flashy specialty apps in JASA or Biometrika).

My impression is that this failure on both sides is what split statistics into this crazy frequentist/Bayes divide. And that this split has been beyond destructive to thinking, writing, and teaching clearly (inseparable goals) about applied statistics. If the study is at all important and its goal is clear communication of what was observed and what it might portend, both frequentist and Bayesian interpretations need to be brought up. In doing so my ideal (not always attainable perhaps) is to develop an analysis model that can be defended as a useful summary of what is known, both for information extraction and prediction, and thus be acceptable from broad frequentist and Bayesian views. Meaning I listen closely to extremist criticisms of the other side even though I don’t buy their claims of owning the only correct methodology…

But Bayes in total makes fewer assumptions than frequentist. That’s because frequentist requires a whole other step to translate evidence about data to optimal decisions about parameters. This requires many assumptions about sample space that are more difficult to consider than changing a Bayesian prior.
That’s all just flat wrong in my view. Top frequentists I know say just the opposite, e.g., that Bayesian methods add crucial assumptions in the form of their priors and hide all the sampling-model complexities from the consumer in a black box of posterior sampling. Plus to top it off, Bayesian conditioning discards what may be crucial information that the model is wrong or hopelessly oversimplified (see Box onward).

I see all those problems, and I have seen some Bayesians in denial about or blind to them. Of course, I have seen some frequentists in denial about or blind to real problems like the subjective nature of typical sampling models in applications; that makes uncommitted use of frequentist methods as subjunctive as uncommitted use of Bayesian methods (subjunctive, as in hypothetical, conditional, and heavily cautioned).

I think our fundamental disagreement, which will not be fixable no matter how long the discussion, is that you feel you are in a position to question any choice of prior distribution, and you are indirectly stating that a prior must be ‘right’. I only believe that a prior must be agreed upon, not ‘right’, and I believe that even a prior that is a bit off to one reviewer will most often provide a posterior distribution that is defensible and certainly is more actionable.
Sigh, you may be right it isn’t ‘fixable’ but I see that as stemming from the fact that I am not saying all that (I’ve heard it as you have - but not from me). I thus don’t see you as understanding what I see myself as trying to communicate (on the topic of this page): That I see the bad language framing and exclusivity of thought that has plagued all of statistics from some of the highest theory (e.g., Neyman, DeFinetti) to the lowliest TA for a “Stat 1 for nonmath people” course, as well as publications.

You noted these issues go deep - well it’s deeper than the frequentist-Bayes split, which I think has become an artificial obsession based on exclusivist thinking in mid-20th century “thought leaders.” One can only wonder what the field would have been like if (for all sides) common practice had stemmed from Good, instead of from Neyman or Fisher for frequentists and from Jeffreys or DeFinetti for Bayesians. Well thankfully one applied area did have Box to talk sense: Both sides talk about the same models of the world; the only split is that one is obsessing on Pr(data|model) and the other on Pr(model|data). But there are plenty of applications (e.g., all of mine) where you need to consider both (albeit usually at different stages of planning and analysis).

Now, every stakeholder has a right to question your prior and your sampling model (your total model), as well as whatever loss function underlies a decision (explicit or implied, there is always one behind every decision) and whatever technique was used to merge those ingredients to produce P-values, posterior probabilities, or whatever. Isn’t our job as statisticians to help make all those ingredients as precise, transparent, and clear as possible, so that the criticisms can be anticipated and addressed? And to work with our collaborators to present a compelling rationale for our choices - ideally we’d try to make it compelling to anyone, frequentist, Bayes, or other interested party. And we’d do so knowing we may have made some choices that may look poor in light of facts we did not realize, leaving our results open to criticism on these grounds. We only err when we don’t take those criticisms seriously, for that is when we fail to walk back our inferences or decisions when this new information would call for that under the methodologic rules we claim to follow (e.g., deductive consistency).

For better or worse, however, to err is human and we also have to mitigate many errors handed down to us from authorities, like the misuse of English (e.g., using “null” for any hypothesis instead of a zero hypothesis), and the belief that the terms “frequentist” and “Bayesian” should refer to anything other than techniques (as opposed to philosophies) when applying statistics (as opposed to arguing about philosophy).



Try to get a campaign going with Ron Wasserstein, ASA Director (whom I’ve found very open to such issues and ideas). You can tell him I thought that would be a good use of ASA resources if the goal was to improve terminology, not to defend what’s in place.



I agree that P-values should be computed for several hypotheses and, as David Cox agrees, we can compute the probability that the P-value would exceed (or be smaller than) the observed value under varying alternatives. Of course this is the basis for a severity assessment, but it doesn’t matter what it’s called.
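A minimal sketch of that computation, under assumed toy settings (a one-sided z-test with a hypothetical observed statistic z_obs; the alternatives are standardized effects delta): under an alternative with effect delta, Z ~ N(delta, 1), so Pr(p ≤ p_obs | delta) = Pr(Z ≥ z_obs | delta) = 1 − Φ(z_obs − delta).

```python
# Probability of a p-value at least as small as the one observed, under a
# range of alternatives, for a one-sided z-test. delta = 0 recovers p_obs.
from math import erf, sqrt

def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

z_obs = 1.8  # hypothetical observed test statistic
for delta in [0.0, 0.5, 1.0, 1.8, 2.5]:
    prob = 1.0 - Phi(z_obs - delta)
    print(f"delta={delta:3.1f}  Pr(p <= p_obs) = {prob:.3f}")
```

Scanning this probability across alternatives is what lets a reader judge which effect sizes the observed result discriminates against, whatever name one gives the exercise.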



I think the ASA should encourage all distinct views to weigh in and not set itself up as giving pronouncements on interpretations, especially when the input comes from a select group with certain aims, and particularly as there’s non-trivial disagreement (as is seen here). I agree with David Cox who maintains that the ASA (and RSS) should be discussion forums, encouraging many different perspectives, rather than dictating a single interpretation or position.



So you will keep P-values but banish attained level of statistical significance? (Erich Lehmann at times calls the P-value the “critical level,” and someplace, I believe, the “significance probability,” which would really confuse.) For Cox, the (observed) level of statistical significance (the P-value) reflects the level of incompatibility or inconsistency with the null hypothesis. Nor does he think it necessary to add “observed”.



You mean banish the term “attained significance level”? The quantity is useful (e.g., for decision rules) but Fisher had a less misleading term for it: P-value. So yes, like “the aether” and “consumption” (for tuberculosis), any statistical term with “significance” needs to be retired to historical notes.

Those that institutionalized this destructive “significance” terminology were brilliant conceptually and mathematically gifted, and their students were at least the latter (it was a tiny elite that made it into their tutelage). Those authorities could be pretty sloppy with words and still not be misled by the methods, since they were good at mapping between math and context. For example, Fisher and Neyman knew that any decision parameter like alpha should depend on context.

But the vast majority of users and readers have only the words to hang on to - the math is and always will be a contentless abstraction for them. Our statistical heroes past, no matter how brilliant, were simply unaware of this problem (and really, how could they be?). That does not mean there is no problem: There obviously is a crisis in understanding the meaning of statistics. In dealing with it we need to retire bad traditions, including bad terminology, and not replace them with bad new ones. That’s part of statistics reform.

As for “observed P-value”, “observed” is needed to distinguish the one-dataset Fisherian P-value p that Cox calls “significance level” (a conditional data probability) from the Neymanian P-value P which is a uniform random variable (e.g., see Kuffner & Walker, TAS 2018). Like “significance level”, another example of different authorities using the same term for different concepts. No wonder researchers began mixing up p and alpha, and some statisticians forgot that a P-value degrades information if not calibrated to (uniform under) the assumptions it is computed from (e.g., posterior predictive P-values).
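That calibration property is easy to check by simulation (a toy sketch, assuming a two-sided z-test with all of its assumptions holding): when the P-value is computed from the same model that generated the data, it behaves as a Uniform(0,1) random variable.

```python
# Simulate two-sided z-test p-values under the null and check rough
# uniformity: mean near 0.5, and about 5% of p-values below 0.05.
import random
from math import erf, sqrt

random.seed(2)

def Phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

ps = [2.0 * (1.0 - Phi(abs(random.gauss(0.0, 1.0)))) for _ in range(50_000)]

mean_p = sum(ps) / len(ps)                    # uniform => about 0.5
frac05 = sum(p < 0.05 for p in ps) / len(ps)  # uniform => about 0.05
print(round(mean_p, 3), round(frac05, 3))
```

A P-value computed under a model that does not generate the data (the posterior predictive case mentioned above is one example) fails this uniformity check, which is the sense in which it degrades information.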



I don’t think we need guidelines as much as a few resources that discuss this in more depth, covering the problematic usage and history of the commonly used terms/phrases.

I do wonder if Ron would be open to a collaboration (it doesn’t have to be ASA-related) to create a resource where common usage of phrases/terms is discussed.

I appreciate this discussion and believe much of it has been incredibly insightful, but it’s also incredibly restricted, visited only by those who are already interested in proper statistics. Some of the problems that have been brought up here, such as the damage done by the terms “significance” and “confidence”, have not really been brought up elsewhere.

For example, the ASA statement and its supplements, including @Sander’s, discuss some of the problems with dichotomization and misinterpretations, and Sander’s recent papers on P-value behavior also discuss some of the discrepancies in the definition of “significance” etc., but it doesn’t seem that an argument (similar to the one made in this forum by @f2harrell or @Sander) on why usage of these terms is detrimental to statistics and science has been fleshed out in any sort of paper that can actually be referenced.

If there is one, I would appreciate a link. I think this is something worth thinking about because if I were to go on a universal resource like Wikipedia and attempt to make some edits to add the phrase “compatibility interval” to the entry on confidence intervals, it is very likely that my suggestions would get rejected without some sort of authoritative reference.



I can get behind these suggestions and completely avoid the term “significance” (even in a hypothesis testing framework) by sticking to “xx hypothesis is/is not rejected by our prespecified criterion” and behind “compatibility intervals.” I’ve edited my posts near the top of the forum to reflect these thoughts.



I’m very late to this party, but still want to offer my 2 cents.

95% confidence does not seem to be such a useful property because it refers to the procedure rather than the observed data. So it’s difficult to fault researchers for interpreting confidence intervals as if they are credibility (or posterior) intervals. What else are they supposed to do with them?

Better terminology is certainly important, but I think it’s even more important to make better intervals.



Not at all with my collaborations over the years. A fundamental reason why there is no 1-1 equivalence between Bayes and frequentists in many situations is that there is no agreement among frequentists on how to handle multiplicity. Areas in which 1-1 equivalence is impossible, or at least very difficult, are sequential testing and other situations with stochastic N, and multiple outcomes. For multiple outcomes, e.g., a Bayesian posterior probability that a treatment makes a major improvement on two specific outcomes, a moderate improvement on one other target, or any improvement in mortality, it’s easy to say that there is an equivalence but no one has shown how to derive it.
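For concreteness, the Bayesian side of such a compound statement is mechanically simple given joint posterior draws. This is only a hedged sketch with made-up effect distributions and (unrealistically) independent draws; real analyses would use correlated draws from MCMC, and the thresholds and the reading of the compound event (taken here as a conjunction) are purely illustrative.

```python
# Posterior probability of a compound event as the fraction of posterior
# draws satisfying it (hypothetical effects and thresholds throughout).
import random

random.seed(3)
n_draws = 10_000
hits = 0
for _ in range(n_draws):
    eff1 = random.gauss(0.6, 0.3)  # hypothetical posterior draw, outcome 1
    eff2 = random.gauss(0.5, 0.3)  # outcome 2
    eff3 = random.gauss(0.3, 0.2)  # outcome 3
    mort = random.gauss(0.1, 0.2)  # mortality effect
    major = eff1 > 0.5 and eff2 > 0.5  # "major" improvement on both
    moderate = eff3 > 0.25             # "moderate" on the third
    hits += (major and moderate and mort > 0.0)

prob = hits / n_draws
print(round(prob, 3))  # posterior probability of the compound event
```

The asymmetry is the point: nothing this direct exists on the frequentist side for such a compound claim, which is why asserting an equivalence is easier than deriving one.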

That’s not correct in my opinion. The big new assumption of Bayes is the prior, and the prior is in plain sight and criticizable. You are avoiding decision making and not recognizing how concordant Bayesian posterior inference is with decision making. Sample space considerations are very hard to be transparent about, and often very hard to take into account. The idea that frequentists can make coherent decisions with calculations about data is a hard sell. This whole argument is caused by fundamental problems in indirect reasoning.

I like everything else you said.

Let’s stop the Bayesian debate at this point and try to come to the best language for communicating and interpreting frequentist results that might be used in the medical literature. Your feedback on the “more agnostic” edits I made yesterday would help us all move forward in getting more reasonable language.



I’d suggest you reread some of the posts in this discussion.