"Observed Power" and other "Power" Issues



It’s amazing how statisticians can spend 20 years writing papers that explain this, and yet the fiction of a retrospective power calculation using the observed effect size lives on in clinical journals as something people think is not only useful, but necessary. Even after it was explained to them, the response came back and showed that they had learned nothing:


“We fully understand that P value and post hoc power based on observed effect size are mathematically redundant; however, we would point out that being redundant is not the same as being incorrect. As such, we believe that post hoc power is not wrong, but instead a necessary first assistant in interpreting results.”
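The redundancy being conceded here is easy to demonstrate: for a z-test, "post hoc power based on the observed effect size" is just a monotone transform of the p-value, so it cannot assist interpretation beyond what the p-value already says. A minimal sketch (using only the standard library; the two-sided z-test setup is my illustrative assumption, following the standard Hoenig & Heisey argument):

```python
from statistics import NormalDist

N = NormalDist()  # standard normal

def post_hoc_power(p, alpha=0.05):
    """Observed power of a two-sided z-test, computed solely from p.

    The 'observed effect size' is recovered from the p-value itself,
    so the result is a deterministic function of p and carries no
    additional information.
    """
    z_obs = N.inv_cdf(1 - p / 2)       # |z| implied by the p-value
    z_crit = N.inv_cdf(1 - alpha / 2)  # two-sided critical value
    return N.cdf(z_obs - z_crit) + N.cdf(-z_obs - z_crit)

for p in (0.001, 0.01, 0.05, 0.2, 0.5):
    print(f"p = {p:5.3f}  ->  observed power = {post_hoc_power(p):.3f}")
```

Note the well-known special case: a result with p exactly at the 0.05 threshold always yields an observed power of about 50%, regardless of the study.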


@zad and I have just submitted a letter in reply to the surgeon’s double-down. If the journal accepts, I will link; if the journal declines, I will post the full content here, possibly with a guest post on someone’s blog as well.


I’ll have to disagree a bit. Implausibility of the effect size can be a reason to cast doubt. As John Ioannidis published years ago, don’t trust an odds ratio above 3.0 in a genetic study.


I can’t agree entirely. If you agree that what matters is the false positive risk (FPR) rather than the p value, then estimating the false positive risk depends on (your best estimate of) the power of the experiment. See Fig 1 in https://arxiv.org/abs/1802.04888 (in press, American Statistician).
(I guess for consistency, power should be redefined in terms of FPR = 0.05 rather than p = 0.05, but that is for another day.)
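To make the dependence on power concrete, here is a sketch of the simple "p-less-than" version of the false positive risk (the formula and the illustrative prior of 0.1 follow Colquhoun's paper; the function name is mine):

```python
def false_positive_risk(alpha, power, prior):
    """'p-less-than' false positive risk: among all experiments that
    reach p < alpha, the fraction in which the null is actually true."""
    true_pos = power * prior          # real effects correctly detected
    false_pos = alpha * (1 - prior)   # true nulls wrongly rejected
    return false_pos / (false_pos + true_pos)

# alpha = 0.05, 80% power, real effects in 10% of experiments tested
print(false_positive_risk(0.05, 0.80, 0.10))  # -> 0.36
```

With those inputs the FPR is 36%, not 5%, and halving the assumed power pushes it higher still, which is why the power estimate cannot be dropped from the calculation.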