I do appreciate those concerns. In the stuck-between-a-rock-and-a-hard-place position that Tim (and others, including myself) may find ourselves in - a paper returned with a reviewer comment that wishes to dismiss a finding of “no relationship” by stating “your study must be underpowered, please include a post-hoc power calculation” - do you consider the approach suggested above at all reasonable?
I can appreciate the hand-wavy nature of power calculations before the study as well, but surely you see some value in estimating the number of patients needed for something - if not for a dichotomous decision rule, then for the ability to estimate an effect within some amount of precision? I am often concerned about the accuracy of the power calculations that I provide to collaborators, but I would prefer that they be based on something rather than on simply recruiting whatever number of patients the collaborators think is enough (they’d almost always underestimate, I suspect).
I really like this example. Helps me (non-expert) to better understand the concepts being discussed. How similar is this to doing a post-hoc (conditional) equivalence test with a minimally important difference?
My suggestion is also to always look to the compatibility intervals! That would be a far more fruitful effort than trying to determine whether your study was underpowered or not. It is often difficult to know whether the reason you have a nonsignificant finding is a true indistinguishable difference or an underpowered study. One way to reduce your uncertainty about whether the test hypothesis is really true is equivalence testing (if you must test hypotheses); alternatively, you could plan studies based on precision, so that the computed compatibility interval (or a function of it) is closely clustered around certain values.
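To make those two suggestions concrete, here is a minimal sketch (my own illustration, not taken from any paper discussed in this thread): a two one-sided tests (TOST) equivalence test against a hypothetical margin, and a precision-based sample-size calculation aimed at a target confidence-interval half-width. The margin, SD, target half-width, and simulated data are all made-up assumptions purely for illustration.

```python
# Minimal sketch of (1) TOST equivalence testing and (2) precision-based planning.
# All numbers below (margin, SDs, target half-width) are hypothetical.
import math
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 10.0, 120)   # simulated control arm
b = rng.normal(0.5, 10.0, 120)   # simulated treatment arm with a tiny true difference

# --- (1) TOST equivalence test against a hypothetical margin of +/- 3 units ---
margin = 3.0
diff = b.mean() - a.mean()
se = math.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
df = len(a) + len(b) - 2                           # simple approximation to the df
p_lower = stats.t.sf((diff + margin) / se, df)     # H0: diff <= -margin
p_upper = stats.t.cdf((diff - margin) / se, df)    # H0: diff >= +margin
p_tost = max(p_lower, p_upper)                     # reject both to conclude equivalence
print(f"difference = {diff:.2f}, TOST p = {p_tost:.3f}")

# --- (2) precision-based planning: n per group for a target 95% CI half-width ---
def n_per_group(sd, half_width, conf=0.95):
    """n per arm so the expected CI half-width for a mean difference hits the target."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    return math.ceil(2 * (z * sd / half_width) ** 2)

print("n per group for half-width 2.5 (SD 10):", n_per_group(10, 2.5))
```

The point of the second function is that the design target is the width of the interval you will end up reporting, not a yes/no power threshold.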
Unfortunately, it seems that the issue of post-hoc power / “observed” power just won’t die. Apologies to those who follow me on Twitter and are already aware of this, as I’ve been hammering this article for the last day or two, but a group of surgeons have been promoting the nonsensical idea that all studies which report negative results should include a post-hoc power calculation using the observed effect size, an idea the statistical community has written a number of articles about over the years:
If you read this article and are as troubled as I am, I encourage you to also leave a comment at PubPeer. This article is wrong about nearly everything it says (I am fine with the initial premise that “absence of evidence is not evidence of absence” and with giving more thought to power/sample size; however, the entire applied portion is wrong, as is the subsequent recommendation that all studies be reported with an “observed” power, for reasons laid out in my PubPeer comments).
My PubPeer reply references several excellent resources which you may be interested in should you ever be pushed on this issue of “observed” power in a completed study:
I am so glad you are fighting this battle! And your point on Twitter, that a paper all about statistics did not get statistical peer review, is really concerning.
I actually emailed the Statistical Editor of the journal directly to ask about this, and he told me that he had not been sent this paper for review and did not know about it until I showed it to him, which confirms that a clinical medicine journal published an article entirely about the practice of statistics without getting any statistical input. It’s the equivalent of Statistics in Medicine publishing a paper about surgical technique without getting any surgeons to review the paper.
EDIT: also, the reason I am hopeful more people will leave PubPeer comments is my fear that the journal will think this is just another nuanced debate which requires “both sides” to have their say, when this is not a “both sides” issue. I do not want the editor to say “Thanks for raising your concerns, we will let you publish a nicely worded 500 word letter summarizing them” and leave this back-and-forth exchange in the literature. The paper is flat wrong, it’s misleading, and it should be removed from the literature. Otherwise, it may lead to scores of subsequent surgery papers showing minimal benefit of a treatment that cite this paper, perform the misguided “observed power” calculation, and say “We cannot rule out the lack of treatment effect; our study had an observed power of just 15 percent. A larger study may have shown benefit” (not realizing this is basically the same as saying “if the treatment had worked better, we would have shown that it worked”).
Thank you. The more people that chime in at PubPeer, the greater chance the editor will be inclined to realize what a flawed paper this is and that the field will be best served if it is removed from the literature. I urge all of you to consider leaving a comment with your interpretation of the article.
I realize the point is targeted toward clinical or general medical journals, but I’d also give a shout-out to Epidemiology, which has a lovely policy of considering the publication of letters that were rejected elsewhere.
Please realize that @EpidemiologyLWW is willing to consider letters that were rejected by editors of other journals, a policy first announced by founding editor @ken_rothman in 1993. Rejected letters should be submitted with evidence of the original journal’s rejection.
That is a great policy. In the case of particularly obstinate journals, well worth considering.
In the case of the post-hoc power article that I have posted about most recently, retraction (rather than an exchange of letters) is what’s desperately needed. Even if our letter(s) pointing out the flaws get published, the original paper will remain in the literature and people unaware of the flaws will continue to read it and think “Gee, what a great idea!”
I totally agree regarding the post-hoc power article you have been (thankfully) bringing attention to. That deserves retraction, and I hope that is the end result.
I should have been more clear that I was suggesting submitting to Epid as a potential alternative for the letter that @zad had rejected by JAMA. Gotta work on my tagging skillz.
No problem at all. I know what you meant, and am thankful you brought it to my attention, as I may utilize it in the future if journals demur on publishing letters that point out major flaws in their publications. I do hope to persuade editors in some cases that “look, publishing a letter would be better for my CV, but a retraction would be better for science.”
Apologies that we are a little off-topic on “Observed Power.” Thus ends this threadjack.
I’d like to offer a challenge that might be helpful in reaching your true audience: surgeons. I cannot believe this issue is so complex as to require such a lengthy and elaborate discussion. I have to think that you could build a Shiny app, and make a 140-second (tweetable) video that demolishes post hoc power for all time. (This might make a nice project for a graduate student, who could even gain some social-media exposure out of it, plus the opportunity to develop and demonstrate statistical teaching/communication/outreach skills.)
Even surgeons and surgeons-in-training will have time for a 140-second video, especially if it criticizes some of their prominent colleagues. As an example of what’s possible in this medium, under its severe constraints, consider this 127-second video produced by yours truly.
In some ways, this has already been done (maybe not the “video” part, but there are blog posts with a single Figure/simulation that explain it). The problem - at least insofar as I can see - is that understanding that single Figure or simulation still requires an understanding of what statistical power is in the first place, and that, absent that understanding, no single Figure or simulation gets it across. So I do not believe I can meet this goal, or at least I’m not easily capable of doing so:
I would be more than delighted if you are able to do this. I honestly don’t think that I can.
The surgeons’ interpretation is clearly wrong, but they’ve also been told that in myriad different ways by at least four different replies to their initial publication, and the response was essentially “Thanks, but we’re still right.” If you’ve got a better way to convey this, by all means, go for it.
I guess that’s throwing down the gauntlet, then, and I’ll have to look into it properly. Would you offer a link to the single most concise - even if ‘incomplete’ - treatment you’ve seen so far?
On my phone right now, but Daniel Lakens’ blog post, Russ Lenth’s article, and the “Abuse of Power” article (all linked in my PubPeer comment, and all listed in the post above) all include a simple Figure and simulation that should make it clear that “observed power” is just the “p-value from a different angle” (and it is not a number that makes much sense once one actually understands what it is).
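For anyone who wants to see the arithmetic rather than chase the links, here is a minimal sketch of that point (my own illustration, not reproduced from any of those articles), assuming a two-sided z-test and computing “observed power” at the observed effect size:

```python
# "Observed power" for a two-sided z-test is a deterministic function of the
# p-value, so it adds no information beyond the p-value itself.
from scipy.stats import norm

def observed_power(p_value, alpha=0.05):
    """Post-hoc power computed at the observed effect size, i.e., assuming the
    true effect equals the estimate (the circular step that makes it useless)."""
    z_obs = norm.ppf(1 - p_value / 2)    # |z| statistic implied by the p-value
    z_crit = norm.ppf(1 - alpha / 2)     # two-sided critical value
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

for p in (0.50, 0.20, 0.10, 0.05, 0.01, 0.005):
    print(f"p = {p:<5}  ->  'observed power' = {observed_power(p):.2f}")
# p = 0.05 maps to ~0.50, p-values above 0.05 map below 0.50, and so on:
# reporting "observed power" alongside p just restates the p-value on another scale.
```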
I think the most concise and clear example is the one that says if you get a p-value of 0.05 and then calculate post-hoc power, you’ll get 50%. Tell that to any clinical person and they’ll recoil.
…or they’ll say “see, that’s why we need to lower the standard for power from 80 percent! Even studies with p=0.05 don’t achieve 80 percent power! From now on let’s use 50 percent power!”