Thanks, Erin. Without specific examples of “crying wolf” I can’t say much about it, in particular how prevalent it is compared with its opposite, suppression of concerning evidence; nor can I judge the cases for myself. Regardless, while pre-specification can provide a logical foundation for statistical claims and some assurance against bad practices like P-hacking, it provides little practical protection against investigator bias.
One reason is that surveys have found that studies claiming pre-specification rarely have their actual analyses checked against the pre-specified protocol. When such checks are made, serious discrepancies seem common; usually the authors deny that the deviations are of any consequence, but others may disagree (especially if they dislike the reported results). A recent, highly publicized example is discussed at https://statmodeling.stat.columbia.edu/2024/09/26/whats-the-story-behind-that-paper-by-the-center-for-open-science-team-that-just-got-retracted/
Another reason pre-specification is not particularly effective against investigator bias is that skilled experts can design a pre-specified protocol that minimizes the chance of undesired results and maximizes the chance of desired results. In fact, to some extent such design optimization toward desirable results is expected for both ethical and practical reasons. For example, RCTs will exclude patients thought to be at high risk of adverse events or unlikely to benefit from the treatment. These exclusions can yield valid estimates of event rates for the selected RCT population, but they produce underestimates of adverse events and overestimates of benefits in general patient populations.
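To put numbers on the exclusion point, here is a toy sketch (my own illustration; all rates and proportions are invented, not taken from any study): a trial that enrolls only low-risk patients estimates that group’s adverse-event rate correctly, yet understates the rate in the mixed population that will actually receive the treatment.

```python
# Toy illustration (invented numbers) of how risk-based exclusion shifts
# the adverse-event rate: the trial estimate is valid for the enrolled
# group but understates the rate in the general patient population.
p_high, rate_high = 0.2, 0.10   # share of patients at high risk, and their event rate
p_low,  rate_low  = 0.8, 0.01   # share at low risk, and their event rate

trial_rate = rate_low                                    # RCT enrolls only low-risk patients
population_rate = p_high * rate_high + p_low * rate_low  # mix seen in general practice

print(f"adverse-event rate estimated in the trial : {trial_rate:.1%}")       # 1.0%
print(f"adverse-event rate in the full population : {population_rate:.1%}")  # 2.8%
```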
Less beneficently, certain groups specialize in generating negative studies of harms from environmental and occupational exposures by using pre-specified protocols that appear well powered but are not. One way to do this is to calculate power without any accounting for measurement error (as is the norm) while using noisy measures of exposure; the resulting study has low actual power to detect any reasonable effect size. Then, of course, the resulting p > 0.05 is reported as “no association”.
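To see how large the gap can be, here is a minimal simulation sketch (again my own illustration; the sample size, true slope, and exposure reliability are arbitrary assumptions): a power calculation that treats the noisy exposure measure as if it were the true exposure reports high power, while the power actually attained with the noisy measure is far lower, because classical measurement error shrinks the observable slope by the reliability factor var(X)/var(W).

```python
# Simulation sketch: nominal power (error-free exposure) vs. actual power
# (noisy exposure measure). All parameter values are illustrative.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
n, beta, reps = 200, 0.25, 2000   # sample size, true slope, simulation replicates

def sim_power(reliability):
    """Share of replicates with p < 0.05 when regressing Y on the measured
    exposure W = X + U; classical error U attenuates the observable slope
    by the reliability factor var(X)/var(W), cutting power."""
    u_sd = np.sqrt(1 / reliability - 1)        # error SD yielding the target reliability
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)             # true exposure (SD 1)
        y = beta * x + rng.standard_normal(n)  # outcome
        w = x + u_sd * rng.standard_normal(n)  # noisy exposure measure
        hits += linregress(w, y).pvalue < 0.05
    return hits / reps

print(f"nominal power, error-free exposure : {sim_power(1.0):.2f}")
print(f"actual power, reliability = 0.25   : {sim_power(0.25):.2f}")
```

With these illustrative settings the protocol looks well powered on paper (around 90%), yet the study as actually conducted detects the effect well under half the time, so a “negative” result is close to a foregone conclusion.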
That said, I am in complete accord with your closing paragraph; after all, I have been going on about these “human factors” for many years, e.g., Greenland S (2012). Transparency and disclosure, neutrality and balance: shared values or just shared words? Journal of Epidemiology and Community Health 66:967–970, https://jech.bmj.com/content/66/11/967. But questions remain about concrete ways to deal with the problems. I am all for pre-specification, but it is worth little if it is not pre-registered and then submitted and published as a key supplement to the report of results; even then, it can address only a narrow set of biases.
Conversely, absence of pre-specification should not be taken as evidence that a bias occurred (as it often is to attack undesired results). Its absence indicates only that some avenue for bias was left open, which is a source of uncertainty about the results for those who do not trust that the authors followed good practices.