Hi Huw
Thanks for your input and for trying to translate these very tricky ideas into layman’s terms.
After all this back-and-forth, I’m starting to think that the only way to make sense of observational study “confidence intervals” is by focusing on what Chris seems to be saying in post #10. Here’s the view I’ve arrived at (maybe wrong, but it’s the best I can do):
- These intervals, meant to convey the degree of “random error” inherent in a study’s result, are “false” in the sense that their specific boundaries and width hinge on layers of assumptions that are usually not justifiable, given that there is neither true random sampling from an underlying population NOR any random allocation contributing to their generation;
- If we are being extremely charitable, we might find some value in these intervals if we consider them to be “best-case scenarios” with regard to the degree of uncertainty; in other words, if we view them as a crude representation of the “minimum” degree of uncertainty that might apply if we had actually been able to perform random sampling from a target population in order to generate them (I’ve put a small simulation sketch after this list to illustrate what I mean);
- The main reason these intervals are so immensely problematic and pernicious is that, for many decades, researchers, journalists, and the public have NOT been viewing them as described in the second bullet above. And now it’s too late; nobody seems able to put the horse back in the barn;
- So how did we get here? At some point in history, these intervals, and whether or not they contained the “null,” came to be used as a filter to decide whether or not a study deserved publication. Studies with intervals that crossed the null (results that did NOT achieve “statistical significance,” i.e., p>0.05) were less likely to get published than those with intervals that excluded the null (i.e., achieved p<0.05). In turn, this filtering practice had two disastrous effects: 1) it caused researchers to bend over backwards, often in highly damaging ways, to generate intervals that don’t cross the null (e.g., multiple testing, the garden of forking paths, HARKing); and 2) it caused researchers to propagate (loudly, via the media) the idea that intervals that don’t cross the null represent important scientific “discoveries.”
- I’m not sure to what extent this horrible “inversion” in the interpretation of uncertainty intervals was rooted in widespread ignorance about their inferential limitations (there are many layers of assumptions involved in their generation, which many seem all too happy to simply ignore), simple laziness (the desire to substitute “easy” work, like dredging administrative databases, for “hard” work involving painstaking triangulation of multiple lines of evidence), or perverse incentives (for publication and therefore career advancement). Arguably, the etiology of our current-day mess involves some combination of all three. The result of this 180-degree distortion was the gradual but ultimately pervasive loss of researchers’ and research consumers’ understanding of the limitations inherent in the interpretation of uncertainty intervals;
- To summarize, there is a widespread 180-degree-distorted interpretation, among researchers and research consumers, of the uncertainty intervals presented in observational studies. The distortion began many decades ago and has been constantly reinforced, by various forces, to the point where possibly only a small minority of researchers active today appreciate these intervals’ limitations. Instead of interpreting them in a very crude and maximally conservative way (their just-barely-defensible interpretation), most researchers and readers now interpret them in a falsely precise and maximally liberal way (a completely indefensible interpretation). Specifically, instead of viewing them as a crude representation of the bare minimum uncertainty in a study’s result (a “conservative” reading that treats the interval as an underestimate of the degree of uncertainty, the best we could hope for even IF we were to suspend our judgment about multiple unrealistic underlying assumptions), EVERYONE started viewing these intervals as an indicator of the maximum uncertainty around a result (a “liberal” reading). As a result, we have elevated the role of the single study in scientific decision-making far beyond its actual value with regard to scientific inference.
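
To make the “bare minimum uncertainty” point in the second bullet a bit more concrete, here is a small simulation sketch (my own illustration, not anything from this thread; the selection mechanism and numbers are made up purely for demonstration). It computes a textbook 95% confidence interval from a non-randomly selected sample: the interval keeps its nominal width, but that width reflects only sampling noise, so it badly understates the actual uncertainty once selection bias is in play.

```python
# Sketch: a nominal 95% CI computed from a non-random (selection-biased)
# sample. Its width expresses only random error under assumptions that do
# not hold here, so coverage of the true value collapses.
import numpy as np

rng = np.random.default_rng(0)

true_mean = 0.0         # the target-population mean we want to estimate
n_sims, n = 1000, 200   # number of simulated studies, sample size per study
covered = 0

for _ in range(n_sims):
    population = rng.normal(true_mean, 1.0, size=20_000)
    # Hypothetical non-random "convenience" selection: units with larger
    # values are more likely to end up in the study (a crude stand-in for
    # selection bias).
    weights = np.exp(0.8 * population)
    sample = rng.choice(population, size=n, replace=False,
                        p=weights / weights.sum())
    m, se = sample.mean(), sample.std(ddof=1) / np.sqrt(n)
    lo, hi = m - 1.96 * se, m + 1.96 * se   # textbook 95% CI
    covered += (lo <= true_mean <= hi)

print(f"nominal coverage: 95%, actual coverage: {covered / n_sims:.1%}")
# Typical result: actual coverage far below 95%, even though every interval
# has the "correct" nominal width.
```

The point isn’t the specific numbers; it’s that nothing in the interval’s construction “knows” about the non-random selection, which is why its width can only be read as a floor on the uncertainty, never a ceiling.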