How to interpret “confidence intervals” in observational studies

“In short, clinical journals serve not only to disseminate perfect knowledge but to guide practice in the face of uncertainty. Excluding well-designed observational work until an undefined threshold of “definitiveness” is achieved would create an evidence vacuum that clinicians would be forced to fill with anecdote, opinion, or commercial interest. The alternative—publishing and critically appraising observational studies as part of the evolving evidence base—is both more realistic and more aligned with the needs of patients and practitioners.”

I disagree with this view, except in specific clinical contexts. Post #13 of the thread Table 2 Fallacy?: Association of 5α-Reductase Inhibitors With Dementia, Depression, and Suicide describes situations in which I think it can be reasonable for physicians to allow less-than-definitive observational evidence of treatment harm to influence their clinical decision-making. A high-profile example of a situation where clinicians should act on observational evidence of treatment efficacy is masking to reduce the spread of airborne disease (though efficacy here was established primarily by aerosol scientists and also involved experimentation).

Putting aside the niche contexts described above, the critical question is whether, in most cases, patients will be better off if their physician allows observational evidence to guide clinical decision-making. In the case of studies showing weak harm signals, which overstate the certainty of their findings, I feel strongly that the answer is “no,” provided that the clinician is making a concerted effort to apply treatments with well-established efficacy.

I could rattle off an arm's-length list of potential “harms” of medications I prescribe every day, as identified through observational studies (often by researchers who have fashioned themselves cushy careers by dredging the same administrative database over and over). But if I did that in front of my patients, none of them would want to take statins, PPIs, antidepressants, vaccines, BPH medications, or any type of painkiller. And if this were the case, then I expect that the morbidity and mortality among my patients from MI, CHF, ulcers, GI bleeds, infectious disease, urinary retention, acute kidney injury, self-harm, unemployment, and family dysfunction would be much higher.

Clinical journals that hype studies showing weak harm signals are effectively suggesting to clinicians that they should routinely allow highly uncertain evidence to guide their clinical decision-making. But this stance implicitly assumes that physicians don’t have a good rationale for the prescriptions they write, an assumption that’s totally off-base in most cases. Are some medications prescribed inappropriately? Yes! Should all physicians constantly strive to improve their prescribing practices? Yes! But physicians who routinely prescribe inappropriately are NOT likely to be the ones who follow the medical literature (so attempts to “scare” them into doing better by publishing lists of potentially catastrophic consequences of their prescribing are likely to fall on deaf ears anyway…).

I use UpToDate every single day in the office, during the care of my patients. It’s a great resource which summarizes the evidence base for most common medical treatment decisions. I can honestly say that the observational studies described in UpToDate almost never meaningfully influence my practice. I do acknowledge, however, that they might occasionally dissuade me from certain types of “off-label” prescribing. Maybe this approach makes me a bad doctor? I don’t know…

I’m not “anti” observational evidence, far from it. I think observational evidence is indispensable for disease surveillance, for describing populations, and for finding important, unanticipated strong signals of adverse long-term effects of certain exposures (e.g., vaginal cancer and in utero exposure to diethylstilbestrol). Many people can die if we don’t have good descriptive epidemiologic evidence. I don’t know anything about developing prediction models, so I can’t comment on this application. But I feel strongly that observational studies with obviously causal aims (though this goal is often not stated explicitly) which involve small effects (in relative terms) simply have too much inherent uncertainty, in most (but not all) contexts, to safely influence clinical decision-making.

Pavlos’ terrific triangulation of the evidence for a link between vigorous exercise and development of renal medullary cancer among patients with sickle cell trait is an example of the type of observational evidence that I think clinicians should perhaps act on, because, in this specific clinical context, there’s arguably little to no downside to doing so (other than to limit, somewhat, the spectrum of exercise options we recommend for people with sickle cell trait). So too is the painstakingly-triangulated evidence used to show the relationship between EBV infection and subsequent risk of developing MS. These are fantastic examples of the value of observational evidence. But it’s the amount of work that went into establishing these relationships that renders them orders of magnitude more compelling, from a clinical standpoint, than the vast majority of observational studies we see being hyped today in clinical journals.
