Table 2 Fallacy?: Association of 5α-Reductase Inhibitors With Dementia, Depression, and Suicide

Hi everyone,

I’m new to the forum but have learned a lot from reading many of the discussions so far.

A post here about baseline covariate adjustment led me to read the “Table 2 Fallacy” article, and those concepts quickly came to mind after seeing a headline for a study published in JAMA Open today. The paper reports an association between common BPH medications and psychiatric disease/cognitive decline.

Most of the discourse from other clinicians suggests that people are interpreting the results as causal effects of the medication on eventual cognitive decline.

After reading the methods and the supplementary materials showing model results, I have to say I’m a little confused. They include a large number of baseline covariates in the multivariate model and present tables with unadjusted and adjusted estimates. However, I’m not seeing any description of model evaluation, or of why these other covariates were included.

My questions are:

  1. If you were one of the authors, is this a situation where you would include a DAG in the publication? I am personally working on a project where I would like to include a DAG in any articles that come from it, because it just seems like the proper thing to do; however, I’m a little anxious about it because I don’t see DAGs in my field’s journals (radiology). Is including a DAG for a study such as this something that is increasing in practice? Are there any good recent examples of DAGs being published as standalone figures in more “clinical” research journals?

  2. What descriptions of the modeling strategy do you think are missing from the methods (if any)?

  3. Are the resulting estimates for the primary exposure (BPH medication use) at all convincing for a total effect on the outcome?

I’d greatly appreciate any resources that come to mind that could help me answer these questions myself!


While this may still be true, I do not see how this data set supports the causal hypothesis over the alternative that patients prescribed these drugs simply undergo more screening for these conditions. Nor do I see the evidence as strong enough to eliminate the drug hypothesis from consideration.

The authors treated exposure to each drug as a simple ever/never category:

Using Cox proportional hazards regression models, we conducted 2 sets of analyses. First, to assess the overall association, an unadjusted model was fitted including a categorical variable with 5 levels (unexposed, finasteride, dutasteride, α-blockers, and combination of 5-ARIs and α-blockers). In the adjusted model, the following variables were included: year of start of follow-up, hypertension, obesity, type 2 diabetes, lipid disorders, and time exposed to each drug.
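For concreteness, here is a minimal sketch of this kind of model on simulated data (Python with statsmodels as a stand-in for whatever software the authors used; all variable names, sample sizes, and effect sizes below are hypothetical):

```python
import numpy as np
import pandas as pd
from statsmodels.duration.hazard_regression import PHReg

rng = np.random.default_rng(0)
n = 4000
# Hypothetical 5-level exposure, as in the quoted methods
levels = np.array(["unexposed", "finasteride", "dutasteride",
                   "alpha_blocker", "combination"])
exposure = rng.choice(levels, size=n)
hypertension = rng.binomial(1, 0.3, n)  # one hypothetical baseline covariate

# Simulate event times with a true hazard ratio of exp(0.4) ~ 1.5 for dutasteride
log_hr = 0.4 * (exposure == "dutasteride") + 0.3 * hypertension
event_time = rng.exponential(scale=10.0 * np.exp(-log_hr))
censor_time = rng.exponential(scale=15.0, size=n)

df = pd.DataFrame({"duration": np.minimum(event_time, censor_time),
                   "event": (event_time <= censor_time).astype(int),
                   "hypertension": hypertension})
# Dummy-code the exposure with "unexposed" as the reference level
dummies = pd.get_dummies(exposure, prefix="rx").drop(columns="rx_unexposed")
df = df.join(dummies.astype(float))

covariates = list(dummies.columns) + ["hypertension"]
res = PHReg(df["duration"], df[covariates], status=df["event"]).fit()
hr = pd.Series(np.exp(res.params), index=covariates)
print(hr.round(2))  # hazard ratios vs. "unexposed"; dutasteride near its true 1.5
```

Note that, as in the paper, exposure enters only as a set of ever/never indicators: nothing in this specification captures dose or duration unless such terms are added explicitly.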

Shouldn’t any causal hypothesis predict a dose-response relationship, or at least show that exposure is not confounded by increased screening?

There are some other issues with the reporting in the abstract vs. the data tables that suggest some questionable interpretation of the stats.

Related Papers

Rosenbaum, P. R. (2015). How to see more in observational studies: Some new quasi-experimental devices. Annual Review of Statistics and Its Application, 2, 21-48.

Greenland, S. (2005). Multiple‐bias modelling for analysis of observational data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 168(2), 267-306.


As a family physician with a large number of elderly male patients with cognitive impairment and/or BPH, I second your concern about the absence of a DAG here. I suspect that if a DAG had been prepared with extensive input from physicians, the authors might have had second thoughts about proceeding with the study.

As is true for most database studies published in clinical journals, clinicians are left asking: “What am I supposed to do with these results?”

Let’s imagine that the association of BPH medications with dementia had not attenuated with longer follow-up. What would the authors suggest that I should be telling my patients? The options are:

  1. Nothing: don’t even mention the study. The results are insufficiently reliable to influence patient care and will leave patients who need treatment for their BPH in a state of perpetual anxiety if they decide to accept medication;
  2. Tell the patient: “To be completely transparent, I must notify you that observational database studies suggest that the medications available to treat the urinary hesitancy and frequent nocturia that are making your life miserable might increase your risk for dementia. You should “keep this in mind” when deciding whether you want to treat your symptoms…”
  3. Tell the patient: “To be completely transparent, I must notify you that observational database studies suggest that the medications available to treat the urinary hesitancy and frequent nocturia that are making your life miserable might increase your risk for dementia. But these types of studies are insufficiently reliable, so I don’t think that you should pay too much attention to them;”
  4. “Database studies suggest that these medications pose a risk for dementia, so we’re not even going to bother trying to use medications to treat your symptoms. I’ve decided to start referring all of my patients with BPH to a urologist for consideration for TURP.” (sarcasm intended)

I’d love to see a poll of primary care physicians and urologists, asking them which of the above options they would vote for. I suspect #1 would win, with #3 a distant second. Which raises the question: “Why do researchers keep running these types of studies?”…

As is true for so many of these ostensibly “causal” (yet DAG-free) observational studies, the authors seem to be adding yet another study to a long line of conflicting prior observational studies on the same topic. In the discussion, they state: “Last, a higher risk for dementia was also observed among individuals undergoing androgen deprivation therapy. However, research on DHT and dementia (even more on 5-ARIs) is scarce, with contradictory findings, and thus more research is clearly warranted.” My question is WHY should we continue to invest healthcare dollars in observational studies on these questions, if there’s no evidence, to date, that the results of any previous database studies on the issue had ANY impact on clinical practice?

I think clinicians and governments should start sending bills to JAMA every time it publishes one of its database studies suggesting a “link” between some widely-prescribed medication and a universally-feared health outcome. I shudder to think how much money these types of studies cost the healthcare system, as patients flood doctors’ offices looking for reassurance. The alternative outcome is worse, though: patients whose quality of life has been unnecessarily ruined because they either don’t accept treatment for a highly symptomatic condition or else accept treatment and then live in perpetual fear of a dreaded adverse outcome…


Thank you for the insight and references. I agree that I do not see anything in this article to convince me either way. Indeed, a causal relationship would be expected to be dose-dependent. Adherence is also not measured. There is just a lot of important missing information here. Overall it seems like a wasted effort that adds to the confusion among clinicians and potentially harms patients.

DAGs are becoming progressively more accepted in medical research, as evidenced by this month’s excellent BMJ methodological overview recommending them as a way to explicitly represent the underlying causal networks. JAMA also recently published a short overview of DAGs. We also provide here practical examples in urologic oncology (including the Table 2 fallacy).

There are of course still peer reviewers, even as of last month, asking why we used DAGs as opposed to propensity score matching. But things are rapidly improving and many in our group of oncologists and statisticians have become adopters of such tools as shown here, here, and here with more to come.


Thank you for providing these! Great to know!


No prob. To clarify, because this was asked offline: DAGs are helpful regardless of whether one chooses to use propensity score matching (PSM) for the problem at hand. What the peer reviewers are likely trying to understand is why we used, for example, standard multivariable outcome regression instead of PSM. Good discussion on pros and cons in this datamethods thread.

Another recent peer reviewer comment was that because DAGs are not a routinely used approach we should explain their differences from more standard methods such as decision trees. I literally had to google what decision trees are supposed to be in this context…

This post addresses only the data in this paper concerning dementia, Alzheimer’s Disease (AD) and vascular dementia—the dementia outcomes.

To my mind, the most important threat to the validity of a conclusion that these medications CAUSE an increase in the likelihood of developing dementia, vascular dementia, and/or Alzheimer’s Disease (AD) is bias that arises because of more complete, or earlier, detection of cognitive impairment/dementia in men who receive a prescription for one of the medications because they seek care and are diagnosed with BPH and/or androgenic alopecia. The authors call this “surveillance” bias but it could equally be called detection bias. The authors mention the possibility that surveillance bias may have affected their results but seem to dismiss it as an explanation for the observed associations.

I believe dismissing surveillance bias as an explanation for the findings about dementia is a big mistake. Neither multivariate analysis nor propensity score matching using any set of covariates will eliminate this bias if it exists.

A DAG would not help if this critical factor affecting a causal interpretation of the observed association is ignored or dismissed.

Putting aside the almost intractable problem of surveillance bias, the variables included in the multivariate analysis should be selected because they are confounders or potential confounders–that is, they affect both the outcome and the chances that the medication was prescribed. Their selection should be based on a deep understanding of the literature about risk factors for the dementia outcomes. In my opinion, the authors should have explained in detail why the variables selected for adjustment and for the propensity score were chosen. Variables that are confounders or potential confounders that were considered for the adjustment but not used should have been identified. The authors should have included citations to relevant prior epidemiologic research to justify their choice of variables.

The variables used appear to be, with the exception of “eating disorder,” well-established risk factors for vascular disease or markers of vascular disease: beta-blockers (as a marker of hypertension or CAD), type 2 diabetes, obesity, hypertension, lipid disorder. Other than genetics and age, the established risk factors for AD are cerebrovascular disease, Type 2 diabetes, hypertension, obesity and dyslipidemia (Mayeux and Stern 2012).

It is difficult to distinguish vascular dementia from AD reliably using computer-stored administrative data. Also, as seen in this study, many people with a diagnosis of dementia do not have a specific diagnosis of either vascular dementia or AD. Thus, among the unexposed men in the cited study, 53,0275 had “dementia,” 15,085 had AD, and 10,504 had vascular dementia. Given the overlap in risk factors between vascular disease, cerebrovascular disease, and AD, the risk factors used in the multivariate analysis (and the propensity score matched analysis), EXCEPT eating disorders, seem reasonable. The inclusion of eating disorders in the adjustment for the dementia outcomes is, in my opinion, bewildering (and probably unnecessary given its low frequency).

A DAG would perhaps have clarified the authors’ thinking about the choice of variables included in the multivariate analysis and in the propensity score matching exercise. But only if the adjustment / propensity match was more than just a “kitchen sink” adjustment for all the variables in the dataset that are risk factors for vascular disease and/or AD. In my opinion, even a “kitchen sink” adjustment (put in everything that increases the risk of vascular disease and/or AD not worrying about pathways) should have included a diagnosis of cerebrovascular disease/stroke and coronary artery disease since both are strong (causal) risk factors for vascular dementia, AD, and dementia that is not specified as vascular dementia or AD.

Note also that exercise (more) and diet may be risk factors for AD (Mayeux and Stern 2012) and perhaps also for vascular dementia and dementia not specified as vascular or AD. Information on these variables is almost never available in datasets like the one used in this analysis and is potentially a source of residual confounding. Unmeasured variables as a potential source of uncontrolled confounding should have been given more emphasis in this publication.

The OP asks:

“Are the resulting estimates for the primary exposure (BPH medication use) at all convincing?”

Given the unexcluded possibility of surveillance bias, the small overall magnitude of the effect size estimates, the existence of known potential confounders not included in the adjustment, and the possibility of unmeasured confounders, my vote concerning the dementia outcomes is a resounding “NO.”

Literature Cited

Mayeux R, Stern Y. Epidemiology of Alzheimer Disease. Cold Spring Harb Perspect Med. 2012;2:a006239.


I think that the authors of the study linked in the original post agreed with you. They did note that surveillance/detection bias seemed to be a likely explanation for their findings related to cognitive impairment (since the association seemed to attenuate over time). They concluded that there could be a signal for depression but that they could not corroborate the signal for cognitive impairment that had been suggested by some previous studies.

The broader question is why researchers think that database studies will be able to generate a reliable list of drug-related “risk factors” for cognitive impairment in the first place (?) Clinically, defining the timing of onset of dementia (and often depression) with any degree of assurance is usually a lost cause- these diagnoses are very often insidious. Elderly patients who attend their physician frequently might be more likely to have their cognitive impairment detected sooner than those who attend infrequently. And those who attend more frequently are also more likely, in general, to wind up with various prescriptions just before their cognitive impairment is detected/diagnosed. But even those who do attend regularly often “fly under the radar,” with their cognitive impairment undiagnosed until it is fairly advanced. This is especially true for patients whose office appointments tend to be short or who live alone or don’t interact with friends/family members who might notice a problem and mention it to the physician. And added to the clinical difficulty (?impossibility) of identifying “date of onset” of cognitive impairment is the highly questionable utility of diagnostic codes to accurately reflect the date that the diagnosis was first suspected.

So, my question is, why do researchers keep running these types of studies? The only answer that I can think of is that they don’t routinely consult with very many (or any) clinicians before they decide to forge ahead…


There are many things that may not be well encoded by topological structures like DAGs. Good examples here. However, this “surveillance bias” or “detection bias” appears to be a form of “measurement bias,” also known as “information bias.” This excellent overview uses DAGs to unpack the analytical and interpretational implications of these types of biases. In particular, take the DAG shown in Figure 2D. Variable A is the exposure, which in this case is whether the medications were actually used. A* is the recorded drug use in the dataset; this variable denotes that recorded medication use will not necessarily correspond to actual medication use. U_A is the measurement error causing the difference between A and A*. Variable Y is the outcome, in this case actually developing dementia, vascular dementia, and/or Alzheimer’s Disease (AD). Y* is the recorded outcome in the dataset, and U_Y is the measurement error causing the difference between Y and Y*. What we are truly interested in is the effect of A on Y. However, we need to account for this “detection bias” representing the influence of A on U_Y and subsequently on Y*.

The above structure would imply that this particular measurement bias is independent but differential. As discussed in the linked overview, this implies that f(U_Y, U_A) = f(U_Y) f(U_A), where f(·) is the probability density function, and that the magnitude or direction of the error differs between individuals who have the outcome and those who do not. If we have reason to believe that the causal structure of the measurement bias is different, then DAGs could again help represent our assumptions and accordingly devise our analysis plan and interpretations.
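To see how this structure can manufacture an association out of nothing, here is a small simulation of just the outcome-measurement arm (A → U_Y → Y*); all probabilities are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a = rng.binomial(1, 0.3, n)   # exposure A (e.g., received a BPH prescription)
y = rng.binomial(1, 0.05, n)  # true outcome Y, independent of A: true OR = 1

# Differential detection: the outcome is recorded more completely among the
# exposed, who see physicians more often (hypothetical detection probabilities)
p_detect = np.where(a == 1, 0.9, 0.5)
y_star = y * rng.binomial(1, p_detect)  # recorded outcome Y*

def odds_ratio(exposure, outcome):
    """Crude 2x2-table odds ratio."""
    a11 = np.sum((exposure == 1) & (outcome == 1))
    a10 = np.sum((exposure == 1) & (outcome == 0))
    a01 = np.sum((exposure == 0) & (outcome == 1))
    a00 = np.sum((exposure == 0) & (outcome == 0))
    return (a11 * a00) / (a10 * a01)

or_true = odds_ratio(a, y)       # near 1.0: no true effect
or_star = odds_ratio(a, y_star)  # inflated by detection alone
print(round(or_true, 2), round(or_star, 2))
```

With no true effect, differential detection alone pushes the observed odds ratio for A on Y* toward roughly 1.8 here. No amount of covariate adjustment on the recorded data removes this, because the bias sits between Y and Y*, not behind A.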


An important point. If any variables were excluded due to looking at the data, that is a significant problem which can result in under-adjustment.

I tend to not trust an observational treatment comparison unless the authors took the major prior step of surveying lots of experts to determine their opinions about potential confounders, and made sure all of those are in the dataset.


“So, my question is, why do researchers keep running these types of studies? The only answer that I can think of is that they don’t routinely consult with very many (or any) clinicians before they decide to forge ahead”

Why? Perhaps because the authors have access to observational data that CAN be used to address the question of medications to treat BPH and the risk of dementia.

The JAMA Open analysis was done in the context of at least three other published observational studies that used data gleaned from submissions for payment, filled prescriptions, and/or “registries” that assess the use of medications to treat BPH and the risk of dementia (citations below). These studies had mixed results.

If the JAMA Open analysis could have eliminated the possibility of surveillance bias and if the analysis had accounted for all known potential confounders AND if the results had been unequivocally null (no increase or decrease in the risk of dementia), the study might have helped lay to rest the hypothesis that medications to treat BPH cause dementia.

But the analysis did not eliminate surveillance bias and important known potential confounders were not considered. The results for dementia were not unequivocally null. The authors are left to postulate, but cannot prove, that surveillance bias or residual confounding might explain their not-null findings.

In addition to consulting with clinicians, there might have been a consultation with a broad group of experts in dementia epidemiology, as Frank Harrell has suggested.

There are many other databases like the one used in the JAMA Open publication and those used in the studies cited below that gather information on medications and diagnosis codes. I anticipate more observational studies about medications used to treat BPH and the risk of dementia based on these databases. One might hope that this discussion prompts consultations that improve the selection of confounders and reflect a clinical perspective when interpreting results. Maybe this discussion will discourage analysis of data in which surveillance bias cannot be eliminated and discourage use of databases that do not have high quality information on known potential confounders. Maybe we’ll get a DAG.

Other Published Observational Database Studies that Assess Medications for BPH and the Risk of Dementia (NOT a Systematic Review)

Duan Y, Grady JJ, Albertsen PC, Helen Wu Z. Tamsulosin and the risk of dementia in older men with benign prostatic hyperplasia. Pharmacoepidemiol Drug Saf. 2018;27:340-348. doi: 10.1002/pds.4361. Epub 2018 Jan 9. PMID: 29316005.

Tae BS, Jeon BJ, Choi H, et al. α-Blocker and risk of dementia in patients with benign prostatic hyperplasia: A nationwide population based study using the National Health Insurance Service Database. J Urol. 2019;202:362-368. doi: 10.1097/JU.0000000000000209. Epub 2019 Jul 8. PMID: 30840545.

Latvala L, Tiihonen M, Murtola TJ, et al. Use of α1-adrenoceptor antagonists tamsulosin and alfuzosin and the risk of Alzheimer’s disease. Pharmacoepidemiol Drug Saf. 2022;31:1110-1120. doi: 10.1002/pds.5503. Epub 2022 Jul 6. PMID: 35751619; PMCID: PMC9542191.


Thanks. I agree with everything you’ve said. For the reasons below, I hope that drug safety research undergoes a major reformation, and soon.

For causal observational studies focused on drug safety, researchers need to consider the impact of their work on two audiences: prescribers and patients. In turn, the impact of the research will be a function of both the potential benefits of the drug in question, and also the reliability of the identified safety signal. The greater the benefits of the drug for the patient and the less reliable the safety signal, the lower the impact of the research will be.

Observational drug safety studies can sometimes change clinical practice- but only in certain contexts. Good stewards of research funding would ask prescribers, before designing a study, what the impact might be, if any, on clinical practice, given a spectrum of potential study outcomes.

Prescribers are much more likely to pay attention to observational studies suggesting a possible safety signal when:

  1. There is a combination of marginal drug efficacy + a strong safety signal + a serious AE; OR
  2. The signal is occurring in the context of “off-label” drug use. In this setting, solid evidence of drug efficacy might be missing and prescribers’ threshold for being swayed by even a weak potential safety signal is much lower; OR
  3. The signal suggests that one member of a drug class, or one therapeutic option among many efficacious options for a given condition, might pose a higher risk than other options.

Doctors are much LESS likely (and often UNlikely) to pay attention to observational studies finding weak associations with safety-related outcomes when the drug has well-demonstrated efficacy AND is being prescribed according to its approved indication AND when one or more of the following is true:

  1. The condition being treated has few other therapeutic options; and/or
  2. The condition being treated is highly symptomatic; and/or
  3. The drug can prevent important disease.

As you’ve noted, “duelling” observational studies are common in medicine. Safety studies that find more pronounced adverse treatment effects are more likely to be published (at least at first) than those that don’t (“winner’s curse”). Authors of these early studies run to the media with their “discoveries” and patients get scared. Over the next few years, other studies are published that usually report much smaller effects, if any (with these authors running to the media to allay fears created by the earlier authors).

By now, the above sequence of events is maddeningly predictable for clinicians. As a result, we have become largely inured to these types of studies. Unfortunately, we continue to spend an inordinate amount of time “talking patients down” after they read sensationalized reports in the media. Maybe, if “gold standard” methods for causal observational studies were followed more widely (see the Causal Inference thread), this aggravating cycle would be broken and drug safety research might regain some of its lost credibility…

In short, since observational studies are often considered suboptimally reliable, prescribers are much more likely to allow their results to affect decision-making in clinical scenarios where potential drug benefits are also less clear. Unfortunately, most patients aren’t as well-equipped as prescribers to weigh the risks and benefits of treatments and to gauge the reliability of published research (though many patients are capable in this regard).

Drug safety researchers who scour observational databases to identify “associations” between commonly-used drugs (e.g., PPIs, SSRIs, statins…) and common, serious diagnoses (e.g., cancer, dementia,…) are sometimes also highly invested in deprescribing efforts. They seem to feel that many physicians are too quick with the prescription pad and might be underestimating potential treatment harms. I’m sympathetic to this view, but only to a point. While some physicians might indeed be reckless prescribers, most are probably keenly aware of the potential harms of medications (and especially polypharmacy). To this end, most will try to ensure a solid rationale for the prescriptions they write. They will prescribe only when a patient is either suffering or is at risk for an adverse outcome without treatment. This is why the seemingly endless media parade of methodologically weak drug safety studies is so frustrating for physicians. Publishing studies that are likely to have zero impact on prescribers, but a potentially huge psychological impact on patients, is cruel.

The “causal pie” concept of disease development likely underlies much risk factor epidemiology. But if the 49 pieces of the proposed causal pie for some dreaded but poorly-understood disease include off-target effects of 15 medications that can otherwise potentially improve patients’ quality of life or life expectancy, then maybe it’s not such a good idea to publicize every potentially new slice of the pie (especially when history suggests that most pie pieces shrink rapidly over time with additional study)…

In summary, prescribers understand that the RCTs used to show drug efficacy for regulatory approval will not always identify less common adverse effects or those that only manifest after many years of exposure. However, we are frustrated by highly-publicized observational studies that focus on weak (and often inconsistent) safety signals that involve efficacious treatments that might otherwise substantially improve patients’ quality of life and/or prognosis. We become even more frustrated when those studies have not applied “gold standard” practices for making causal inferences from observational data. Unfortunately, these practices require considerable effort and rarely seem to be followed. Journals that publish research that doesn’t follow these guidelines and which doesn’t put possible drug-related risks into proper perspective (by considering potentially important benefits of those same drugs) are incentivizing suboptimal research practices and harming patients.


Yes, thank you, but not only consider the patient audience: also write a summary for them. It need not be long. It needs to be done well, in plain language, routinely.


I am currently trying to understand the “Table 2 fallacy” and I came across this paper today:

I am not sure whether the authors’ attempt to interpret the association between liver fibrosis (FIB-4 score) and the risk of symptomatic intracranial haemorrhage (SICH) alongside the common risk factors could have protected them from committing the fallacy, or whether it is just another example of it.

It looks like this paper can’t decide whether it wants to predict, explain, or associate factors with outcomes. I found this recent blog post very useful:

These Are Not the Effects You Are Looking For


An excellent paper, also serving as a good example of Bayes and of reproducible reporting with nice formatting.


I found the last paragraph particularly compelling. In my field (veterinary medicine), so-called risk factor studies are very popular. They are not designed as predictive models, but they also don’t consider DAGs or otherwise make their causal model explicit. So, in my opinion, they simply end up with models containing variables that were determined by an algorithm. Yet most clinicians interpret these risk factor studies as causal. The cause may be what the author wrote in the final paragraph:

Of course, encouraging researchers to improve their own practices is only half the battle because bad habits and logical fallacies are learned behaviors owing to the reality that graduate-level statistics in the social sciences is often taught entirely independent of any meaningful causal foundation. Students are instructed to interpret everything with the justification being that “they need experience in interpreting coefficients/marginal effects.” Yet, this has the unintended consequence of instilling in them that such an approach should be taken in their own work and they then go on to teach their future students those same poor practices. Before long, this results in entire fields in which presumed knowledge rests upon castles of sand and hinders scientific progress.


Even though interpretation of individual coefficients in Table 2 can be problematic, I feel that it is still useful to show how the explained variation in outcome Y partitions according to the baseline covariates. This can be done using relative \chi^2, R^2_\text{adj}, or relative R^2, for example. The relative measures provide estimates of the proportion of explainable variation in Y that was explained by each covariate, holding all the other covariates constant. Just don’t label this as causal.


[Bold added.] We may need to use words like explainable (part of the professional argot, I know) with more careful qualification if we hope to avoid undesirable leaps to causal labeling. In ordinary usage, I think most people — scientists and laypersons alike — expect causal content in a proper ‘explanation’.