DAGs for Bayesian hierarchical models?

Mzobeck · January 2, 2021, 4:10am

I primarily work on epidemiological observational data and am trying to integrate DAGs into my workflow. My main exposures to them have been Pearl’s Causal Inference in Statistics: A Primer, McElreath’s Statistical Rethinking, and a handful of methods papers. One of the areas that I cannot find a good source for is how to draw DAGs for Bayesian hierarchical models. My search of the literature yields some examples, but no clear methods reference I can lean on. Does anyone have a good reference by chance?

My main question is how to represent random intercepts in the DAG. Should they be nodes like other variables? Random slopes make sense to me, in that they are just nodes with a specific method of statistical estimation. I’ve seen some example hierarchical DAGs that have extra structures to denote the hierarchy and nodes that represent parameters instead of variables, but it’s not clear to me whether these can be interpreted causally à la Pearl’s methods.

Pavlos_Msaouel · January 3, 2021, 3:15pm

This is a great question and it seems that much work remains to be done on this topic. One good place to start is this recent paper formalizing fixed effects and random effects into DAGs motivated by a simple linear repeated measures scenario (pre and post test). @Sander may have more to say on this issue.

Sander · January 3, 2021, 5:24pm

I never saw any problem with the fixed/random effects distinction in causal DAGs. I may have missed something as I glanced only briefly at the Kim-Steiner article, but it seemed consistent with how I think of the distinction when one limits focus to linear structural relation (LISREL) models. Nonetheless, coming as I do from an area where such models are usually too implausible to be useful (violating logical range restrictions), I prefer more general formulations.

For example, suppose I have a generalized-additive structural equation for individual responses $y_i$ given covariates $x$ and intercept level $k$,
$$g(y_i) = a_k + f(x_i) + u_i,$$
and that by “random intercept” (RI) I mean that the intercepts $a_k$ are iid draws from a distribution $F(a)$, one for each level $k$ of a variable with no further internal or causal structure (in epidemiology $k$ often indexes matched sets). Then I can represent this variable as a parent node $A$ of $Y$ in the corresponding DAG.
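A minimal sketch of this data-generating process may help make the graph concrete. It assumes an identity link g, a linear f(x) = beta*x, normal distributions throughout, and intercepts drawn independently of x; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def simulate_random_intercepts(n_levels=200, per_level=20, beta=2.0, seed=1):
    """Draw (x, y) pairs from y = a_k + beta * x + u, with a_k iid across levels."""
    rng = random.Random(seed)
    xs, ys = [], []
    for k in range(n_levels):
        a_k = rng.gauss(0.0, 1.0)          # node A: one draw from F(a) per level k
        for i in range(per_level):
            x = rng.gauss(0.0, 1.0)        # covariate, generated independently of a_k
            u = rng.gauss(0.0, 0.5)        # individual disturbance u_i
            xs.append(x)
            ys.append(a_k + beta * x + u)  # structural equation for Y (identity link)
    return xs, ys


def ols_slope(xs, ys):
    """Slope of the simple least-squares regression of y on x, ignoring the levels."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs, ys = simulate_random_intercepts()
print(round(ols_slope(xs, ys), 2))  # should land near the assumed beta of 2.0
```

In this sketch A and X are both parents of Y and share no common cause, so a regression that ignores the levels still recovers the assumed slope on average; only its precision is affected.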

The usual RI assumption is that this node has no parent, which as the Kim-Steiner article notes makes the intercept look like a randomized treatment effect. Because the article uses linear structural relations this fact corresponds to their zero-covariance (uncorrelatedness) condition. But more generally in Pearl’s NPSEM formulation this means it is unconditionally independent of its nondescendants (including a targeted treatment, if any), which is a stronger condition than uncorrelatedness and is exactly what successful randomization delivers.
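The gap between uncorrelatedness and independence can be seen in a toy construction (for illustration only, not taken from the Kim-Steiner article). Here a hypothetical variable a = x^2 - 1 is completely determined by a standard normal x, yet its covariance with x is zero in expectation.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(50000)]
a = [xi * xi - 1.0 for xi in x]      # hypothetical "intercept" that depends on x
x_sq = [xi * xi for xi in x]

r_linear = pearson(x, a)     # close to zero: a is (nearly) uncorrelated with x
r_square = pearson(x_sq, a)  # essentially one: a is a deterministic function of x
print(round(r_linear, 3), round(r_square, 3))
```

A linear structural model would see nothing wrong with such an intercept, whereas any outcome equation involving a nonlinear function of x would be confounded by it. Randomization rules out this kind of dependence as well.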

Note that the traditional individual random disturbance $u_i$ is also a random effect indexed by the individual identifier $i$, and sometimes shown in the graph as a node $u$. In this sense the traditional linear model is really a mixed model, not a pure fixed-effects model. In fact arguably there is no such thing in everyday statistics as a pure fixed-effects model: every outcome $Y$ is modeled with some random component or generation step. But in so-called FE models it is usually a zero-mean final step added to the structural expectation, so it becomes invisible when working only with expected outcomes (as in typical GLM presentations), which miss phenomena like random confounding (as discussed in the 2015 Greenland-Mansournia article cited by Kim & Steiner).

Mzobeck · January 8, 2021, 4:27pm

Thank you both for the replies. @Pavlos_Msaouel, the paper was very helpful and @Sander, that was a very clear explanation that helped solve my problem.

A follow-up question: I have the intuition that, in a causal paradigm, random slopes should be seen as a method of statistical estimation and have no special implications for causal identification in a DAG. For example, if an adjustment set containing variable Z satisfies the backdoor criterion, then it satisfies the criterion regardless of whether Z is modeled using fixed or random slopes. Is this correct?

Sander · January 8, 2021, 6:46pm

Re: “random slopes should be seen as a method of statistical estimation and have no special implications for causal identification in a DAG” - that is how I view the situation with ordinary causal diagrams (usually cDAGs, temporally ordered DAGs with additional mapping from a causal structure, but also SWIGs), when each slope distribution corresponds to a mean-zero prior distribution on the slope and those distributions are all independent, as in classical random-treatment experimental models.

There are however complications that arise when looking at how random-parameter models are recommended and used in practice, which I think need a more general Bayes-net (information network) and hierarchical (multilevel) effect view to understand precisely. That view can map the distribution’s hyperparameters to graph nodes, as seen in hierarchical-model diagrams going back at least to the 1980s (paralleling causal-diagram development). By imposing on those diagrams the same sort of restrictions used to get cDAGs from DAGs, we could interpret them as hierarchical causal diagrams for models that use and extract information on the causes of the targeted effects (slopes).

Consider how we used hierarchical models in Witte JS et al. (2000). Multilevel modeling in epidemiology with GLIMMIX. Epidemiology, 11, 684-688, and Greenland S (2000). When should epidemiologic regressions use random coefficients? Biometrics, 56, 915-921. In those, the nutrients multiply hyperparameters that determine the diet effects (slopes) for a breast-cancer outcome. Using an ordinary cDAG, the nutrients could be graphed as direct causes (parents) of the outcome, with the diet factors as the nutrient parents. The direct diet effects on the outcome correspond to the diet residuals. These residuals are assumed IID mean-zero by the fitted regression model.
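One way to write down the two-stage structure just described is sketched below. The notation is assumed here for illustration rather than copied from either paper, and a logistic first stage is assumed: $x$ is the row vector of diet-item intakes, $\beta$ the vector of diet effects (slopes), $Z$ a matrix whose row $j$ gives the nutrient content of diet item $j$, $\pi$ the vector of nutrient hyperparameters, and $\delta$ the vector of residual direct diet effects, each with mean zero and variance $\tau^2$.

$$
\begin{aligned}
\operatorname{logit} P(Y = 1 \mid x) &= \alpha + x\beta, \\
\beta &= Z\pi + \delta, \qquad \delta_j \ \text{iid with mean } 0 \text{ and variance } \tau^2, \\
\operatorname{logit} P(Y = 1 \mid x) &= \alpha + (xZ)\pi + x\delta .
\end{aligned}
$$

The third line is the second substituted into the first. Under these assumptions $xZ$ is the vector of nutrient intakes implied by the diet, so the nutrients act as parents of the outcome with effects $\pi$, the diet items are parents of the nutrients, and $\delta$ carries whatever direct diet effects remain.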

Hierarchical graphs may clarify why standard multiple-comparisons (MC) adjustments are often complete contextual and causal nonsense. Applied directly to the diet effects without putting the mediating nutrients in the model, they correspond to treating total diet effects as IID draws from a single distribution centered at zero, which implies complete causal exchangeability for items as diverse as bacon and lettuce. Thus the causal diagram for some MC procedures would have one added exogenous node representing a known constant fixed at zero as a parent of each diet item. Each diet item would also have its own random parent with effect exchangeable with all other random diet effects. This kind of scientifically nonsensical model (which ignores information on known mediators) is also the basis of random-effects meta-analyses (MA) that fail to regress out obvious causes (modifiers) of study results (e.g., bias sources).
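The contrast can be sketched with the standard normal-normal shrinkage formula, in which a first-stage estimate is pulled toward a prior mean with weight determined by the prior variance. Everything numeric below is hypothetical (the estimates, their variances, the nutrient-based prior means, and the prior variance tau2, which is treated as known); the sketch only illustrates the two choices of prior mean, not any published analysis.

```python
def shrink(estimate, variance, prior_mean, tau2):
    """Posterior mean of a slope under a normal prior with mean prior_mean, variance tau2."""
    weight = tau2 / (tau2 + variance)  # weight given to the data-based estimate
    return weight * estimate + (1.0 - weight) * prior_mean


# Hypothetical first-stage results for two diet items (illustrative numbers only).
items = {
    "bacon":   {"estimate": 0.40,  "variance": 0.04, "nutrient_mean": 0.30},
    "lettuce": {"estimate": -0.20, "variance": 0.04, "nutrient_mean": -0.10},
}
tau2 = 0.04  # assumed prior variance of the residual effects

# MC-style model: every item exchangeable with every other, all centered at zero.
toward_zero = {
    name: shrink(d["estimate"], d["variance"], 0.0, tau2)
    for name, d in items.items()
}

# Hierarchical model: each item centered at the mean implied by its nutrient content.
toward_nutrients = {
    name: shrink(d["estimate"], d["variance"], d["nutrient_mean"], tau2)
    for name, d in items.items()
}

for name in items:
    print(name, round(toward_zero[name], 3), round(toward_nutrients[name], 3))
```

Under the zero-centered prior, bacon and lettuce are pulled toward the same value regardless of what is known about their composition; under the nutrient-based prior, each is pulled toward a mean that reflects the mediators.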

Such examples are why I see generalized causal modeling and diagramming as a potent tool for exposing the scientific nonsense behind traditional practices promoted by some statisticians to address MC or MA problems.