Since no one has yet taken a shot at responding, I’ll offer the following thoughts on how to decide this type of question in a principled fashion.
I don’t see a sharp distinction between “causal” and “statistical” models. For the purposes of this discussion, I will take the position of Seymour Geisser, who explored Bayesian predictive models in the 1970s (if not earlier).
From his book *Predictive Inference: An Introduction*:
> Currently most statistical analyses generally involve inferences or decisions about parameters or indexes of statistical distributions. It is the view of the author that analyzers of data would better serve their clients if inferences or decisions were couched in a predictivistic framework.
In this point of view, “parameters” are useful fictions that help in the development of prediction models.
If this is agreed to, then both “causal” and “statistical” models fall within the underlying framework of information theory, where they can be compared on the accuracy of their predictions against actual observations.
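To make that comparison concrete, here is a minimal sketch (my own illustration with made-up numbers, not anything from Geisser): score each model’s predictive distribution by its average log predictive density on held-out observations. The log score is the natural information-theoretic yardstick, since differences in expected log score estimate differences in Kullback-Leibler divergence from the true distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical held-out observations (illustrative only):
# the "truth" is Normal(2, 1).
y_test = rng.normal(loc=2.0, scale=1.0, size=200)

# Two competing predictive distributions for future observations.
model_a = stats.norm(loc=2.0, scale=1.0)  # well calibrated
model_b = stats.norm(loc=0.0, scale=3.0)  # poorly calibrated

# Average log predictive score: higher is better.  The expected log
# score is maximized by the true distribution, and the gap between
# models estimates their difference in KL divergence from the truth.
score_a = model_a.logpdf(y_test).mean()
score_b = model_b.logpdf(y_test).mean()
print(f"model A: {score_a:.3f}  model B: {score_b:.3f}")
```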
In this predictive framework, there are training data and test data, and bias can be seen as a function of model complexity: simple models will not fit the training data all that well. But complexity is not always a good thing. One can always find a function that fits a data set exactly, but such a fit has no value if it does not predict future observations well.
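A toy demonstration of this point (my own sketch, with simulated data): fit polynomials of increasing degree to noisy observations of a smooth function. Training error falls monotonically with degree, but test error eventually gets worse as the fit starts chasing noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # The "true" signal (purely illustrative)
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 15)
y_train = f(x_train) + rng.normal(scale=0.3, size=15)
x_test = rng.uniform(0, 1, 1000)
y_test = f(x_test) + rng.normal(scale=0.3, size=1000)

for degree in (1, 3, 9):
    # polyfit minimizes squared error on the training set only
    coefs = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")
```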
In my copy of *The Elements of Statistical Learning*, the authors write:
> Training error tends to decrease whenever we increase model complexity [reduce bias, my emphasis]… However, with too much fitting, the model adapts itself too closely to the training data, and will not generalize well (have large test error). In that case the predictions will have large variance. In contrast, if the model is not complex enough, it will underfit and may have large bias, again resulting in poor generalization.
This runs contrary to your statement in the other thread:
> My personal view is that trading off bias for precision is “cheating” and gives you uninterpretable inferential statistics.
Models that make this tradeoff lead to better predictions of future data, and I don’t know how to interpret a model other than through its predictive capabilities.
In my predictive-inference philosophy, a good causal model can be expressed as constraints on probability distributions that reduce model complexity, or that lead to better predictions relative to model complexity. Complex models must, in essence, “pay for themselves.”
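One way to operationalize “paying for themselves” (again a sketch with simulated data; the setup and variable names are mine, not from any particular source): encode a causal assumption as a constraint on a regression model, here the hypothetical assumption that x2 has no direct effect on y, and compare the constrained and unconstrained models on cross-validated predictive error. When the constraint approximately holds, the simpler model typically predicts at least as well, so the extra complexity of the full model buys nothing.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300

# Hypothetical data-generating process (illustrative only):
# x1 causes y; x2 is correlated with x1 but has no direct effect on y.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.5, size=n)
y = 2.0 * x1 + rng.normal(size=n)

X_full = np.column_stack([x1, x2])   # unconstrained model
X_constrained = x1.reshape(-1, 1)    # "causal" constraint: drop x2

# Compare the models on out-of-sample predictive accuracy.
for name, X in [("full", X_full), ("constrained", X_constrained)]:
    scores = cross_val_score(LinearRegression(), X, y,
                             cv=10, scoring="neg_mean_squared_error")
    print(f"{name:12s} mean CV MSE: {-scores.mean():.3f}")
```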