Hello everyone,

Every couple of years there seems to be a wave of doubly robust boosters in my field of interest (evidence synthesis/comparative effectiveness supporting HTA submissions, primarily). Does anyone have some recommended reading on cases where a doubly robust estimator might actually save me, i.e. be worth the extra effort of inheriting all of the workflow difficulties of both treatment-assignment and outcome modelling?

My knee-jerk reaction is:

- It doesn’t protect me from variable-selection issues, since the variables needed for adjustment are the same whether you take a treatment-assignment or an outcome-modelling approach.
- The applications I’ve generally seen use something like IPTW from a logistic regression with a bunch of simple linear terms, and then the same formula in the outcome model.
- I’m worried people will just do a bad version of both models (see the previous point).
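For concreteness, here is a minimal numpy sketch of the kind of estimator described in the second point: a logistic-regression propensity score with simple linear terms, the same linear formula in the outcome model, and the two combined via the standard AIPW (augmented IPTW) formula. The simulated data and every name in it are purely illustrative, not from any real analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated observational data (illustrative only); true effect = 1 ---
n = 2000
x = rng.normal(size=(n, 2))                                  # confounders
ps_true = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.25 * x[:, 1])))
a = rng.binomial(1, ps_true)                                 # treatment
y = a + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)         # outcome

X = np.column_stack([np.ones(n), x])                         # simple linear terms

# --- Propensity model: logistic regression via a few Newton steps ---
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += np.linalg.solve(X.T * (p * (1 - p)) @ X, X.T @ (a - p))
ps = 1 / (1 + np.exp(-X @ beta))

# --- Outcome model: OLS per arm with the same linear formula ---
def ols_predict(mask):
    coef, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return X @ coef

mu1, mu0 = ols_predict(a == 1), ols_predict(a == 0)

# --- AIPW: outcome-model prediction plus inverse-probability-weighted residual ---
aipw = np.mean(mu1 - mu0
               + a * (y - mu1) / ps
               - (1 - a) * (y - mu0) / (1 - ps))
print(f"AIPW ATE estimate: {aipw:.2f}")
```

When both working models are correctly specified, as here, the combination buys little over plain covariate adjustment; the estimator’s selling point is that it stays consistent if either one (but not both) is wrong.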

Is there something I’m missing, or is the argument just that it’s not really that much extra effort, might have a benefit, and probably won’t cause harm?


I have exactly the same reservations, and hope that others can provide relevant references. I know there is a paper studying targeted maximum likelihood estimation, a related method, which found that the standard errors provided by that method are substantially biased (too low) and need correction by a prohibitively computationally expensive bootstrap. There is another paper out there comparing one multi-step method with ordinary covariate adjustment, finding no advantage. I think the multi-step method was a “doubly robust” method.

What has not been well understood by the doubly robust community is that ordinary covariate adjustment is pretty darn robust to model misspecification. And it tries to do the right thing by explicitly handling outcome heterogeneity within exposure groups.

I’ve seen several people in the ML community linking the new book CausalML Book (causalml-book.org) from Chernozhukov et al., which contains several doubly robust methods. It is written from the point of view of the econometrics community, and the ML label is frustratingly useless in my view. The authors are solid, although I must add the disclaimer that this is not my field and I’ve only skimmed through the book.

Scanning through the book, it’s apparent that the main novelty is the ML label in the title, because the methods and approaches have been around for quite some time (e.g. SEM, penalization via LASSO, etc.).

On the other hand, there’s a funny paragraph on p. 256 regarding overfitting and certain practices which, together with other remarks scattered throughout the book (e.g. the sidenote on p. 298), casts a welcome critical eye on certain ML practices.

I don’t know if this is common in econometrics, but I was surprised at the staggering number of interactions in some of the modelling approaches (and yet no mention of non-linearities in those same models, which in my mind ought to be prioritized over interactions). Here’s an example in their notebooks.

I have not seen much focus on some of the dominant challenges, such as variable selection (which seems to be addressed mostly in the context of LASSO/Ridge/ElasticNet), and the evaluation of non-linearities is very narrow, jumping from simple transformations straight to models like RF or ANN. Relatedly, I have not seen sample-size considerations addressed. In light of the excellent paper by Riley et al. on model instability (Stability of clinical prediction models developed using statistical or machine learning methods - Riley - 2023 - Biometrical Journal - Wiley Online Library), I would say that’s *one of* the elephants in the room in these applications.

I’d love to hear from others on this topic.


@timdisher

I agree with you that the usefulness of doubly robust estimators is very questionable if you are just fitting basic parametric models for both the outcome regression and the propensity score model.

Based on my (admittedly limited) reading of the doubly robust literature, the actual utility of doubly robust estimators arises when you want to use machine learning algorithms as part of your estimation procedure. The reason is that, because of the way they are constructed, a doubly robust estimator that uses ML algorithms for the outcome regression/propensity score can have standard asymptotic behavior even if the individual ML algorithms used to construct it do not. See this paper https://arxiv.org/pdf/2004.10337 and in particular this paragraph from the discussion (emphasis mine):

The **need** for doubly-robust estimators with cross-fitting when using data-adaptive machine learning for nuisance function estimation arises from two terms in the Von Mises expansion of the estimator […] The second term is the second-order remainder, and it converges to zero as the sample size increases. **For valid inference, it is desirable for this remainder term to converge as a function of n^(-1/2), referred to as root-n convergence**. Convergence rates are not a computational issue, but rather a feature of the estimator itself. Unfortunately, data-adaptive algorithms often have slower convergence rates as a result of their flexibility. However, **because the second-order remainder term of doubly-robust estimators is the product of the approximation errors of the treatment and outcome nuisance models, doubly-robust estimators only require that the product of the convergence rates for nuisance models be n^(-1/2)**. To summarize, cross-fitting permits the use of highly complex nuisance models, while doubly-robust estimators permit the use of slowly converging nuisance models. Used together, these approaches allow one to use a wide class of data-adaptive machine learning methods to estimate causal effects.
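To spell out the rate arithmetic behind that bolded product condition (my own illustration, not taken from the paper): if each nuisance estimator converges at the slow rate n^(-1/4), which many flexible ML methods can achieve, their product already attains root-n:

```latex
\[
\underbrace{\lVert \hat{e} - e \rVert}_{O_P(n^{-1/4})}
\cdot
\underbrace{\lVert \hat{\mu} - \mu \rVert}_{O_P(n^{-1/4})}
= O_P(n^{-1/2})
\]
```

So neither the propensity model \(\hat{e}\) nor the outcome model \(\hat{\mu}\) needs to be root-n consistent on its own; only their product of errors does.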

A source of confusion is that many papers on doubly robust estimators start from the premise that double robustness is some super desirable property in and of itself. The reasoning then seems to go that *if* we want to achieve double robustness, *then* we should use flexible ML algorithms to ensure correct specification of both the outcome regression and the propensity score. From the paper above, we can see that this line of reasoning should almost be reversed. That is, *if* we are committed to using ML algorithms in our estimation procedure for either the outcome regression or the propensity score, *then* we must use something like a doubly robust estimator in order to get standard asymptotic behavior (and hence all the standard ways of constructing p-values and confidence intervals). Seen from this perspective, double robustness is not an **end goal** to be pursued, but rather a **necessary condition** for using ML/“data adaptive” procedures.
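A minimal numpy sketch of the cross-fitting recipe the quoted paragraph describes: nuisance models are fit on one fold and evaluated on the held-out fold, and the AIPW scores from the held-out folds are averaged. The stand-in nuisance learners here are logistic regression and OLS purely to keep the sketch self-contained; in practice they would be the flexible, slowly converging ML algorithms. All data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Toy data (illustrative only); true treatment effect = 1 ---
n = 2000
x = rng.normal(size=(n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
y = a + x[:, 0] + np.sin(x[:, 1]) + rng.normal(size=n)

def fit_ps(Xtr, atr):
    """Propensity learner (logistic regression; stand-in for any ML method)."""
    Z = np.column_stack([np.ones(len(Xtr)), Xtr])
    b = np.zeros(Z.shape[1])
    for _ in range(25):
        p = 1 / (1 + np.exp(-Z @ b))
        b += np.linalg.solve(Z.T * (p * (1 - p)) @ Z, Z.T @ (atr - p))
    return lambda Xte: 1 / (1 + np.exp(-np.column_stack([np.ones(len(Xte)), Xte]) @ b))

def fit_outcome(Xtr, ytr):
    """Outcome learner (OLS; stand-in for any ML method)."""
    Z = np.column_stack([np.ones(len(Xtr)), Xtr])
    coef, *_ = np.linalg.lstsq(Z, ytr, rcond=None)
    return lambda Xte: np.column_stack([np.ones(len(Xte)), Xte]) @ coef

# --- 2-fold cross-fitting: train nuisances on one fold, score the other ---
folds = np.array_split(rng.permutation(n), 2)
psi = np.empty(n)
for k in range(2):
    te, tr = folds[k], folds[1 - k]
    ps = fit_ps(x[tr], a[tr])(x[te])
    mu1 = fit_outcome(x[tr][a[tr] == 1], y[tr][a[tr] == 1])(x[te])
    mu0 = fit_outcome(x[tr][a[tr] == 0], y[tr][a[tr] == 0])(x[te])
    psi[te] = (mu1 - mu0
               + a[te] * (y[te] - mu1) / ps
               - (1 - a[te]) * (y[te] - mu0) / (1 - ps))

# Standard asymptotics: mean of the scores, with a plain Wald standard error
ate, se = psi.mean(), psi.std(ddof=1) / np.sqrt(n)
print(f"cross-fit AIPW ATE = {ate:.2f} (SE {se:.2f})")
```

Note the outcome model here is deliberately misspecified (it misses the sin term) while the propensity model is correct, so the estimator still targets the right quantity; that is the double-robustness property doing its job alongside the cross-fitting.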

But beware of underestimated standard errors.
