Inverse probability weighting for treatment selection and loss to follow-up in observational studies



Observational data can be subject to several sources of bias, for example self-selection into the study, treatment selection, or loss to follow-up.

Can anybody point me to papers that discuss how one can adjust simultaneously for multiple sources of bias, in particular treatment selection and loss to follow-up?

One “naive” idea would be to calculate weights from the product of the probabilities estimated by two selection models. For example, with one model for treatment selection and one for loss to follow-up, one could be tempted to calculate weights from the product of the estimated probabilities of treatment and of loss to follow-up.
However, this approach rests on the assumption that treatment selection and loss to follow-up are independent. This seems like an important assumption that could often be false. For example, in mental health, greater pre-treatment symptom severity is associated with higher probabilities of both treatment and loss to follow-up.
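A small simulation can make the independence question concrete. It is a sketch under hypothetical assumptions: a binary severity indicator L, made-up treatment and dropout probabilities, and cell frequencies instead of fitted models. By the chain rule, P(A=a, C=0 | L) = P(A=a | L) × P(C=0 | A=a, L) holds exactly, so a product of two selection models does not require independence provided the dropout model conditions on treatment as well as on the shared covariates. The product goes wrong when the dropout model ignores treatment (using P(C=0 | L)) while dropout actually depends on it. Whether the resulting weights remove bias still rests on the usual assumption of no unmeasured confounders of treatment or dropout.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
L = rng.binomial(1, 0.4, n)                       # 1 = high pre-treatment symptom severity
A = rng.binomial(1, np.where(L == 1, 0.7, 0.3))   # treatment more likely when severe
# Hypothetical dropout probabilities: depend on severity AND on treatment
C = rng.binomial(1, 0.1 + 0.2 * L + 0.15 * A)     # 1 = lost to follow-up


def prob(event, given):
    """Empirical P(event | given) from boolean masks."""
    return event[given].mean()


rows = []
for l in (0, 1):
    for a in (0, 1):
        joint = prob((A == a) & (C == 0), L == l)    # P(A=a, C=0 | L=l)
        p_treat = prob(A == a, L == l)               # P(A=a | L=l)
        p_stay = prob(C == 0, (A == a) & (L == l))   # P(C=0 | A=a, L=l)
        p_stay_naive = prob(C == 0, L == l)          # P(C=0 | L=l), ignores A
        rows.append((l, a, joint, p_treat * p_stay, p_treat * p_stay_naive))

for row in rows:
    print("L=%d A=%d  joint=%.4f  chain-rule product=%.4f  naive product=%.4f" % row)
```

The chain-rule product matches the joint probability in every cell, while the naive product drifts away from it in the cells where treatment changes the dropout probability.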

I am especially interested in pointers to structural models (DAGs) of such problems,
but I’ll appreciate any comment or pointer!

Thanks, Guido

PS: I found this discussion of multiplying probabilities from selection models, but it does not discuss the assumptions of the approach or provide further references.


Hi, Guido. What do you think about this: (1) Adjust for treatment selection as you would typically (e.g., IPW); and (2) Modify your outcome of interest to use person-time as an adjustment for length of follow-up?
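A minimal sketch of step (2), assuming IPTW weights have already been estimated from a treatment model (the weights, events, and person-time below are made up for illustration): weighted events divided by weighted person-time in each arm, then a rate ratio. This handles differing lengths of follow-up, but it implicitly treats dropout as uninformative once the treatment weights are applied.

```python
import numpy as np


def weighted_rate(events, person_time, weights):
    """IPW-weighted incidence rate: weighted events / weighted person-time."""
    return np.sum(weights * events) / np.sum(weights * person_time)


# Hypothetical toy data, one row per person
treated = np.array([1, 1, 0, 0, 1, 0])
events = np.array([1, 0, 1, 1, 0, 0])
person_time = np.array([2.0, 5.0, 1.5, 3.0, 4.0, 5.0])  # years under observation
iptw = np.array([1.2, 0.8, 2.0, 1.0, 1.5, 0.9])         # assumed already estimated

t = treated == 1
rate_1 = weighted_rate(events[t], person_time[t], iptw[t])
rate_0 = weighted_rate(events[~t], person_time[~t], iptw[~t])
print(f"treated rate: {rate_1:.4f}  untreated rate: {rate_0:.4f}  ratio: {rate_1 / rate_0:.3f}")
```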

Miguel Hernán discusses censoring from competing events (e.g., death, loss to follow-up, etc.) in chapter 17 of part II of his book on Causal Inference. There doesn’t seem to be a perfect solution, but he presents 5 options (Fine Point 17.1, page 72) – each of which has pros/cons/tradeoffs.

I don’t think loss to follow-up can be depicted in a DAG since it’s not part of the causal model. You would use the DAG to identify how to implement the IPW for treatment selection, but you would “feature-engineer” the new person-time outcome on a per-observation basis (row-wise in the data), not depicted in the DAG.

– Alexander.


I particularly like @Sander Greenland’s 2005 ‘Multiple-Bias Modeling’ discussion paper, mentioned here.


All causally mediated bias sources including loss-related ones can be shown in a DAG. For example you can label the indicator C (for censoring) and put it in the DAG as a node which is conditioned on (often shown by circling or boxing it). Even if C is assumed to affect nothing, the arrows into it are paths for collider bias (e.g., arrows from treatment X and an unmeasured confounder set U). See Ch. 12 of Modern Epidemiology (3rd ed 2008) for illustrations.
[An example of a bias that is not necessarily mediated causally is sparse-data bias, which is hard to depict even with equations and so is usually overlooked but more common than realized.]
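A quick simulation can illustrate the collider path through a censoring node. Under hypothetical assumptions, X (treatment, randomized here) has no effect on Y, while censoring C depends on X and on an unmeasured U that also drives Y. Restricting the analysis to uncensored people, as any complete-case analysis does, conditions on C and produces an association between X and Y.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
X = rng.binomial(1, 0.5, n)      # treatment, randomized: no X -> Y effect
U = rng.normal(size=n)           # unmeasured factor
Y = U + rng.normal(size=n)       # outcome depends on U only
# Censoring depends on both X and U, so C is a collider on X -> C <- U
p_censor = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * X + 1.5 * U)))
C = rng.binomial(1, p_censor)    # 1 = censored (lost)


def mean_diff(y, x):
    """Difference in mean outcome, X=1 minus X=0."""
    return y[x == 1].mean() - y[x == 0].mean()


full = mean_diff(Y, X)                           # close to 0: no effect of X on Y
uncensored = mean_diff(Y[C == 0], X[C == 0])     # conditioning on C=0 opens X -> C <- U -> Y
print(f"all data: {full:.3f}   uncensored only: {uncensored:.3f}")
```

Among the uncensored, treated people had to have lower U to avoid censoring, so they show spuriously lower Y.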


Thanks David for citing the 2005 paper. Multiple adjustments are also discussed in Ch. 19 of Modern Epidemiology (3rd ed 2008) and much more abstractly in a penalized-likelihood framework in Greenland S (2009), Relaxation penalties and priors for plausible modeling of nonidentified bias sources. Statistical Science, 24, 195-210 (doi: 10.1214/09-STS291).
Key points are that

  1. multiplying through will not (apart from some very special cases) adjust properly when measurement error or misclassification are among the bias sources being addressed (adjustment is not multiplicative when the bias is not), and relatedly that
  2. the order of adjustments needs to be the reverse of the order of events (proper adjustment is noncommutative, again apart from some special cases).
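Both points can be seen in a toy calculation with made-up numbers (not an example from the cited chapter): exposure is first misclassified (hypothetical sensitivity 0.8, specificity 0.9), and selection then depends on the misclassified label. Undoing misclassification is an affine correction rather than a multiplicative factor, and the two corrections recover the true counts only when applied in the reverse order of the events.

```python
SE, SP = 0.8, 0.9                  # hypothetical sensitivity/specificity of exposure labels
SEL = {"exp": 0.5, "unexp": 1.0}   # hypothetical selection probability by *classified* exposure

# Events in time order: (1) exposure is misclassified, (2) selection depends on the label
true_exp, true_unexp = 100.0, 100.0
cls_exp = SE * true_exp + (1 - SP) * true_unexp       # labelled exposed
cls_unexp = (1 - SE) * true_exp + SP * true_unexp     # labelled unexposed
obs_exp, obs_unexp = cls_exp * SEL["exp"], cls_unexp * SEL["unexp"]


def undo_misclassification(n_exp, n_unexp):
    """Invert the classification step (an affine map, not a multiplicative factor)."""
    total = n_exp + n_unexp
    corrected = (n_exp - (1 - SP) * total) / (SE + SP - 1)
    return corrected, total - corrected


def undo_selection(n_exp, n_unexp):
    """Inverse-probability weighting for selection on the classified label."""
    return n_exp / SEL["exp"], n_unexp / SEL["unexp"]


# Reverse order of events: undo selection first, then misclassification
right = undo_misclassification(*undo_selection(obs_exp, obs_unexp))
# Same two steps in the wrong order
wrong = undo_selection(*undo_misclassification(obs_exp, obs_unexp))
print("reverse order:", right)   # recovers about (100, 100)
print("wrong order:  ", wrong)
```

In the wrong order, the selection weights end up applied to corrected "true exposure" counts even though selection acted on the misclassified label.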


Fascinating concept, sparse-data bias [1]! (The paper is even open access.) I’ll have to read it properly later, but I wonder (having read the first page, and found nothing in a text search for “boot”) whether bootstrapping standard errors could help detect this problem by inducing the “meaningless artefacts” you mention in the piece.

  1. Greenland S, Mansournia MA, Altman DG. Sparse data bias: a problem hiding in plain sight. BMJ. April 2016:i1981. doi:10.1136/bmj.i1981


Good idea, but the devil is in the details. In particular, it depends on how the bootstrapping is done. As with most bootstrap applications, the common naive practice of taking the data-resampling distribution as an estimator of the sampling distribution (the “nonparametric bootstrap”) would not do well: it would leave every random zero as a structural zero, thus exaggerating the problem to an unlimited degree. Instead, simulation from the assumed model (the “fully parametric bootstrap”) would focus on this particular problem and any others intrinsic to the fitting methods being studied (e.g., ordinary MLE, second-order (Firth) corrected MLE, etc.), as opposed to problems of structural bias or misspecification. Even then, some adjustment to the resampling outputs could be called for. I’ll bet more technical details would arise in operationalizing the idea properly; e.g., see Greenland S, Mansournia MA (2015). Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions. Statistics in Medicine, 34, 3133–3143.
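The “random zero becomes a structural zero” problem is easy to see in a small sketch. The data are hypothetical (no events among 15 exposed subjects), and the parametric model is an assumption: a single event probability estimated as (events + 0.5) / (n + 1), the Firth-type estimate for one binomial proportion, stands in for whatever model is actually being studied.

```python
import numpy as np

rng = np.random.default_rng(3)
y_exposed = np.zeros(15, dtype=int)   # hypothetical: 0 events among 15 exposed subjects
B = 2000                              # bootstrap replicates

# Nonparametric bootstrap: resample the observed rows with replacement
np_events = np.array([rng.choice(y_exposed, size=y_exposed.size).sum() for _ in range(B)])

# Fully parametric bootstrap: simulate from an assumed model for the event probability.
# ASSUMPTION: the Firth-type estimate (events + 0.5) / (n + 1) stands in for the fitted model.
p_model = (y_exposed.sum() + 0.5) / (y_exposed.size + 1)
par_events = rng.binomial(y_exposed.size, p_model, size=B)

print("nonparametric: share of replicates with 0 events =", np.mean(np_events == 0))
print("parametric:    share of replicates with 0 events =", np.mean(par_events == 0))
```

Every nonparametric replicate reproduces the zero, so an ordinary MLE of a log odds ratio involving that cell would be infinite in all of them, whereas the parametric replicates vary around the assumed model.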


Thanks for this insightful comment. Would you say that loss to follow-up is causally mediated in most cases?

I do agree it can be considered to be a collider in some circumstances, but in that case, conditioning on it may actually induce selection bias like in Berkson’s phenomenon.

I will most certainly study the references that you cited!


  1. I can’t imagine a real application in which loss to follow up is not causally mediated. Can you?

  2. In practice loss is always conditioned on, as are all other selection and missing-data variables. Hence nodes for loss and other variables indicating selection for analysis need to be in the graph whenever they may plausibly be affected by more than one variable already in the graph. Again I can’t imagine a real application with loss or missing data where that is not the case! This means that some collider bias should be expected when there is loss (just as some confounding should be expected outside of huge perfect randomized trials) and any graph without a selection node is not realistic. See Greenland S (2010). “Overthrowing the tyranny of null hypotheses hidden in causal diagrams.” Ch. 22 in: Dechter et al. Heuristics, Probabilities, and Causality: A Tribute to Judea Pearl. London: College Publications, p. 365-382.


Very sincere thanks for your explication in comment #2 above. That makes a lot of sense.

Regarding #1, whether loss to follow-up may not be causally mediated – I guess what I’m thinking is whether or not the “cause” is within the scope of the causal mechanism that you are considering. Of course, there are big, unmeasured factors that may relate to many facets of life, and of course it is possible to insert them into a graph… but one could do that for a thousand hypothetical factors, so an honest question is, where do you stop?

Let’s consider a practical example: if the task at hand is to study the effect of treatment A on outcome Y in pediatric patients, and a patient moves to another state when his father’s job is transferred (resulting in loss to follow-up), is that loss to follow-up causally mediated in a way that is related to the investigation at hand? Certainly, moving is the cause of this patient’s loss to follow-up. But do you include it in the diagram as an unmeasured node “M” (moving out of state)?

Conceivably, if there are 100 patients with loss to follow-up, each may have a different reason (cause) for this occurring. Do you need 100 additional causes, or do you consolidate into a single “C” (censoring) node?

I am anxious to learn more, and I will read the papers and chapters that you cited.

Thank you for your feedback.

– Alexander.


The usual approach I use and see is to select the (relatively few) specific factors that participants in the context have raised, along with plausible mechanisms for their actions. Some people raise vague, unspecified multiple factors; such groupings are usually represented by a single letter like “U”.

The censoring node C is an outcome variable, so your question about it does not make sense as stated, since the question is really about the factors that affect C. It seems implausible to me that the 100 patients would have 100 clearly different reasons of importance; mostly it would come down to a few major categories like “job transfer”, which would be unknown and so again could be included in U.

Please read one of my and/or Pearl’s review papers about causal diagrams, or Ch. 12 of ME3 on the topic, as I think you will find questions like these answered there. You can contact me directly for PDFs of papers.


A good question, and one that perhaps highlights a concern that discourages researchers and statisticians from experimenting with causal diagrams. @Sander’s responses have been good as well.

Given that most researchers and statisticians do not use causal diagrams (as I perceive it), while many, including myself, strongly recommend their use, perhaps a topic category of “causal diagrams” would be worthwhile? It would hopefully encourage debate about why they are useful, how they can be used, factors discouraging use, and problems that others might be able to suggest solutions to.


Thanks to everyone for the very useful input!

Two clarifications:

  1. In the particular analysis I am working on, loss to follow-up is due to study dropout (i.e., people stop returning questionnaires).
  2. From looking at the data, I know that one can predict quite well who drops out, and treatment selection and dropout depend on overlapping sets of variables.

From reading the comments, it appears to me that it makes sense to read Sander Greenland’s 2005 paper in the Journal of the Royal Statistical Society and go from there.

Thanks again to everyone!


I think you might find it easier to use inverse-probability-of-censoring weighting (IPCW) for what you describe than the Bayesian methods in my 2005 article. I think both IPCW and IPTW are described in the Hernán & Robins book “Causal Inference”, available free online, and there are SAS procs available for those as well (and they happen to multiply!).
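A minimal sketch of how IPTW and IPCW combine, assuming a single follow-up time, an outcome observed only for those who did not drop out, correctly specified logistic models, and simulated data with hypothetical coefficients (written in Python with NumPy rather than SAS or Stan). The censoring model conditions on treatment as well as on the shared covariate L, which is what allows the two weights to be multiplied.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson; X includes an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def mean_diff(y, a, w):
    """Weighted difference in mean outcome, treated minus untreated."""
    m1 = np.sum(w * y * a) / np.sum(w * a)
    m0 = np.sum(w * y * (1 - a)) / np.sum(w * (1 - a))
    return m1 - m0


rng = np.random.default_rng(2024)
n = 50_000
L = rng.normal(size=n)                                 # pre-treatment symptom severity
A = rng.binomial(1, expit(-0.5 + 1.0 * L))             # treatment depends on L
C = rng.binomial(1, expit(-1.0 + 0.8 * L + 0.5 * A))   # dropout depends on L and A
Y = 1.0 * A + 1.5 * L + rng.normal(size=n)             # simulated treatment effect = 1.0
obs = C == 0                                           # Y is only seen for completers

ones = np.ones(n)
XA = np.column_stack([ones, L])
pA = expit(XA @ fit_logistic(XA, A))                   # P(A=1 | L)
p_treat = np.where(A == 1, pA, 1 - pA)                 # P(A = observed value | L)
XC = np.column_stack([ones, L, A])
p_stay = expit(XC @ fit_logistic(XC, 1 - C))           # P(C=0 | A, L)
w = 1.0 / (p_treat * p_stay)                           # IPTW x IPCW

naive = mean_diff(Y[obs], A[obs], np.ones(obs.sum()))
weighted = mean_diff(Y[obs], A[obs], w[obs])
print(f"completers, unweighted: {naive:.2f}   IPTW x IPCW: {weighted:.2f}")
```

The unweighted completer comparison is confounded by L, while the combined weights should land close to the simulated effect of 1.0. In a real analysis one would also inspect the weight distribution and consider stabilized weights.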



I am doing the analysis in custom Stan models anyway, and have already implemented an IPTW analysis in them.
Now I want to extend this analysis to account for loss to follow-up as well. I don’t mind looking into a Bayesian solution.

I’ll report back when I think I’ve understood what to do specifically.


If you have already done IPTW, you will find IPCW a simple extension of it.
I haven’t tried Stan (I program in Gauss, not R) but if it handles overparameterized (frequentist nonidentified) models then it should be able to do the type of analyses in my 2005 and 2009 papers.