Missing Intervention

I am working with observational data, and my goal is to evaluate the effect of an intervention (a) on an outcome (y).

Some subjects received the intervention (a=1), some did not (a=0), and for some the intervention status is missing (a=missing). There are no missing data in the outcome (y); it was observed for all subjects.

How do I handle observations with missing intervention status (a=missing)? Do I just exclude them and do a simple complete-case analysis? Or should I do a what-if sensitivity analysis: 1) treating all subjects with missing intervention as Intervention=Yes, and then again 2) treating all of them as Intervention=No?
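To make the question concrete, here is a minimal sketch of the three analyses I have in mind. The data, the `records` list, and the helper names are all made up for illustration, and the estimate is a plain unadjusted difference in means, so it ignores the confounding adjustment a real observational analysis would need.

```python
from statistics import mean

# Hypothetical toy data: (a, y) pairs, where a is 1, 0, or None (missing).
records = [
    (1, 12.0), (1, 10.0), (1, 14.0),
    (0, 8.0), (0, 6.0), (0, 10.0),
    (None, 9.0), (None, 13.0),
]

def mean_difference(rows):
    """Unadjusted difference in mean outcome: a=1 minus a=0."""
    treated = [y for a, y in rows if a == 1]
    control = [y for a, y in rows if a == 0]
    return mean(treated) - mean(control)

def impute_missing(rows, value):
    """Assign one what-if value to every missing intervention status."""
    return [(value if a is None else a, y) for a, y in rows]

# Complete-case analysis: drop subjects with missing intervention status.
complete_case = mean_difference([r for r in records if r[0] is not None])

# What-if scenario 1: every missing status treated as Intervention=Yes.
all_yes = mean_difference(impute_missing(records, 1))

# What-if scenario 2: every missing status treated as Intervention=No.
all_no = mean_difference(impute_missing(records, 0))

print(complete_case, all_yes, all_no)
```

As far as I can tell, these are only two particular scenarios rather than guaranteed bounds, since a mixed assignment of the missing subjects could give an estimate outside this range.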

Any suggestions highly appreciated. Thanks in advance.



Was this a prospective or retrospective study?

If a primary goal of the study design was to evaluate/compare these two specific interventions in an observational (non-randomized) setting, then the inclusion/exclusion criteria under either setting should have explicitly required that patients had received one or the other to be included in the study. Thus, you should know, for every patient, which one of the two they had.

So one scenario is that your study entry criteria were mis-specified or perhaps not adhered to, and you have data on patients that should not have been included in the study to begin with. So, one option is to define a modified ITT cohort, where the patients included in your analyses have had one of the two treatments, and you know which one.

If you know that the patients definitely had one of the two interventions of interest, and for some reason, these data were not collected on this subset with missing data, then I would do what you can to go back to the data sources and obtain that information. Treatment information would have to be documented in the source medical records for these patients and somebody with access to those records can get that for you.

If this is a post-hoc analysis on a study where data were collected for other purposes, and you are going back to compare these two treatments retrospectively on available data, it is reasonable to exclude patients where you do not know which treatment they had, since they would be outside of your underlying hypothesis for these analyses and the sub-group of interest.

If there are other dynamics at play here, then another alternative is to treat the group where the intervention is unknown as a third group, rather than presuming that they had one of the two treatments that you are interested in. The issue there might be that, if you truly have no idea what intervention they had, the interventions used in this group may not be one of the two that you are interested in and could be rather heterogeneous, which would exacerbate the confounding issues in the analysis.
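As a rough illustration of the third-group idea, the sketch below keeps the unknown patients as their own category and summarizes the outcome per group. The data and names are hypothetical, and a real analysis would use a model that adjusts for confounders rather than raw group means.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (a, y) pairs, where a is 1, 0, or None (missing).
records = [
    (1, 12.0), (1, 10.0), (1, 14.0),
    (0, 8.0), (0, 6.0), (0, 10.0),
    (None, 9.0), (None, 13.0),
]

def label(a):
    """Map intervention status to a three-level group label."""
    if a is None:
        return "unknown"
    return "intervention" if a == 1 else "no intervention"

# Collect outcomes by group, keeping "unknown" as a separate level.
by_group = defaultdict(list)
for a, y in records:
    by_group[label(a)].append(y)

# Group size and mean outcome for each of the three groups.
summary = {group: (len(ys), mean(ys)) for group, ys in by_group.items()}

for group, (n, avg) in summary.items():
    print(f"{group}: n={n}, mean outcome={avg}")
```

Comparing the unknown group's size and outcome distribution against the two known groups can also hint at whether the missingness is related to the outcome.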

We would need to have a better understanding of why the interventions are missing on these patients to provide other thoughts.


Thanks, this is an excellent suggestion. I will follow your advice.
