Multiple imputation with chained equations and multivariable regression with propensity score

Hopefully this isn’t a woefully ignorant question.

I have been tasked with estimating whether an exposure X is associated with an outcome Y. The number of outcomes will likely allow me to estimate the intercept, along with 3-4 covariates. There are several known strong confounders (of which age is the most important) and a multitude of reasonably likely confounders. I am leaning towards using a propensity score as my data reduction method.

However, there are also missing data for confounders that are likely moderately to frankly important (height, weight and cigarette pack-years). The missingness lends itself well to multiple imputation with chained equations (MICE), i.e. it appears to fulfil the missing completely at random criterion, at least on the surface.

My question is the following: as I already intend to incorporate the variables with missing data into the propensity score, should I first estimate the propensity score for individuals without missing data, and then use the MICE procedure from Hmisc’s aregImpute() function to impute the propensity score for those who had missing data, or should I impute the missing data first, and then estimate the propensity score for all individuals? If the latter, then how would I incorporate the variance inflation into the final estimate?


I might have a woefully ignorant guess.
As a non-expert, the first thing that comes to mind would be to do both and see how the overall results differ. On the other hand, a hunch says that if you fit the propensity model first, any bias caused by the missing data will be built into the propensity score, and imputing that score would carry the same bias forward, whereas imputing the missing variables first and then building the propensity score separately for each imputed dataset might prevent that.
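
On the variance question, my understanding is that the usual way to combine results across imputed datasets is Rubin's rules: average the point estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). Here is a minimal sketch of that arithmetic in Python, assuming you have already fitted the propensity-adjusted outcome model in each of the m completed datasets and pulled out the exposure coefficient and its squared standard error. The function name pool_rubin and the numbers are made up for illustration; they are not from any package or from real data.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the exposure coefficient from each completed dataset
    variances: the squared standard error of that coefficient in each dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)

    # The (1 + 1/m) factor is the inflation for using a finite number of imputations.
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical coefficients and squared standard errors from m = 5 imputations.
est, total_var = pool_rubin(
    [0.42, 0.38, 0.47, 0.40, 0.44],
    [0.010, 0.011, 0.009, 0.010, 0.012],
)
print("pooled estimate:", est)
print("pooled standard error:", total_var ** 0.5)
```

The point is only that the extra uncertainty from imputation enters through the between-imputation term, so the pooled standard error is larger than what any single completed dataset would report.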

Hopefully this gets some expert’s attention and we will both be enlightened.

Take a look at


Do not do both unless there is a clear plan for what to do if results differ.


I wrote up a blog post on this: