I’m posting to ask for references on conducting a causal inference analysis with missing outcome data. The context is an analysis estimating the effect of lipid-lowering treatment on LDL-cholesterol (as % reduction from baseline), using routine healthcare data covering the period from diagnosis of familial hypercholesterolaemia to 2 years of follow-up. We have longitudinal data; however, because these are routinely collected data, not all patients have LDL-cholesterol measurements at baseline (i.e. the diagnosis date) and at follow-up (2 years after the diagnosis date).
We’re considering multiple imputation to impute the missing LDL-cholesterol values from the observed LDL-cholesterol measurements, other lipid tests, patient characteristics, and past and future CV events. To estimate the effect of treatment on LDL-cholesterol, we’re considering propensity score matching (PSM); the propensity score (PS) model would include the characteristics at diagnosis that we think determine both the LDL-cholesterol reduction and the decision to treat. This means that the PS model would include the multiply imputed baseline LDL-cholesterol.
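A minimal sketch of the workflow described above, in Python with NumPy only and on simulated data (every variable and function name here is hypothetical). It follows what is often called the "within" approach: in each of m imputed datasets, baseline and 2-year LDL-cholesterol are imputed, the PS is re-estimated including the imputed baseline LDL-cholesterol, each treated patient is matched to a control with replacement within a caliper, and the m matched estimates are pooled with Rubin's rules. The single-pass stochastic regression imputation, the naive matched-pair variance and the fully observed auxiliary lipid `aux` are simplifying assumptions; a dedicated MI package (e.g. MICE) would cycle through chained equations and could also use the follow-up value when imputing baseline.

```python
import numpy as np


def impute_once(y, X, rng):
    """Stochastic regression imputation of the NaN entries of y from complete X.
    Coefficients are refitted on a bootstrap sample of the observed cases and
    residual noise is added, so each call gives a different set of plausible
    values (a simplified stand-in for a full MI package such as MICE)."""
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    boot = rng.choice(np.flatnonzero(obs), size=int(obs.sum()), replace=True)
    beta, *_ = np.linalg.lstsq(Xd[boot], y[boot], rcond=None)
    sigma = np.std(y[boot] - Xd[boot] @ beta, ddof=Xd.shape[1])
    y_imp = y.copy()
    n_miss = int((~obs).sum())
    y_imp[~obs] = Xd[~obs] @ beta + rng.normal(0.0, sigma, n_miss)
    return y_imp


def logit_propensity(X, t, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; returns the logit of the PS."""
    Xd = np.column_stack([np.ones(len(t)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        info = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # Fisher information
        beta += np.linalg.solve(info, Xd.T @ (t - p))
    return Xd @ beta


def att_by_matching(y, t, lps, caliper_sd=0.2):
    """1:1 nearest-neighbour matching with replacement on the logit PS, within a
    caliper of caliper_sd * SD(logit PS). Returns the ATT and a naive variance
    that ignores the reuse of controls (a simplification)."""
    caliper = caliper_sd * lps.std()
    controls = np.flatnonzero(t == 0)
    diffs = []
    for i in np.flatnonzero(t == 1):
        dist = np.abs(lps[controls] - lps[i])
        j = np.argmin(dist)
        if dist[j] <= caliper:
            diffs.append(y[i] - y[controls[j]])
    diffs = np.asarray(diffs)
    return diffs.mean(), diffs.var(ddof=1) / len(diffs)


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance W + (1 + 1/m) B."""
    m = len(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return np.mean(estimates), within + (1.0 + 1.0 / m) * between


def impute_then_match(age, aux, t, ldl0, ldl2, m=20, seed=1):
    """Impute, then estimate the PS and the ATT separately in each of the m
    completed datasets, and pool the m estimates."""
    rng = np.random.default_rng(seed)
    estimates, variances = [], []
    for _ in range(m):
        # Baseline LDL imputed from fully observed variables, including treatment.
        b = impute_once(ldl0, np.column_stack([age, aux, t]), rng)
        # Follow-up LDL model includes a treatment-by-baseline interaction.
        f = impute_once(ldl2, np.column_stack([age, aux, t, b, t * b]), rng)
        pct_reduction = 100.0 * (b - f) / b
        lps = logit_propensity(np.column_stack([age, b]), t)
        est, var = att_by_matching(pct_reduction, t, lps)
        estimates.append(est)
        variances.append(var)
    return rubin_pool(estimates, variances)


if __name__ == "__main__":
    # Hypothetical simulated data (not real FH data); true effect = 40 % points.
    rng = np.random.default_rng(0)
    n = 2000
    age = rng.normal(45.0, 12.0, n)
    ldl0 = rng.normal(6.0, 1.0, n)                # mmol/L at diagnosis
    aux = ldl0 + rng.normal(1.5, 0.4, n)          # correlated, fully observed lipid
    p_treat = 1.0 / (1.0 + np.exp(-((ldl0 - 6.0) + 0.03 * (age - 45.0))))
    t = rng.binomial(1, p_treat)
    ldl2 = ldl0 * (1.0 - 0.4 * t) + rng.normal(0.0, 0.3, n)
    ldl0_obs = np.where(rng.random(n) < 0.3, np.nan, ldl0)  # 30% missing at random
    ldl2_obs = np.where(rng.random(n) < 0.3, np.nan, ldl2)
    est, var = impute_then_match(age, aux, t, ldl0_obs, ldl2_obs)
    print(f"ATT (% LDL reduction): {est:.1f} (SE {np.sqrt(var):.2f})")
```

Estimating the PS and the effect inside each completed dataset, rather than averaging the PS across imputations first, keeps each of the m analyses a complete-data analysis, so the per-dataset estimates can be pooled directly with Rubin's rules.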
I’d be grateful for some advice and/or references on:
Whether it is appropriate to conduct PSM given the missing data, and if so, how. We’re not completely sure (a) whether this is a good idea, and (b) whether there are specific issues to be aware of. Do you know of any good papers that discuss this?
Whether PSM is the appropriate technique, given that we expect many of the characteristics that affect the % LDL-cholesterol reduction to be relevant only in the treated group, because they are treatment effect modifiers (i.e. interaction effects) rather than prognostic factors (i.e. main effects). Again, do you know of any good papers about this?
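To make the treatment-modifier point concrete, here is a minimal sketch on simulated data (all variables hypothetical): a prognostic factor `x` with only a main effect, and a modifier `z` that changes the response only among treated patients. A linear outcome model with a treatment-by-modifier interaction gives a patient-level effect beta_t + beta_tz * z; averaging it over the treated gives the ATT, which is what matching each treated patient to a control estimates, while averaging over everyone gives the ATE. Because treatment depends on `z` in this simulation, the two averages differ. This is an illustration of effect modification, not a recommendation for or against PSM.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def effects_with_modifier(y, t, x, z):
    """Fit y ~ 1 + t + x + z + t:z, where x is prognostic and z a hypothetical
    effect modifier. Returns the coefficients, the conditional effect
    beta_t + beta_tz * z for every patient, and its mean over the treated (ATT)
    and over everyone (ATE)."""
    beta = fit_ols(np.column_stack([t, x, z, t * z]), y)
    conditional = beta[1] + beta[4] * z
    return beta, conditional, conditional[t == 1].mean(), conditional.mean()


if __name__ == "__main__":
    # Hypothetical data: z changes the response only in treated patients.
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(0.0, 1.0, n)   # prognostic factor (main effect only)
    z = rng.normal(0.0, 1.0, n)   # effect modifier (interaction only)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * x + 0.8 * z))))
    y = 30.0 * t + 10.0 * t * z + 5.0 * x + rng.normal(0.0, 5.0, n)
    beta, conditional, att, ate = effects_with_modifier(y, t, x, z)
    print(f"t: {beta[1]:.1f}, t:z: {beta[4]:.1f}, ATT: {att:.1f}, ATE: {ate:.1f}")
```

In this set-up `z` does not affect the untreated outcome, so the matched ATT is a single average over the treated patients' distribution of `z`; the interaction model is one way to see how the effect varies across that distribution.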
Thank you very much in advance!