Best Causal Inference method for repeated Cross-section Design

Please allow me to introduce myself. I am Dr Pavan, from India, currently working for an iNGO and as an advisor to the Government of Madhya Pradesh.
I am approaching you with a technical problem and seeking your expertise.
I am evaluating the effects of a government-implemented healthcare quality improvement programme between 2016 and 2020. This programme was implemented in six selected districts out of the total of 18 districts in Chhattisgarh state, India (the selection of districts was non-random).
Below is an outline of how the intervention and data look:

**Treatment Introduced**

| Year of birth | Intervention Districts | Non-Intervention Districts |
|---|---|---|
| 2011 | No | Never |
| 2012 | No | Never |
| 2013 | No | Never |
| 2014 | No | Never |
| 2015 | No | Never |
| 2016 | Yes | Never |
| 2017 | Yes | Never |
| 2018 | Yes | Never |
| 2019 | Yes | Never |
| 2020 | Yes | Never |
| 2021 | Yes | Never |

The outcome variable is binary: ‘quality care provided; yes == 1 and no == 0’.

The treatment variable is binary: ‘exposed to treatment; yes == 1 and no == 0’.

I wish to estimate the impact of the intervention on the outcome. Is it possible to calculate the average treatment effect (ATE) and average treatment effect among the treated (ATET) under this scenario?
I plan to employ a repeated cross-sectional design to assess the impact of the programme in the six intervention districts.
In your opinion, of the methods available for estimating the impact of an intervention from observational data (e.g., DiD, RA, propensity score matching), which is the best strategy for the scenario described above?

Can I use the difference-in-differences methodology, and is it applicable to binary outcomes?
I use Stata software for data analysis. Based on what I read in the Stata manual of treatment effect estimation, I am currently using regression adjustment.

teffects ra (quality education wealth, logit) (treatment), atet // Average Treatment Effect on Treated

teffects ra (quality education wealth, logit) (treatment) // Average Treatment Effect

Is this approach valid?

My question is: which two approaches (best and second best) will provide the most accurate answer about the impact of the programme for this scenario?

From the setting you describe, it sounds like you have a pre-post intervention setting with temporal controls. I therefore think the better analytical approach would be difference-in-differences or synthetic control methods (or a general method combining the two approaches), especially if the estimand of interest is the average treatment effect on the treated (ATET).
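To make the difference-in-differences idea concrete, here is a minimal sketch in plain Python (not Stata) of the basic 2x2 estimate for a binary outcome. It is an illustration under assumptions, not a full analysis: it works on the probability scale (differences in the share receiving quality care), it assumes the trends in the two groups of districts would have been parallel without the programme, and it ignores covariates and standard errors. The function name `did_att` and the toy numbers are made up for illustration; "post" is taken to mean born in 2016 or later, as in your table.

```python
from statistics import mean

def did_att(rows):
    """2x2 difference-in-differences on the probability scale.

    rows: iterable of (treated_district, post_period, outcome) tuples,
    each value coded 0/1. Hypothetical helper, for illustration only.
    """
    cells = {(d, p): [] for d in (0, 1) for p in (0, 1)}
    for treated, post, y in rows:
        cells[(treated, post)].append(y)
    # Share of births with quality care in each district-group x period cell
    share = {cell: mean(ys) for cell, ys in cells.items()}
    change_treated = share[(1, 1)] - share[(1, 0)]
    change_control = share[(0, 1)] - share[(0, 0)]
    # Under parallel trends, the gap between the two changes estimates the ATET
    return change_treated - change_control

# Made-up toy data: 100 births per cell (not real programme data)
toy = (
    [(1, 0, 1)] * 40 + [(1, 0, 0)] * 60    # intervention districts, pre:  40%
    + [(1, 1, 1)] * 65 + [(1, 1, 0)] * 35  # intervention districts, post: 65%
    + [(0, 0, 1)] * 35 + [(0, 0, 0)] * 65  # comparison districts, pre:    35%
    + [(0, 1, 1)] * 45 + [(0, 1, 0)] * 55  # comparison districts, post:   45%
)

estimate = did_att(toy)
print(round(estimate, 2))  # (0.65 - 0.40) - (0.45 - 0.35) = 0.15
```

In practice you would probably estimate this with a regression that includes the district-group by period interaction, so that you can add covariates such as education and wealth. Because treatment was assigned at the district level, the standard errors should account for clustering within districts. Your five pre-programme years (2011 to 2015) also let you check whether the two groups were trending similarly before 2016.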

I’m not familiar with Stata, but for synthetic difference-in-differences there is an accompanying R package you can check out: synthdid (Synthetic Difference-in-Difference Estimation).