Use of response denominator as covariate in regression

Dear All,

I’m trying to estimate the effect of an intervention on surgical (say, emergency) procedures recorded at hospital level, using observational data and the Generalised Synthetic Control (GSynth) causal modelling framework. The outcome (regression response) I’m modelling is the hospital-level monthly emergency procedure rate, defined as the ratio of [NUM], the number of emergency procedures taking place in a hospital over a given month, to [DEN], the size of the population forming the catchment area of that hospital in that month. Predictors in the GSynth regression model include hospital- and area-level time-varying descriptors.

My issue revolves around the use of the DEN variable in the GSynth model. The available options for including it are:

(i) as the denominator in the definition of the response variable being modelled;
(ii) as a predictor of the response defined above;
(iii) as a weight to estimate the weighted average Treatment Effect on the Treated (ATT).
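
To make the three options concrete, here is a minimal sketch in Python with made-up numbers. It does not call GSynth itself; the arrays, the per-1,000 scaling, the log transform of DEN and the counterfactual rates are all hypothetical and only illustrate where DEN would enter in each case.

```python
import numpy as np

# Hypothetical toy data: three treated hospitals, one post-intervention month.
num = np.array([30.0, 12.0, 90.0])                  # [NUM] emergency procedures
den = np.array([100_000.0, 40_000.0, 250_000.0])    # [DEN] catchment population

# (i) DEN as the denominator of the response: procedures per 1,000 population.
rate = 1_000 * num / den

# (ii) DEN as a predictor: it sits in the design matrix next to other covariates.
# The log transform is an illustrative choice, not something GSynth requires.
other_covariate = np.array([0.2, 0.5, 0.1])         # hypothetical descriptor
X = np.column_stack([other_covariate, np.log(den)])

# (iii) DEN as a weight: compare the unweighted and population-weighted ATT.
# The counterfactual rates stand in for what a fitted model would predict.
counterfactual_rate = np.array([0.35, 0.30, 0.40])  # hypothetical values
unit_effect = rate - counterfactual_rate
att_unweighted = unit_effect.mean()
att_weighted = np.average(unit_effect, weights=den)

print(rate, att_unweighted, att_weighted)
```

In this toy example the weighted ATT is pulled towards the largest hospital, which is the practical difference between using (iii) and not using it.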

I have the following opinions:

  1. (i) is appropriate;
  2. (ii) should not be used in conjunction with (i), as it would introduce endogeneity and would be tantamount to using DEN twice, as both a predictor and an offset term;
  3. (iii) is appropriate since hospitals of different size will generate emergency surgical rates with different variability;
  4. the simultaneous use of (i), (ii) and (iii) is overkill and unwarranted.
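
The reasoning behind opinion 3 can be written out under one extra assumption (mine, not part of GSynth): that the monthly count for hospital $i$ in month $t$ is roughly Poisson with mean proportional to catchment size, with $\lambda_{it}$ a hypothetical per-capita rate.

```latex
\mathrm{NUM}_{it} \sim \mathrm{Poisson}\!\left(\lambda_{it}\,\mathrm{DEN}_{it}\right)
\quad\Longrightarrow\quad
\mathrm{Var}\!\left(\frac{\mathrm{NUM}_{it}}{\mathrm{DEN}_{it}}\right)
= \frac{\lambda_{it}\,\mathrm{DEN}_{it}}{\mathrm{DEN}_{it}^{2}}
= \frac{\lambda_{it}}{\mathrm{DEN}_{it}}
```

If that assumption is reasonable, the variance of the rate shrinks as DEN grows, so rates from hospitals with larger catchment populations are less noisy, which is what motivates weighting by DEN.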

While I’m fairly confident about 1 and 3, I’m not entirely sure about 2 and, as a consequence, 4. I’d therefore be grateful for your opinions, or perhaps some references (which I was surprised not to find through my Google searches), on the above modelling quandary.

With many thanks in advance for your help!
