I am conducting an observational study with 150K subjects to compare the 30-day hospital readmission rate of two groups in the 6 months before and after an intervention. A readmission can only occur if the patient was admitted to the hospital in the first place, which does not happen for most of the sample. A sample of my data, with one row per admission, is:
ID | Timeperiod | Readmission Occurred |
---|---|---|
1 | Pre | 1 |
1 | Pre | 0 |
1 | Pre | 0 |
1 | Post | 0 |
2 | Post | 0 |
3 | Pre | 1 |
4 | Post | 1 |
In the sample above there are 4 admissions in the pre period and 3 admissions in the post period. In the pre period 2 of the 4 admissions led to a readmission, and in the post period 1 of the 3 did. Another way to present my data is by subject:
ID | Timeperiod | Number of Admissions | Number of Readmissions | Readmission Rate |
---|---|---|---|---|
1 | Pre | 3 | 1 | .33 |
1 | Post | 1 | 0 | 0 |
2 | Pre | 0 | 0 | NA |
2 | Post | 1 | 0 | 0 |
3 | Pre | 1 | 1 | 1 |
3 | Post | 0 | 0 | NA |
4 | Pre | 0 | 0 | NA |
4 | Post | 1 | 1 | 1 |
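To make the relationship between the two layouts concrete, here is a minimal Python sketch of how the admission-level rows collapse into the by-subject table. The variable names (`admissions`, `counts`, `by_subject`) are only illustrative, and `None` stands in for NA. One caveat: patients who were never admitted in either period do not appear in the admission-level rows at all, so in the real data they would have to be added from the full subject list.

```python
from collections import defaultdict

# Admission-level rows from the sample: (patient ID, time period, readmitted 0/1)
admissions = [
    (1, "Pre", 1), (1, "Pre", 0), (1, "Pre", 0), (1, "Post", 0),
    (2, "Post", 0), (3, "Pre", 1), (4, "Post", 1),
]

# Count admissions and readmissions per (patient, period)
counts = defaultdict(lambda: [0, 0])  # [admissions, readmissions]
for pid, period, readmitted in admissions:
    counts[(pid, period)][0] += 1
    counts[(pid, period)][1] += readmitted

# Emit one row per patient and period, including periods with no admissions
patients = sorted({pid for pid, _, _ in admissions})
by_subject = []
for pid in patients:
    for period in ("Pre", "Post"):
        n_adm, n_readm = counts.get((pid, period), (0, 0))
        rate = n_readm / n_adm if n_adm else None  # None plays the role of NA
        by_subject.append((pid, period, n_adm, n_readm, rate))

for row in by_subject:
    print(row)
```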
I need to summarize my data to find the readmission rate in each period and then test whether the pre/post difference is statistically significant. The problems I have are:
- The observations on each subject are not independent
- Many patients have no observations in the pre period, the post period, or both, because they were never admitted to the hospital
- Do I compare the overall readmission rate (2/4 compared to 1/3) or the average readmission rate per patient (67% compared to 33%, ignoring the NAs for patients with 0 admissions)?
- What statistical test would be recommended to test the difference in proportions?
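For reference, the two candidate summaries can be computed from the by-subject sample as in the sketch below. This only illustrates the arithmetic on the 4 sample patients. The function names `pooled_rate` and `mean_patient_rate` are made up for this example, and neither one accounts for the repeated observations per subject.

```python
# Per-subject rows from the sample table: (ID, period, admissions, readmissions)
rows = [
    (1, "Pre", 3, 1), (1, "Post", 1, 0),
    (2, "Pre", 0, 0), (2, "Post", 1, 0),
    (3, "Pre", 1, 1), (3, "Post", 0, 0),
    (4, "Pre", 0, 0), (4, "Post", 1, 1),
]

def pooled_rate(rows, period):
    """Total readmissions divided by total admissions in the period."""
    adm = sum(a for _, p, a, _ in rows if p == period)
    readm = sum(r for _, p, _, r in rows if p == period)
    return readm / adm

def mean_patient_rate(rows, period):
    """Average of each patient's own rate, skipping patients with no admissions (NA)."""
    rates = [r / a for _, p, a, r in rows if p == period and a > 0]
    return sum(rates) / len(rates)

for period in ("Pre", "Post"):
    print(period,
          round(pooled_rate(rows, period), 2),
          round(mean_patient_rate(rows, period), 2))
# Pre 0.5 0.67
# Post 0.33 0.33
```

The pooled rate weights every admission equally, so patients with many admissions dominate it. The per-patient average weights every admitted patient equally and drops anyone with no admissions in that period.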