# Pre-post study design with data not missing at random and multiple observations per patient

I am conducting an observational study with 150K subjects to compare the 30-day hospital readmission rate of two groups in the 6 months before and the 6 months after an intervention. To have a readmission, a patient has to be admitted to the hospital in the first place, which doesn’t happen for most of the sample. A sample of my data is:

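| Subject | Period | Readmitted |
|---|---|---|
| 1 | Pre | 1 |
| 1 | Pre | 0 |
| 1 | Pre | 0 |
| 1 | Post | 0 |
| 2 | Post | 0 |
| 3 | Pre | 1 |
| 4 | Post | 1 |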

In just the sample above, there were 4 admissions in the pre period and 3 admissions in the post period. In the pre period, 2 of the 4 admissions led to a readmission, and in the post period, 1 of the 3 did. Another way to present my data is by subject:

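| Subject | Period | Admissions | Readmissions | Rate |
|---|---|---|---|---|
| 1 | Pre | 3 | 1 | .33 |
| 1 | Post | 1 | 0 | 0 |
| 2 | Pre | 0 | 0 | NA |
| 2 | Post | 1 | 0 | 0 |
| 3 | Pre | 1 | 1 | 1 |
| 3 | Post | 0 | 0 | NA |
| 4 | Pre | 0 | 0 | NA |
| 4 | Post | 1 | 1 | 1 |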
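
To make the link between the two layouts explicit, here is a minimal sketch in plain Python that derives the per-subject table from the admission-level rows. It assumes the three raw columns are subject ID, period, and a 0/1 readmission flag; the names `admissions` and `summarize_by_subject` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# Admission-level sample: (subject, period, readmitted within 30 days)
admissions = [
    (1, "Pre", 1), (1, "Pre", 0), (1, "Pre", 0), (1, "Post", 0),
    (2, "Post", 0), (3, "Pre", 1), (4, "Post", 1),
]

def summarize_by_subject(rows):
    """Return {(subject, period): (n_admissions, n_readmissions, rate)}.

    rate is None (the NA in the table) when the subject had
    no admissions in that period.
    """
    counts = defaultdict(lambda: [0, 0])  # [admissions, readmissions]
    for subject, period, readmitted in rows:
        counts[(subject, period)][0] += 1
        counts[(subject, period)][1] += readmitted

    summary = {}
    for subject in sorted({s for s, _, _ in rows}):
        for period in ("Pre", "Post"):
            n_adm, n_readm = counts[(subject, period)]
            rate = n_readm / n_adm if n_adm else None
            summary[(subject, period)] = (n_adm, n_readm, rate)
    return summary

by_subject = summarize_by_subject(admissions)
for (subject, period), (n_adm, n_readm, rate) in by_subject.items():
    print(subject, period, n_adm, n_readm,
          "NA" if rate is None else round(rate, 2))
```

A subject only appears here if they have at least one admission in either period, so the patients who were never admitted at all would be absent from both layouts.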

I need to summarize my data to find the readmission rate and then test whether the difference is statistically significant. The problems I have are:

• The observations on each subject are not independent
• Many patients have no observations in either the pre or the post time period because they never went to the hospital during it
• Do I compare the overall readmission rate (2/4 compared to 1/3) or the average readmission rate per patient (about 67% compared to 33%, ignoring the NAs for patients with 0 observations)?
• What statistical test would be recommended to test the difference in proportions?
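
To make the two candidate summaries from the list above concrete, here is a small sketch in plain Python that computes both from the per-subject counts. The function names `pooled_rate` and `mean_subject_rate` are hypothetical, and this only reproduces the arithmetic; it is not a recommendation for either summary or for a particular test.

```python
# Per-subject summary: (subject, period, admissions, readmissions)
per_subject = [
    (1, "Pre", 3, 1), (1, "Post", 1, 0),
    (2, "Pre", 0, 0), (2, "Post", 1, 0),
    (3, "Pre", 1, 1), (3, "Post", 0, 0),
    (4, "Pre", 0, 0), (4, "Post", 1, 1),
]

def pooled_rate(rows, period):
    """Total readmissions divided by total admissions in the period."""
    adm = sum(a for _, p, a, _ in rows if p == period)
    readm = sum(r for _, p, _, r in rows if p == period)
    return readm / adm

def mean_subject_rate(rows, period):
    """Mean of per-subject rates, skipping subjects with 0 admissions (NA)."""
    rates = [r / a for _, p, a, r in rows if p == period and a > 0]
    return sum(rates) / len(rates)

for period in ("Pre", "Post"):
    print(period,
          "pooled:", round(pooled_rate(per_subject, period), 3),
          "per-patient mean:", round(mean_subject_rate(per_subject, period), 3))
```

The pooled rate weights every admission equally, so subject 1's three pre-period admissions count three times. The per-patient mean weights every admitted patient equally and silently drops anyone with no admissions in that period, which is why the two summaries disagree in the pre period (0.5 versus about 0.67).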