I have a dataset to analyse and I am not sure how to deal with some of the complications in the data.
The data comprise about 4000 observations of patients (and nurses), with indicators of whether certain events E1-E15 occurred, and a number of other variables.
The Powers That Be want an estimate of the number of events per observation, overall and in various subgroups, as well as some analysis of which factors (e.g. patient age, nurse experience) might be associated with events. There are no specific hypotheses; this is descriptive/exploratory.
Complications
1. Observations have different types.
For type 1, only events E1-E9 and E15 are possible; the others are not applicable.
For type 2, only events E1-E5 and E7-E15 are possible; E6 is not applicable.
So type 1 observations have a maximum of 10 events, while type 2 observations have up to 14 (I think the actual maximum observed in the data is 4).
When possible, events E10-E12 occur much more frequently than other event types.
2. Relative frequencies of type 1 and type 2 probably vary between subgroups of interest.
For example, type 2 is extremely rare in young patients, but is much more common with older patients.
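
To make the applicability rules in complication 1 concrete, here is a minimal Python sketch. The dictionary layout of an observation and the names `APPLICABLE` and `event_summary` are assumptions for illustration, not part of the dataset. It counts only the events that are applicable to the observation's type, and also reports that count as a fraction of the applicable events. The fraction is one possible way to put the two types on a common scale, though it does not account for E10-E12 being much more frequent than the other events.

```python
# Applicable events per observation type, as described in complication 1.
APPLICABLE = {
    1: [f"E{i}" for i in range(1, 10)] + ["E15"],        # E1-E9, E15
    2: [f"E{i}" for i in range(1, 16) if i != 6],        # E1-E5, E7-E15
}


def event_summary(obs_type, events):
    """Summarise one observation.

    obs_type: 1 or 2.
    events: dict mapping event name -> 0/1 (hypothetical layout);
            missing or non-applicable events are ignored.
    Returns (total events, number of applicable events, events per applicable event).
    """
    names = APPLICABLE[obs_type]
    total = sum(events.get(name, 0) for name in names)
    return total, len(names), total / len(names)
```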
Questions
Is an overall summary of events/observation meaningful?
Can I make valid comparisons of events/observation between subgroups where the relative frequencies of type 1 and type 2 observations vary? How? Some sort of stratified analysis?
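
One candidate for such a stratified analysis is direct standardisation: compute the mean events per observation separately for each type within each subgroup, then combine the type-specific means using the same reference type mix for every subgroup. The sketch below is only an illustration under assumptions: the row layout `(subgroup, obs_type, total_events)`, the function name `standardized_means`, and the choice of reference weights are all hypothetical, and it ignores the clustering within patients and nurses.

```python
from collections import defaultdict


def standardized_means(rows, weights):
    """Directly standardised mean events per observation for each subgroup.

    rows: iterable of (subgroup, obs_type, total_events) tuples (assumed layout).
    weights: dict obs_type -> reference share; shares should sum to 1.
    Returns dict subgroup -> standardised mean, or None when the subgroup
    has no observations of some type and so cannot be standardised.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subgroup, obs_type, total in rows:
        sums[(subgroup, obs_type)] += total
        counts[(subgroup, obs_type)] += 1

    result = {}
    for subgroup in sorted({s for s, _ in counts}):
        if any(counts[(subgroup, t)] == 0 for t in weights):
            result[subgroup] = None
            continue
        # Weighted average of the type-specific means.
        result[subgroup] = sum(
            w * sums[(subgroup, t)] / counts[(subgroup, t)]
            for t, w in weights.items()
        )
    return result
```

With a common type mix, differences between subgroups no longer reflect how often type 2 occurs in each. A subgroup in which type 2 is extremely rare will have a very noisy type 2 mean, which limits how much weight the standardised figure can bear.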
Additional complications
- Observations are not really independent: the ~4000 observations involve about 1300 patients and 300 nurses.
- Event types may not be independent: I strongly suspect that certain events are more likely to occur together.
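
One possible way to allow for repeated observations on the same patient when summarising events per observation is a cluster bootstrap, resampling whole patients rather than individual observations. The sketch below is a minimal illustration under assumptions: the row layout `(patient_id, total_events)` and the function name `cluster_bootstrap_mean` are hypothetical, and it handles clustering by patient only, not the crossed clustering by nurse.

```python
import random
from collections import defaultdict


def cluster_bootstrap_mean(rows, n_boot=1000, seed=0):
    """Mean events per observation with a patient-level bootstrap interval.

    rows: iterable of (patient_id, total_events) tuples (assumed layout).
    Returns (point estimate, lower, upper) using a 95% percentile interval.
    """
    by_patient = defaultdict(list)
    for patient_id, total in rows:
        by_patient[patient_id].append(total)
    clusters = list(by_patient.values())

    all_values = [v for cluster in clusters for v in cluster]
    point = sum(all_values) / len(all_values)

    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping each patient's
        # observations together.
        sample = rng.choices(clusters, k=len(clusters))
        values = [v for cluster in sample for v in cluster]
        estimates.append(sum(values) / len(values))

    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return point, lower, upper
```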
Any advice welcome! So far I have been analysing the outcome as binary (one or more events vs. no events), but people are asking for total events.
Tim