Survival analysis for a disease vs non-disease cohort


Using a claims database, we created a disease cohort to evaluate a safety outcome. To put it into context, we randomly chose a non-disease cohort that was 1:1 matched with the disease cohort on age, sex, and cohort entry date (+/- 30 days), where the non-disease cohort entry date was defined as the inpatient/outpatient visit date. For the disease cohort, the cohort entry date was the disease diagnosis date. We also generated a list of covariates that are risk factors for the safety outcome. What do we need to consider when we fit Cox regression models (proc phreg) to assess the hazard ratio for the safety outcome in the matched disease vs non-disease cohorts? Thank you very much.


Matching implies that the original dataset was too big to handle. It can also cause major problems: it can discard valid matches, the result can depend on the order of the dataset, and it loses power. What was the purpose of matching vs. covariate adjustment?
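
To make the order-dependence point concrete, here is a minimal Python sketch of greedy 1:1 matching on age, sex, and entry date within 30 days. It is only an illustration under assumptions: the function `greedy_match`, the record layout, and the four toy patients are hypothetical, and the actual matching procedure used for the claims data may differ.

```python
from datetime import date

def greedy_match(cases, controls, caliper_days=30):
    """Greedy 1:1 matching: each case takes the first unused control
    with the same age and sex and an entry date within the caliper."""
    used = set()
    pairs = {}
    for case in cases:
        for ctrl in controls:
            if ctrl["id"] in used:
                continue
            same_strata = ctrl["age"] == case["age"] and ctrl["sex"] == case["sex"]
            close = abs((ctrl["entry"] - case["entry"]).days) <= caliper_days
            if same_strata and close:
                used.add(ctrl["id"])
                pairs[case["id"]] = ctrl["id"]
                break
    return pairs

# Hypothetical toy data: two disease patients and two candidate controls.
case_a = {"id": "A", "age": 60, "sex": "F", "entry": date(2020, 1, 31)}
case_b = {"id": "B", "age": 60, "sex": "F", "entry": date(2020, 3, 1)}
controls = [
    {"id": "X", "age": 60, "sex": "F", "entry": date(2020, 2, 15)},  # eligible for A and B
    {"id": "Y", "age": 60, "sex": "F", "entry": date(2020, 3, 20)},  # eligible for B only
]

pairs_ab = greedy_match([case_a, case_b], controls)  # A processed first
pairs_ba = greedy_match([case_b, case_a], controls)  # B processed first

print(pairs_ab)
print(pairs_ba)
```

With A processed first, both patients find a control. With B processed first, B takes the only control that A could have used, so A goes unmatched and would be dropped from the analysis even though a valid match existed.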

Thank you for your reply, Frank. As you stated, the non-disease population in the claims data is too big to handle, so we randomly selected a non-disease cohort. As we only matched on three variables (age, sex, and cohort entry date), we want to adjust for covariates to control for confounding. The disease cohort has 11000+ patients. Thanks again.