Imbalance between incident cases and controls

Hi there,
First, I want to thank you for this opportunity to ask some questions that may be trivial for many of you but difficult for others like me.

I have a nested case-control study and I want to estimate the effect of a polygenic risk score on the incidence of T2D using Cox proportional hazards models. However, only 258 of the 4521 participants became cases, that is, only 5.7%. Should I be worried about the imbalance between cases and controls? I was thinking of taking a subsample of cases, but I am not sure. I would like to hear your opinions, and I would greatly appreciate any references you can recommend.

Thank you very much

The mystery here is who planted the idea that imbalance should be a worry. Imbalance is a fact of life, and throwing away good data, which also means your estimates will never apply to future samples, is not the answer. The only worry with imbalance is the reduced precision and power it brings about. But trying to do something about that will make things worse.
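
To put rough numbers on the precision point, here is a minimal Python sketch. It swaps the continuous polygenic score for a hypothetical binary exposure with 50% prevalence in both groups (an assumption made purely for illustration, not something from the study) and uses Woolf's large-sample formula for the standard error of a log odds ratio. The counts 258 and 4263 (4521 minus 258) come from the question; everything else is made up.

```python
import math


def woolf_se(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Large-sample (Woolf) standard error of the log odds ratio from a 2x2 table."""
    return math.sqrt(
        1 / exposed_cases
        + 1 / unexposed_cases
        + 1 / exposed_controls
        + 1 / unexposed_controls
    )


n_cases = 258
n_controls = 4521 - n_cases  # everyone who did not become a case

# Hypothetical assumption: half of each group is "exposed".
exposed_controls = n_controls // 2
unexposed_controls = n_controls - exposed_controls

# Keep every control.
se_full = woolf_se(129, 129, exposed_controls, unexposed_controls)

# Discard controls until the groups are "balanced" (258 vs 258).
se_balanced = woolf_se(129, 129, 129, 129)

print(f"SE(log OR), all controls kept: {se_full:.3f}")
print(f"SE(log OR), balanced subsample: {se_balanced:.3f}")
```

Under these assumptions the balanced subsample gives the larger standard error, so "fixing" the imbalance only costs precision.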

Thank you very much, dear Dr. Harrell.
Now I feel completely confident doing the analysis.

Are you sure your study design is case-control? In case-control studies, you get to decide how many controls to sample. It sounds strange to me that you sampled 4521 controls if this was not your intention.

It sounds to me like you may be talking about the full underlying cohort. If you already have data on everyone in the cohort, it is much preferable to analyse the full dataset. If exposure data has not already been collected and if it would be prohibitively expensive to collect it on everyone, you can consider taking a random sample from the controls and obtaining data on the exposure variable only in those who were selected; this would make it a case-control study.

Like Frank said, this imbalance is not a problem. More controls are always better, but with decreasing marginal benefit.
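
The diminishing return can be made concrete with a standard rule of thumb: for a test of no association, a design with m controls per case has roughly m/(m+1) of the efficiency of one with unlimited controls. This is an approximation that holds near the null, and the function name below is just for illustration.

```python
def relative_efficiency(m):
    """Approximate efficiency of m controls per case, relative to unlimited controls.

    Rule of thumb valid near the null hypothesis; not an exact result.
    """
    return m / (m + 1)


# Each extra control per case buys less than the one before.
for m in (1, 2, 3, 4, 5, 10):
    print(f"{m:>2} controls per case: {relative_efficiency(m):.3f}")

# Ratio in the data from the question: 4263 non-cases to 258 cases.
ratio_here = (4521 - 258) / 258
print(f"{ratio_here:.1f} controls per case: {relative_efficiency(ratio_here):.3f}")
```

Most of the gain comes from the first few controls per case, but the curve never turns down, so there is no statistical reason to discard controls that are already in hand.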


Dear Anders,
Sorry, I made a mistake: this is a case-cohort study.

Thank you very much for your comments

Thanks for clarifying. Case-control studies are usually imbalanced by design (2:1 or 3:1 matching for example) and nothing special should be done about imbalance. The conditional logistic model (fitted using a Cox model trick in the R survival package) works.
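
For anyone curious why the Cox "trick" works: with one case per matched set, the conditional logistic likelihood has the same form as a Cox partial likelihood in which each matched set is a stratum and the case is the only event. A minimal Python sketch with a single covariate follows. The toy data and the function name are invented for illustration, and in practice you would use the survival package rather than this.

```python
import math


def conditional_loglik(beta, matched_sets):
    """Conditional logistic log-likelihood for 1:m matched sets, one covariate.

    matched_sets is a list of (x_case, [x_control, ...]) tuples.
    Each set contributes beta * x_case - log(sum over the set of exp(beta * x)),
    which is the Cox partial-likelihood term for a stratum with a single event.
    """
    total = 0.0
    for x_case, x_controls in matched_sets:
        members = [x_case] + list(x_controls)
        denominator = sum(math.exp(beta * x) for x in members)
        total += beta * x_case - math.log(denominator)
    return total


# Hypothetical 1:2 matched sets: (case covariate, control covariates).
toy_sets = [
    (1.2, [0.3, -0.5]),
    (0.1, [0.4, -1.0]),
    (0.8, [-0.2, 0.6]),
]

for beta in (0.0, 0.5, 1.0):
    print(f"beta = {beta:.1f}: log-likelihood = {conditional_loglik(beta, toy_sets):.4f}")
```

Note that the number of controls per set only changes how many terms enter each denominator; nothing in the likelihood requires the sets to be balanced.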