I am working on the link between gestational diabetes and environmental exposure. I have a database of 5990 deliveries, including 518 cases of gestational diabetes. I would like to know when and why it is necessary to match and to use conditional logistic regression. I was asked to match on age and fit a conditional logistic regression, but I then got feedback that this was not relevant, hence the question above.
Thank you in advance.
Matching is done when either
- the sample size is too small to model covariates, or
- the variable you are matching on is too difficult to model as a covariate (e.g., family, as in twin studies; or occupation, where 10,000 subjects may come from 400 different occupations)
Even in these situations, random effects without matching may be a better solution.
Don’t match on such an easy-to-model variable as age, which can easily be adjusted for using a restricted cubic spline function, fractional polynomial, or quadratic effect.
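To make the "quadratic effect" option concrete, here is a minimal sketch of an ordinary (unconditional) logistic regression that keeps every delivery and adjusts for age with linear and squared terms. Everything in it is an assumption for illustration: the data are simulated (only the sample size mirrors the question), the variable names `exposure` and `age` are hypothetical, and the small Newton-Raphson fitter `fit_logistic` is hand-written so the example needs only NumPy. In practice you would more likely use a regression package and a restricted cubic spline for age.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit an unconditional logistic regression by Newton-Raphson.

    Returns the coefficient estimates and their model-based standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Simulated data (NOT the poster's data): 5990 deliveries, binary exposure,
# maternal age in years. The true coefficients below are made up.
rng = np.random.default_rng(0)
n = 5990
age = rng.normal(30, 5, n)
exposure = rng.binomial(1, 0.3, n)

# Center and rescale age (per 10 years) so age and age^2 are less collinear
age_c = (age - 30) / 10

# Design matrix: intercept, exposure, age, age squared
X = np.column_stack([np.ones(n), exposure, age_c, age_c ** 2])
true_beta = np.array([-2.6, 0.4, 0.8, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, se = fit_logistic(X, y)

# Age-adjusted odds ratio for exposure, with a Wald 95% interval
odds_ratio_exposure = np.exp(beta_hat[1])
ci_low, ci_high = np.exp(beta_hat[1] + np.array([-1.96, 1.96]) * se[1])
print("adjusted OR for exposure:", odds_ratio_exposure, (ci_low, ci_high))
```

No observation is discarded here, which is the practical difference from matching on age: all deliveries contribute to the exposure estimate, and the shape of the age effect is estimated rather than conditioned away.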
Generally speaking, there is no principled method for analyzing artificially matched data, and most matching methods exclude observations, which does not lead to reproducible research.
Thank you very much for your answer.
The two potentially relevant reasons for using matching go together, right? They are only valid for logistic regression, where too many covariates can “prevent” the model from converging, is that right?