I am currently working on a class project that involves a longitudinal cohort in which all participants received a drug thought to be protective against an infectious disease. Covariates and disease status are recorded at a baseline visit, followed by 7 follow-up visits at which participants' covariates are updated (i.e., time-varying covariates), infection status is assessed, and the drug is given again (except at the last/exit visit). The 8 visits thus define 7 time intervals. The first few intervals are only a month long, whereas the subsequent ones are around 3 months each. For the purposes of this project, I estimate that about 66 of the 2000 cohort participants (3.3%) will have an infection over the study period, based on a past study indicating 3.3 cases per 100. From here I will conduct a case-control study with all 66 cases and a random sample of controls from the cohort, with 3 controls for every case (so 198 controls). With 7 observations per individual, the total sample will be 462 case observations and 1386 control observations.
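To make the arithmetic behind these counts explicit, here is a quick sketch in Python. The variable names are my own, and it assumes every sampled person contributes all 7 intervals, which is the simplification I used for the totals.

```python
# Design numbers taken from the description above.
cohort_size = 2000
cases_per_100 = 3.3        # from the past study
controls_per_case = 3
n_intervals = 7            # 8 visits define 7 intervals

# Expected number of cases and the matching number of sampled controls.
n_cases = round(cohort_size * cases_per_100 / 100)
n_controls = n_cases * controls_per_case

# Assumption: every sampled person contributes a row for all 7 intervals.
case_rows = n_cases * n_intervals
control_rows = n_controls * n_intervals
total_rows = case_rows + control_rows

print(n_cases, n_controls, case_rows, control_rows, total_rows)
```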
The main exposure is the amount of the drug found in hair samples. Since the laboratory analysis of the hair samples is relatively costly, only a fraction of the hair samples from non-infected participants (controls) will be analyzed, both to save money and to keep the remaining samples for other studies. Hence the case-control design.
I am required to use pooled logistic regression for this analysis, which is something I do not have a great deal of experience with. I have several questions:
Am I correct in believing that, since the outcome in the cohort is rare (3.3%), the odds ratio from the pooled logistic regression should approximate the hazard ratio from a Cox proportional hazards model? I read on another site that it would actually estimate the RR instead; is that true?
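For what it is worth, here is the rough numeric check behind my intuition, as a sketch only. It assumes proportional hazards over the whole study period, takes RR to mean the risk ratio, and uses a purely hypothetical hazard ratio of 0.5; only the 3.3% baseline risk comes from my setting. It compares cumulative risks over the full period and does not fit the pooled model itself.

```python
import math

baseline_risk = 0.033   # cumulative risk in the reference group (from above)
hr = 0.5                # hypothetical hazard ratio, chosen for illustration only

# Under proportional hazards, cumulative hazards scale by the hazard ratio.
cum_hazard_ref = -math.log(1 - baseline_risk)
cum_hazard_exp = hr * cum_hazard_ref
exposed_risk = 1 - math.exp(-cum_hazard_exp)

# Risk ratio and odds ratio implied by the two cumulative risks.
rr = exposed_risk / baseline_risk
odds_ratio = (exposed_risk / (1 - exposed_risk)) / (baseline_risk / (1 - baseline_risk))

print(f"HR={hr:.4f}  RR={rr:.4f}  OR={odds_ratio:.4f}")
```

With an outcome this rare the three measures come out very close to each other, which is why I am unsure which one the pooled model is said to estimate.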
What is the key difference between pooled and conditional logistic regression? Ngwa et al. (see below) mention that as “the length of time interval tends towards zero” the two models are equivalent.
Pooled logistic regression should include an interval/time indicator variable as a covariate, correct? I was told this was the case; however, the Ngwa article indicates that they considered pooled logistic regression models “that adjust and do not adjust for time”. What is meant by this?
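To be concrete about what I think the two variants might look like, here is my tentative reading written out. The notation is my own and not taken from the article: $Y_{ij}$ is infection status for participant $i$ in interval $j$, and $x_{ij}$ is the (time-varying) drug level in hair.

```latex
% Adjusted for time: a separate intercept for each of the 7 intervals
\operatorname{logit} P\left(Y_{ij} = 1 \mid Y_{i,j-1} = 0,\ x_{ij}\right)
  = \alpha_j + \beta x_{ij}, \qquad j = 1, \dots, 7

% Not adjusted for time: a single intercept shared by all intervals
\operatorname{logit} P\left(Y_{ij} = 1 \mid Y_{i,j-1} = 0,\ x_{ij}\right)
  = \alpha + \beta x_{ij}
```

If this reading is right, the interval indicators I was told to include correspond to the $\alpha_j$ terms, and the unadjusted version forces the baseline risk to be the same in every interval even though my intervals have different lengths.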
Ngwa, J.S., Cabral, H.J., Cheng, D.M. et al. A comparison of time dependent Cox regression, pooled logistic regression and cross sectional pooling with simulations and an application to the Framingham Heart Study. BMC Med Res Methodol 16, 148 (2016). https://doi.org/10.1186/s12874-016-0248-6