Unequal Intervals When Using Pooled Logistic Regression for Longitudinal Data


I’m using a pooled logistic regression model to estimate the association between a time-varying exposure and a binary outcome. Some of the covariates are also time-varying. Each individual will thus have 7 measurements, since there are 7 visits (baseline + 6 follow-up). The study, by design, had unequal interval lengths (1 month for each of the first three intervals, then 3 months for the next 3 intervals, and finally 1 month for the last interval). I read in Allison’s Survival Analysis Using SAS text (2nd Edition) that:

"[if you are estimating a model that] place no restrictions on the effect of time, and if the data are structured so that every individual’s interval at time t is the same length as every other individual’s interval at time t, then the separate parameters that are estimated for every time interval automatically adjust for differences in interval length. This situation is not as common as you might think, however. Even when intervals at the same calendar time have the same length, intervals at the same event time will have different lengths whenever individuals have different origin points." (pg 249)

I was wondering what Dr. Allison means exactly by the bolded part. Is this referring to situations where individuals are allowed to enter the study late/after it has already started? So if there are no individuals entering late then there is no need to adjust for interval length? What about when there are individuals lost to follow up?

When there are unequal intervals, Allison says this can be adjusted for by doing the following:

"an ad-hoc solution will usually suffice: simply include the length of the interval as a covariate in the model. If there are only two distinct interval lengths, a single dummy variable will work. If there are a small number of distinct lengths, construct a set of dummy variables. If there are many different lengths, you will probably need to treat length as a continuous variable but include a squared term in the model to adjust for nonlinearity."
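To make the coding concrete, here is a minimal sketch of person-period records that carry the interval length as a covariate. It assumes the interval lengths listed above (1, 1, 1, 3, 3, 3, 1 months); the helper `person_period_rows` and the column names are hypothetical, not from Allison's text. Since there are only two distinct lengths, a single dummy (`long_interval`) is used.

```python
# Interval lengths in months, in the order described in the question (assumed).
INTERVAL_LENGTHS = [1, 1, 1, 3, 3, 3, 1]


def person_period_rows(person_id, last_interval, event):
    """Build one record per interval at risk for a single individual.

    Hypothetical helper: `last_interval` is the 1-based index of the last
    interval observed, and `event` is 1 if the outcome occurred in that
    interval, 0 if the individual was censored there.
    """
    rows = []
    for k in range(1, last_interval + 1):
        length = INTERVAL_LENGTHS[k - 1]
        rows.append({
            "id": person_id,
            "interval": k,                             # usual interval indicator
            "length_months": length,                   # raw interval length
            "long_interval": 1 if length == 3 else 0,  # dummy for the two lengths
            "y": 1 if (event == 1 and k == last_interval) else 0,
        })
    return rows


# Example: individual 1 has the event in the fifth interval.
rows = person_period_rows(person_id=1, last_interval=5, event=1)
for r in rows:
    print(r)
```

Each row would then be one observation in the pooled logistic regression, with `y` as the outcome.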

Is this interval length covariate in addition to the interval indicator we usually include in a pooled logistic regression analysis? Or is it included in replacement of the indicator variable?
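One way to check what is at stake in this question is sketched below. It assumes every individual follows the same visit schedule; the function name and the late-entrant example are illustrative only. If interval length is fully determined by the interval index, a length covariate carries no information beyond a full set of interval indicators, so the two cannot be estimated separately.

```python
from collections import defaultdict


def length_determined_by_interval(records):
    """Return True if each interval index maps to exactly one length.

    `records` is an iterable of (interval_index, length_in_months) pairs,
    one per person-period row.
    """
    lengths_seen = defaultdict(set)
    for interval, length in records:
        lengths_seen[interval].add(length)
    return all(len(v) == 1 for v in lengths_seen.values())


# Same schedule for everyone (assumed from the design in the question).
schedule = [1, 1, 1, 3, 3, 3, 1]
same_origin = [(k, m) for _ in range(3) for k, m in enumerate(schedule, start=1)]

# Hypothetical late entrant whose event-time clock starts at the fourth
# calendar interval, so their first interval lasts 3 months rather than 1.
late_entrant = [(k, m) for k, m in enumerate(schedule[3:], start=1)]

print(length_determined_by_interval(same_origin))
print(length_determined_by_interval(same_origin + late_entrant))
```

The second call mixes individuals with different origin points, which is the situation the quoted passage describes.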

Additionally, this is more of a terminology question, but is pooled logistic regression simply an alternative name for, or a more specific type of, discrete-time survival analysis model?

Sounds like an application for a Cox proportional hazards model with time-dependent covariates.

Pooled indicates that 0/1’s are added up within a group prior to modeling.