I am a clinical epidemiologist and this is my first post here.

We have a clinical dataset from an epidemiological observational study in which three examinations were performed on different representative samples at three time points. The aim of the study is to see whether the exposure influences the outcome over time. I found a similar (unpublished) study with a similar design; it used multivariable logistic regression and Oaxaca decomposition. I have gone through the literature on Oaxaca decomposition and, as a non-mathematician, found it difficult to understand. Are there any other tools that I could explore for our repeated cross-sectional data?

P.S.: I use Stata for data analyses, but I could also use R if needed.

We’d need to know a lot more, e.g. what is being measured.

Incidentally, I’d refer to “multivariable logistic regression” as just logistic regression, since it is always multivariable. No one ever does this in the literature, but that is because everyone is mimicking one another …

We have measured a periodontal parameter on each tooth (PD, ranging from 1 to 12). The mean PD across all teeth will be our main outcome. We might also categorize participants by their mean value (mild/moderate/severe). These (continuous and ordinal) outcomes need to be regressed on the (binary) exposure variable. Furthermore, we need to adjust for potential confounders. I hope this explanation helps.

I assume ‘exposure’ is not randomised. Any missing data? If only one PD is missing, is the mean then missing? Personally, I think there’s too much collapsing of data into summary stats, categories, etc. I might just do it as multivariate repeated measures and retain the original data, but that’s for the subject matter expert to decide. I still don’t feel like I know enough about what is being analysed, the context, etc.

The exposure is not randomised. The participants chose their own method. It is normal to have missing teeth, or to be unable to measure PD at some surfaces for other clinical reasons. Therefore, we take row means over all available teeth.
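For anyone following along, here is a minimal sketch of what "row means over all available teeth" amounts to. The participant IDs and PD values are made up for illustration, and `mean_pd` is a hypothetical helper, not something from the study. In Stata, `egen` with `rowmean()` behaves the same way, in that it ignores missing values.

```python
from statistics import mean


def mean_pd(tooth_values):
    """Mean PD over the teeth that were actually measured.

    Missing teeth / unmeasurable sites are coded as None and skipped.
    Returns None if no tooth could be measured at all.
    """
    measured = [v for v in tooth_values if v is not None]
    return mean(measured) if measured else None


# Hypothetical data: one list of per-tooth PD values per participant.
participants = {
    "A": [2, 3, None, 4],           # one tooth missing
    "B": [None, None, None, None],  # nothing measurable
    "C": [1, 1, 2, 2],              # complete
}

row_means = {pid: mean_pd(vals) for pid, vals in participants.items()}
print(row_means)
```

Note that participants A and C get a mean based on different numbers of teeth, so the two means are not equally precise; that is the information a tooth-level model would keep.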

I have been thinking of multilevel modelling too, but in that case, would you just have time as a random slope, and no patient ID in the model?

A hierarchical (multilevel) model seems a good option (mixed in Stata). In that case, each participant is a cluster of observations, with the PD for each tooth as the lowest-level variable (tooth nested within participant). You would have a random intercept at the participant level. This will account for the correlation among the PD values within a participant and “adjust” for participant-level unmeasured factors (oral hygiene?). I believe that avoids the issue of a different number of teeth per participant. You could also analyze PD as a dichotomous variable. Whether or not you need random coefficients (slopes) is a subject matter issue.
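To make the structure of the random-intercept model concrete, here is a sketch of how it could be written, with tooth $i$ nested in participant $j$. The symbols are my own notation, and the exposure-by-wave interaction is an assumption based on the stated aim (a change in the exposure effect over time), not something specified in this thread:

$$
\mathrm{PD}_{ij} = \beta_0 + \beta_1 E_j + \sum_{t=2}^{3} \beta_{2t} W_{tj} + \sum_{t=2}^{3} \beta_{3t} E_j W_{tj} + \boldsymbol{\gamma}^\top \mathbf{x}_j + u_j + \varepsilon_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
$$

Here $E_j$ is the binary exposure, $W_{tj}$ indicates that participant $j$ was examined at wave $t$, $\mathbf{x}_j$ holds the confounders, and $u_j$ is the participant-level random intercept. Because each participant is examined at only one wave in a repeated cross-sectional design, wave enters as a fixed effect in this sketch. In Stata this would correspond roughly to `mixed pd i.exposure##i.wave <confounders> || id:`, where all variable names are hypothetical.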