What are the problems, if any, of including individuals with only one data-point in studies of change over time in repeated-measure data?

According to multilevel (mixed) modelling texts, it is fine to include clusters with only one observation. However, when applying these models to repeated-measure data to explore change over time, including individuals with one data-point seems counterintuitive: how can change be modelled with one data-point?

What are the problems, if any, of including such individuals when using mixed models for change over time? Should they be excluded? Is this different for GEE, which is said to be for “population” rather than individual estimates?

Indeed, there is no problem in including subjects with one measurement in a longitudinal/multilevel study. These subjects still contribute to estimating the cross-sectional effects of covariates (e.g., time effects). The longitudinal effects, though, are estimated from subjects with more than one measurement.
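
To make this concrete, here is a sketch under an assumed random-intercept model with a linear time effect. The model and notation are my own illustration, not something specified in the question:

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% Marginal distribution for a subject with a single measurement (n_i = 1):
y_{i1} \sim N\!\left(\beta_0 + \beta_1 t_{i1},\; \sigma_b^2 + \sigma^2\right)

% Within-subject covariance, available only when n_i \ge 2:
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2, \qquad j \ne k
```

Under this model a single-measurement subject still carries information about the mean parameters and about the total variance, but cannot by itself separate the between-subject variance from the residual variance. That separation comes from subjects with two or more measurements.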

Thank you. That sounds like including them would not contribute much to estimates of change?

The general advice is to include them, because typically you have missing data, and a likelihood-based repeated measurements analysis will be valid under the missing at random assumption if they are included.

Since the model is fit as a whole, improving estimates in one part of the model frequently translates to better estimates in other parts. So yes, including them can contribute to estimates of change; how much depends on the actual data and model.

Thank you both. How does this apply to GEE?
In my particular circumstances, I was advised that GEE provides the robust standard errors I need in order to use inverse probability weighted analyses of repeated data. (I’m not sure if this is equally possible in mixed models?)

Likewise, in GEE, subjects with one measurement also contribute to the cross-sectional effects, which are of primary interest. Hence, my advice would be to include them.

Moreover, note that the protection the sandwich estimator provides typically comes at the expense of power. Thus, it is in general a good idea to make the effort to model the correlation structure appropriately and flexibly. Also, the sandwich estimator only really “protects” you in specific settings (balanced designs, not too many repeated measurements, and not too many (continuous) covariates). You can find more info in the book by Fitzmaurice et al., Applied Longitudinal Analysis, and in my course notes: drizopoulos.com -> Teaching -> CE08: Repeated Measurements.
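
For readers who want to see the mechanics, below is a minimal NumPy sketch of a linear GEE with an independence working correlation and a cluster-robust (sandwich) covariance. It is an illustration only: the function name `gee_independence`, the simulated data, and all parameter values are assumptions made for this example, and no small-sample correction is applied. Subjects with a single measurement enter both the point estimate and the sandwich "meat" like any other cluster.

```python
import numpy as np

def gee_independence(X, y, groups):
    """Linear GEE, identity link, independence working correlation.

    Returns the coefficient estimates and cluster-robust (sandwich)
    standard errors. Hypothetical helper written for this example.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # "Bread": inverse of X'X (the model-based part).
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # "Meat": sum over subjects of the outer product of their score vectors.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]  # works for clusters of size 1 too
        meat += np.outer(score, score)

    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

# Simulated data (assumed values): 300 subjects with 1 to 4 visits each.
rng = np.random.default_rng(0)
n_subjects = 300
n_obs = rng.integers(1, 5, size=n_subjects)          # visits per subject
groups = np.repeat(np.arange(n_subjects), n_obs)     # subject id per row
time = np.concatenate([np.arange(k) for k in n_obs]).astype(float)
b = rng.normal(0.0, 1.0, n_subjects)                 # random intercepts
y = 1.0 + 0.5 * time + b[groups] + rng.normal(0.0, 1.0, groups.size)
X = np.column_stack([np.ones_like(time), time])

beta, robust_se = gee_independence(X, y, groups)
print("subjects with one measurement:", int((n_obs == 1).sum()))
print("estimates:", beta)
print("robust SEs:", robust_se)
```

In practice you would use an established GEE implementation, which also lets you choose a richer working correlation structure than independence.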


Excellent! Thank you very much!