Help needed with generalized estimating equations

Hello, I need some help using generalized estimating equations (GEE). I am new to GEE and have done my best to read up on how to use it, but unfortunately that has not gotten me far.

We are measuring data using a checklist. We plan to score each item as 1 (completely reported), 0.5 (partially reported), or 0 (not reported). If I understand correctly, this makes our criterion variable fractional rather than count data. Furthermore, since particular items do not apply to all trials, we plan to generate a percent completion score for each trial as the ratio of the number of applicable items reported to the total number of applicable items (i.e., the denominators will vary slightly between trials). We now want to relate the percent completion to baseline characteristics to see whether there is an association. My questions are: given these considerations, should we specify a Gaussian distribution? And should we specify the working correlation as independent or exchangeable?
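To make the scoring rule concrete, here is a minimal Python sketch of the percent completion calculation described above. The function name `completion_summary`, the use of `None` to mark a non-applicable item, and the example trials are all assumptions made for illustration; they are not part of our actual checklist or data.

```python
from typing import List, Optional, Sequence, Tuple

# Allowed item scores: not reported, partially reported, completely reported.
VALID_SCORES = {0.0, 0.5, 1.0}


def completion_summary(
    item_scores: Sequence[Optional[float]],
) -> Tuple[float, int, float]:
    """Return (points, n_applicable, percent) for one trial.

    None marks a checklist item that does not apply to the trial
    (a hypothetical convention chosen for this sketch).
    """
    applicable: List[float] = [s for s in item_scores if s is not None]
    for s in applicable:
        if s not in VALID_SCORES:
            raise ValueError(f"unexpected item score: {s!r}")
    n_applicable = len(applicable)
    if n_applicable == 0:
        raise ValueError("trial has no applicable items")
    points = sum(applicable)
    return points, n_applicable, 100.0 * points / n_applicable


# Made-up trials: the same number of points, but different denominators.
trial_a = [1, 0.5, 0, 1, None]      # 4 applicable items
trial_b = [1, 1, 0.5, None, None]   # 3 applicable items

summary_a = completion_summary(trial_a)
summary_b = completion_summary(trial_b)
print(summary_a)  # (2.5, 4, 62.5)
print(summary_b)
```

Returning the points and the number of applicable items alongside the percentage means the numerator and denominator stay available separately, whichever one ends up being modeled.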

We are running our analysis in Stata, currently with the xtgee command. We are specifying it with “family(gaussian) link(identity) corr(independent)”, but we are not certain whether this is producing correct results.

Thank you for any help.

Welcome to datamethods, Sam.

I’m not clear about the choice of GEE, i.e., is there clustering in the data? But whether you use GEE or full maximum likelihood methods, this seems to be a place for an ordinal outcome method such as the proportional odds model, which handles arbitrarily many ties in the data.

There are some general problems when using ratios as dependent variables, and you might stop for a moment to consider whether it would be better to use a count as the dependent variable, adjusting (in a flexible, smooth, nonlinear way) for the number of criteria.

Thank you for your input. We do, in fact, have clustering in our data. If we move forward with a count as the dependent variable, how would you suggest that we specify the distribution (e.g., Gaussian or Poisson), given that the dependent variable may take fractional values? Do you foresee this being a problem?

Semiparametric methods such as the proportional odds model (a generalization of the Wilcoxon test that handles ties better) do not care at all about the overall distribution of the outcome; they use only the ordering of the values.
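As a rough illustration of why the distribution does not matter for rank-based methods, the Python sketch below computes the Wilcoxon rank-sum (Mann-Whitney) concordance with midranks for ties and shows that it is unchanged by a monotone transformation of the scores. This is only the two-group Wilcoxon case, not a proportional odds fit, and it ignores clustering; the helper names `midranks` and `concordance` and the scores are made up for the example.

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def concordance(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Estimate P(A > B) + 0.5 * P(A = B) from the Wilcoxon rank sum."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_statistic = rank_sum_a - n_a * (n_a + 1) / 2
    return u_statistic / (n_a * n_b)


# Made-up percent completion scores for two groups of trials, with ties.
group_a = [62.5, 83.3, 100.0, 75.0]
group_b = [50.0, 62.5, 75.0, 40.0]

c_raw = concordance(group_a, group_b)
# Cubing is strictly increasing for positive scores, so the ordering is kept.
c_cubed = concordance([x ** 3 for x in group_a], [x ** 3 for x in group_b])
print(c_raw, c_cubed)  # 0.875 0.875
```

Because only the ordering enters the calculation, fractional scores and ties need no choice of Gaussian, Poisson, or any other family.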