Hi,
Presuming that @JorgeTeixeira takes the recommended approach of not using the baseline observation as a record in the per-patient cluster, and uses the baseline value only as a covariate, there will be 2 post-baseline observations (records) per patient (cluster), as Frank notes in his reply. That is, there will not be 3 records per patient (one per time point, including baseline), but only two.
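As a rough illustration of that data layout, here is a minimal base R sketch, assuming the data arrive in wide format with one row per patient. The column names (Y2, Y3 for the two post-baseline visits) and the values are hypothetical. Only the post-baseline visits are stacked into records; the baseline value T1 is simply carried along as a covariate on each of the patient's records.

```r
# Hypothetical wide-format data: one row per patient, baseline value T1
# and the two post-baseline visits Y2 and Y3 (names/values are made up).
wide <- data.frame(
  ID    = 1:4,
  Group = factor(c("A", "A", "B", "B")),
  T1    = c(10.2, 11.5, 9.8, 10.9),
  Y2    = c(9.1, 10.8, NA, 9.7),
  Y3    = c(8.7, NA, 8.9, 9.0)
)

# Stack only the post-baseline visits: two records per patient, with the
# baseline value T1 repeated on each record as a covariate.
DF.MOD <- reshape(
  wide,
  direction = "long",
  varying   = c("Y2", "Y3"),
  v.names   = "Response",
  timevar   = "Time",
  times     = c(2, 3),
  idvar     = "ID"
)

# Time and ID as factors, sorted by patient, then by visit.
DF.MOD$Time <- factor(DF.MOD$Time)
DF.MOD$ID   <- factor(DF.MOD$ID)
DF.MOD <- DF.MOD[order(DF.MOD$ID, DF.MOD$Time), ]
rownames(DF.MOD) <- NULL
DF.MOD
```

Missed visits stay as NA rows here; na.action = na.omit in the model call will drop them.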
Thus, presuming a reasonable sample size in the study being referenced, there should not be an issue with the number of clusters (patients). The issue, however, will be the typical cluster size of 2, presuming that most patients have both post-baseline observations. You can get away with some number of single-observation clusters, that is, patients with a post-baseline observation at only one of the two time points, but generally you don't want that to be a material number of patients, for some definition of "material".
You may also have issues with any patients who have only the baseline observation and neither of the two post-baseline observations: under this approach they contribute no records at all and drop out of the model. How much that matters may depend on the ITT cohort inclusion criteria and on whether you plan to impute missing data.
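To see how many patients fall into each of those categories before fitting, one option is to count the non-missing post-baseline records per patient. This is a sketch on hypothetical long-format data (five made-up patients, Response NA for a missed visit); the column names follow the model specification used later in this reply.

```r
# Hypothetical long-format data: post-baseline records only, with Response
# NA when a visit was missed. IDs and values are made up for illustration.
DF.MOD <- data.frame(
  ID       = factor(rep(1:5, each = 2)),
  Time     = factor(rep(c(2, 3), times = 5)),
  Response = c(9.1, 8.7, 10.8, NA, NA, 8.9, 9.7, 9.0, NA, NA)
)

# Effective cluster size per patient once na.omit drops missing records.
n_obs <- tapply(!is.na(DF.MOD$Response), DF.MOD$ID, sum)

# Distribution of cluster sizes: 0 = patient contributes nothing,
# 1 = single-observation cluster, 2 = both post-baseline visits present.
table(factor(n_obs, levels = 0:2))

# Patients who would drop out of the model entirely absent imputation.
names(n_obs)[n_obs == 0]
```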
With the small cluster size of 2, you can likely use random intercepts in the random effects specification, but you are likely to have model convergence issues if you also specify random slopes, since two records per patient leave very little information to separate a per-patient slope from the residual error. You can try, but if you get convergence errors, remove the random slope specification and retain the random intercepts.
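That "try, then fall back" step could be automated roughly as below. This is a minimal sketch on simulated data (the sample size, effect sizes, and seed are arbitrary assumptions): the random-slope fit is wrapped in tryCatch(), and if lme() throws an error, the random-intercept model is fitted instead. Note that a random-slope fit that returns without an error can still be poorly identified with only two records per patient, so its variance components deserve a sceptical look.

```r
library(nlme)

# Hypothetical simulated data: 60 patients, two post-baseline records each.
set.seed(42)
n     <- 60
ID    <- factor(rep(seq_len(n), each = 2))
Time  <- factor(rep(c(2, 3), times = n))
Group <- factor(rep(c("A", "B"), each = n))   # patients 1-30 A, 31-60 B
T1    <- rep(rnorm(n, 10, 1), each = 2)       # baseline, constant per patient
u     <- rep(rnorm(n, 0, 0.8), each = 2)      # patient-level intercept shift
Response <- 2 + 0.8 * T1 - 0.5 * (Group == "B") * (Time == "3") + u +
  rnorm(2 * n, 0, 0.6)
DF.MOD <- data.frame(ID, Time, Group, T1, Response)

fml <- Response ~ Group + Time + Group:Time + T1 + Group:T1

# First attempt: random intercepts plus random slopes for Time.
fit_slopes <- tryCatch(
  lme(fml, random = ~ Time | ID, data = DF.MOD, na.action = na.omit),
  error = function(e) {
    message("Random-slope model failed: ", conditionMessage(e))
    NULL
  }
)

# Fallback: keep only the random intercepts.
MOD <- if (is.null(fit_slopes)) {
  lme(fml, random = ~ 1 | ID, data = DF.MOD, na.action = na.omit)
} else {
  fit_slopes
}
VarCorr(MOD)
```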
When I engage in these analyses, I typically use the older lme() function from R's nlme package rather than lmer() from the lme4 package, as I don't need the more complex structures (e.g. crossed random effects) that lme4 supports. It is also easier to specify AR1 correlations using the older, and historically stable, lme(), since lmer() does not offer residual correlation structures such as corAR1.
Thus, my default lme() model specification typically looks like:
```r
library(nlme)

MOD <- lme(Response ~ Group + Time + Group:Time + T1 + Group:T1,
           random = ~ 1 | ID,                    # random intercept per patient
           data = DF.MOD, na.action = na.omit,   # drop missed visits
           correlation = corAR1(form = ~ as.numeric(Time) | ID))  # AR1 within patient
```
Here, Response is the continuous response variable of interest, Group is the treatment arm factor variable, Time is the time-based factor variable, T1 is the baseline value of the variable of interest, and ID is the unique patient identifier. DF.MOD is the source data frame for the observations, in long format with one row per patient per post-baseline time point.
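Note that I use an AR1 correlation specification, as I do not presume compound symmetry over time. With only two post-baseline observations, however, an AR1 specification may not add anything here: with two time points, AR1 and compound symmetry both reduce to a single within-patient correlation, which the random intercept already induces. Also note that the Time variable, which is a factor, needs to be coerced to numeric for the AR1 specification, since corAR1() expects an integer-valued position covariate; as.numeric() on the factor supplies its level codes (1, 2, ...).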
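The two-time-point equivalence can be checked directly with nlme's correlation classes. In this sketch the two patients and the correlation value of 0.5 are arbitrary assumptions; the visits are coded 1 and 2, as as.numeric() would give for a two-level Time factor. Both structures produce the same 2 x 2 within-patient correlation matrix.

```r
library(nlme)

# Two hypothetical patients, visits coded as consecutive integers.
d <- data.frame(ID = factor(c(1, 1, 2, 2)), t = c(1, 2, 1, 2))

# Plug the same (arbitrary) correlation value into both structures.
ar1 <- Initialize(corAR1(0.5, form = ~ t | ID), data = d)
cs  <- Initialize(corCompSymm(0.5, form = ~ 1 | ID), data = d)

# Within-patient correlation matrices for patient 1: identical 2 x 2 matrices.
corMatrix(ar1)[[1]]
corMatrix(cs)[[1]]
```

With a third post-baseline visit the two would diverge (AR1 gives 0.25 at lag 2, compound symmetry stays at 0.5), which is where the AR1 choice starts to matter.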
Once I have the model set up, I then use Russ Lenth's "emmeans" CRAN package to generate various contrasts for both within and between group estimation.
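A minimal sketch of that step, assuming the emmeans package is installed and using simulated data (sizes, effects, and seed are arbitrary). It fits the random-intercept model without the corAR1 term since, with two post-baseline visits, the random intercept already implies the only available within-patient correlation. pairs() on emmeans(..., ~ Group | Time) gives the between-group contrast at each visit, and on ~ Time | Group the within-group change between visits, with T1 held at its mean by emmeans' default reference grid.

```r
library(nlme)
library(emmeans)

# Hypothetical simulated data: 60 patients, two post-baseline records each.
set.seed(1)
n     <- 60
ID    <- factor(rep(seq_len(n), each = 2))
Time  <- factor(rep(c(2, 3), times = n))
Group <- factor(rep(c("A", "B"), each = n))
T1    <- rep(rnorm(n, 10, 1), each = 2)
u     <- rep(rnorm(n, 0, 0.8), each = 2)
Response <- 2 + 0.8 * T1 - 0.5 * (Group == "B") * (Time == "3") + u +
  rnorm(2 * n, 0, 0.6)
DF.MOD <- data.frame(ID, Time, Group, T1, Response)

MOD <- lme(Response ~ Group + Time + Group:Time + T1 + Group:T1,
           random = ~ 1 | ID, data = DF.MOD, na.action = na.omit)

# Between-group contrast (A - B) at each post-baseline visit.
between <- pairs(emmeans(MOD, ~ Group | Time, data = DF.MOD))

# Within-group change between the two post-baseline visits.
within <- pairs(emmeans(MOD, ~ Time | Group, data = DF.MOD))

between
within
```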
There are alternatives that can be used. Frank mentioned one, and either a GLS or GEE approach may also be apropos if the focus is on marginal (population-averaged) effects.
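For the GLS route, one possible sketch uses nlme's gls() with no random effects and an unstructured covariance over the two post-baseline visits: corSymm() for the correlation plus varIdent() for a separate variance at each visit (similar in spirit to an MMRM). The simulated data are an assumption for illustration; a GEE fit would need a separate package such as geepack and is not shown.

```r
library(nlme)

# Hypothetical simulated data: 60 patients, two post-baseline records each.
set.seed(7)
n     <- 60
ID    <- factor(rep(seq_len(n), each = 2))
Time  <- factor(rep(c(2, 3), times = n))
Group <- factor(rep(c("A", "B"), each = n))
T1    <- rep(rnorm(n, 10, 1), each = 2)
u     <- rep(rnorm(n, 0, 0.8), each = 2)
Response <- 2 + 0.8 * T1 - 0.5 * (Group == "B") * (Time == "3") + u +
  rnorm(2 * n, 0, 0.6)
DF.MOD <- data.frame(ID, Time, Group, T1, Response)

# Marginal model: same fixed effects, no random effects; unstructured
# within-patient covariance (one correlation, one variance per visit).
GLS <- gls(Response ~ Group + Time + Group:Time + T1 + Group:T1,
           correlation = corSymm(form = ~ as.numeric(Time) | ID),
           weights = varIdent(form = ~ 1 | Time),
           data = DF.MOD, na.action = na.omit)
summary(GLS)
```

The fitted gls object can be passed to emmeans in the same way as the lme model.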