I am working on a secondary data analysis using individual patient data from two randomized controlled trials. For our study, randomization was broken by including only patients with at least 1 year of follow-up.
Both studies used the same longitudinal outcome (same time points), same treatments, and had the same set of clinically relevant covariates.
The goal is to estimate the average treatment effect using the combined data. I don’t have specific training in this area, but I think my current approach can be considered a one-stage IPD meta-analysis.
What would be the most appropriate model choice for this scenario? I’m specifically wondering if a two-level (2 RCT study groups) random-intercept term is OK to use in this situation. Some options I’ve considered:
- Mixed model

outcome ~ outcome_baseline + treatment + time + covariates + (1 + time | study/id)

- Random intercepts (and time slopes) for subject ID nested within study group.
- With only two study groups, the between-study variance estimate is not informative; but if we ignore that estimate, does this still provide a valid estimate of the treatment effect?

outcome ~ outcome_baseline + treatment + time + covariates + study + (1 + time | id)

- Treat study group as a fixed effect, even though we are not interested in its coefficient.
- GLS model

formula = outcome ~ outcome_baseline + treatment + time + covariates + study
correlation = corCAR1(form = ~ as.numeric(time) | id)
weights = varIdent(form = ~ 1 | study)
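For concreteness, here is a rough sketch of how the three options could be fit in R on simulated data (the `lme4` and `nlme` packages are assumed; the data and the covariate `x1` are purely illustrative stand-ins for my actual variables):

```r
# Sketch of the three candidate models on simulated data.
# Assumes the lme4 and nlme packages are installed; all data
# are illustrative, with x1 standing in for the covariate set.
library(lme4)
library(nlme)

set.seed(1)
d <- expand.grid(id = 1:40, time = 0:3)
d <- d[order(d$id, d$time), ]
d$study     <- factor(ifelse(d$id <= 20, "A", "B"))
d$treatment <- factor(ifelse(d$id %% 2 == 0, "trt", "ctl"))
d$outcome_baseline <- rnorm(40)[d$id]
d$x1 <- rnorm(40)[d$id]
d$outcome <- 0.5 * (d$treatment == "trt") + 0.3 * d$time +
  0.8 * d$outcome_baseline + rnorm(nrow(d))

# Option 1: random intercepts/slopes for id nested within study.
# Expect a singular-fit message here: two study levels cannot
# support a between-study variance estimate.
m1 <- lmer(outcome ~ outcome_baseline + treatment + time + x1 +
             (1 + time | study/id), data = d)

# Option 2: study as a fixed effect, random effects for id only.
m2 <- lmer(outcome ~ outcome_baseline + treatment + time + x1 + study +
             (1 + time | id), data = d)

# Option 3: GLS with continuous-time AR(1) within-patient errors
# and study-specific residual variances.
m3 <- gls(outcome ~ outcome_baseline + treatment + time + x1 + study,
          data = d,
          correlation = corCAR1(form = ~ time | id),
          weights = varIdent(form = ~ 1 | study))
```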
Thanks!
I highly recommend the book on IPD meta-analysis by Richard Riley et al.
And note that the minimum number of studies for which random effects can be used is 4.
True. I forgot the reason though. Why exactly?
A key parameter is the variance of the random effects. It really takes 70 clusters to nail down the variance. The random effects are not identifiable with 1 cluster, not identifiable with 2 clusters unless the variance is known, and things are unstable with 3+ clusters. I hope someone can find a real reference about this.
An informative prior on the random effect SD could help with this issue.
Yes, if you don’t mind that prior being very influential. I don’t know the theory but I would tend to put study as fixed effects.
To Frank’s question, there was a good discussion about this general subject on CrossValidated (CV):
What is the minimum recommended number of groups for a random effects factor?
One of the references listed there is a question in Ben Bolker’s GLMM FAQ here:
Should I treat factor xxx as fixed or random?
Ben is the current maintainer of the R lme4 package.
Quoting from Ben’s FAQ:
One point of particular relevance to ‘modern’ mixed model estimation (rather than ‘classical’ method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) – more than 5 or 6 at a minimum. This is not surprising if you consider that random effects estimation is trying to estimate an among-block variance.
The rest of the discussion at the CV link above, including a reference to Gelman and Hill (2007):
Data Analysis Using Regression and Multilevel/Hierarchical Models
would suggest that having fewer than 5 or 6 levels for the random effect may be no worse than adding the factor as a fixed effect in a classical model setting. It might be worth running both, effectively as a sensitivity test, presuming that the mixed effects model, in the case being discussed here, converges with only two study levels.
So the model that uses study-level random intercepts isn’t an option in this case.
I skimmed the IPD meta-analysis book. I’ll need to learn more about the two-stage method they discuss. For the one-stage method, they distinguish between common, stratified, and random effects. A common effect is a fixed effect assumed to be shared across studies, and a stratified effect is a fixed effect assumed to be distinct across studies (i.e., an interaction with the study term).
They seem to advocate a common effect for the treatment effect and stratified effects for the other covariates. I don’t understand why they also discuss using a random effect for treatment, since a treated/control factor would seemingly lead to the two-level random intercept that Frank ruled out. Or are they coding the treatment variable as numeric (0/1 or -0.5/+0.5) and using it as a random slope? Anyone familiar with these models know how to implement this as an R mixed-model formula?
When they say random effect for treatment, they mean a random slope for treatment together with a random intercept for study.
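In lme4-style syntax, that would be something like the following sketch (my reading of the specification, not a quote from the book; variable names as used earlier in the thread):

```r
# Stratified (fixed) study intercepts plus a study-level random
# treatment effect, lme4-style formula. With treatment coded
# numerically (0/1 or -0.5/+0.5), (0 + treatment | study) is a
# random slope rather than a random intercept. The two-studies
# identifiability caveat from earlier in the thread still applies.
f <- outcome ~ 0 + study + outcome_baseline + treatment + time +
  (0 + treatment | study)
```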
I’m feeling more inclined to go the GLS route structured as
formula = outcome ~ treatment * time + study * (outcome_baseline + covariates)
correlation = corCAR1(form = ~ as.numeric(time) | id)
weights = varIdent(form = ~ 1 | study)
And compare that to the results from the twostage method.
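As a concrete sketch, that GLS fit with nlme would look roughly like this (illustrative simulated data; `x1` stands in for the covariate set):

```r
# Sketch of the planned GLS fit using nlme; data are simulated
# purely for illustration, with x1 standing in for the covariates.
library(nlme)

set.seed(2)
d <- expand.grid(id = 1:40, time = 0:3)
d <- d[order(d$id, d$time), ]
d$study     <- factor(ifelse(d$id <= 20, "A", "B"))
d$treatment <- factor(ifelse(d$id %% 2 == 0, "trt", "ctl"))
d$outcome_baseline <- rnorm(40)[d$id]
d$x1 <- rnorm(40)[d$id]
d$outcome <- 0.5 * (d$treatment == "trt") + 0.3 * d$time + rnorm(nrow(d))

# Treatment-by-time interaction for the treatment effect over time;
# baseline and covariate effects stratified by study.
fit <- gls(outcome ~ treatment * time + study * (outcome_baseline + x1),
           data = d,
           correlation = corCAR1(form = ~ time | id),
           weights = varIdent(form = ~ 1 | study))
```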