@f2harrell could you clarify what was not a good idea? Unless I’m mistaken, your comment was referring to a model which includes the baseline measurement as an outcome. The text shared by @FurlanLeo showed the baseline adjusted model for 2 post-treatment outcome measurements, which appears to be the method you prefer.
Yes, I always use baseline as baseline. So @FurlanLeo had 2 follow-ups, making the correlation assumptions less of a leap, but continuous time should still be used if the times vary much.
Regarding “I am finding that the baseline needs to be nonlinearly modeled (using e.g. a restricted cubic spline) for these types of outcomes because some patients can start off at extremely bad levels and get major benefits from treatments.”
Do you interpret the rcs(baseline) variable here, or no interpretation needed and it’s simply being adjusted for in the model nonlinearly because it’s better than the linear form?
You can choose either to interpret it or not interpret it. You can estimate anything you want from this model, including the expected change from baseline over a grid of possible baseline values.
RE: “You can choose either to interpret it or not interpret it. You can estimate anything you want from this model, including the expected change from baseline over a grid of possible baseline values.”
Below are some findings comparing rcs(baseline) vs. the raw baseline values using an ANCOVA model for our seizure dataset:
- The raw baseline values appear to be a much stronger predictor than rcs(baseline)
- raw baseline values: p<.00001, with a fairly small estimate
- rcs(baseline): none of the segments are statistically significant, though the estimates for each segment are about 100 times larger than that of the linear form
- visually inspecting the baseline values plotted against the outcome (follow-up values) adjusting for treatment group, it appears okay to use the raw baseline values
- The choice of either using the linear or nonlinear form of baseline didn’t seem to affect the estimates of the treatment groups compared to placebo.
- similar estimates for the treatment-to-placebo comparisons and p-values for the 2 models
- similar adjusted r squared for the 2 models
- Residual plots for both appear to be similar. However, both models showed some issues for the two tails in the residual plots. ANCOVA likely is not the best approach if one can explore ordinal longitudinal approaches.
In this case, would you just use the linear form of baseline if one has to use the ANCOVA model? Is the decision based on p-values and/or what makes sense visually? Thank you.
Fundamental problem: you can’t interpret individual spline terms. Instead, compute AIC for the linear model and AIC for the spline model, or just look at the chunk \chi^2 test that combines all of the baseline terms.
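To make the comparison concrete, here is a minimal base-R sketch with simulated (hypothetical) data; `splines::ns` stands in for `rms::rcs` so it runs without the rms package, and the nested-model `anova` plays the role of the chunk test of all nonlinear baseline terms:

```r
# Simulated data where the outcome is truly nonlinear in baseline
set.seed(1)
n    <- 200
base <- rexp(n, 1/10)                  # skewed baseline values
grp  <- rbinom(n, 1, 0.5)              # treatment indicator
y    <- 5 + 8*log1p(base) - 2*grp + rnorm(n, 0, 3)

library(splines)
m_lin <- lm(y ~ grp + base)            # linear baseline
m_spl <- lm(y ~ grp + ns(base, df = 4))# spline baseline (stand-in for rcs)

AIC(m_lin); AIC(m_spl)                 # lower AIC wins
anova(m_lin, m_spl)                    # joint ("chunk") test of the extra spline terms
```

The point is that the decision rests on the whole set of spline terms at once (AIC or a joint test), never on the p-values of the individual segments.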
Hello everyone,
I’m currently working on a longitudinal dataset (2 parallel groups RCT; baseline plus 2 follow-up assessments, at weeks 0, 4, and 16, respectively). Sample size is 70.
I fitted a Generalized Least Squares model using rms::Gls and obtained the following results:
m <- Gls(Response ~ Group*Time + Response_0 + Age, B = 500, data = data, correlation = corCompSymm(form = ~ Time | ID))
Tests of association
Group: \chi^2 = 6.06, df = 2, P = 0.048
Group x Time: \chi^2 = 2.68, df = 1, P = 0.10
Contrasts
Week 4: -4.6 (95% CI -8.8 to -0.5), P = 0.018
Week 16: -0.72 (95% CI -4.7 to 3.9), P = 0.76
Note
Time was modeled continuously.
Questions
The test of association for Group was barely significant, and the test for the Group x Time interaction was not significant. The group contrast was significant at week 4 but not at week 16.
If there is a main effect of Group but no Group x Time interaction, doesn’t this mean that the treatment effect is homogeneous across time? But this is not what the contrasts are showing.
- I suppose this is related to the fact that we should not give too much emphasis to the P-values (from anova and contrasts), but rather focus on the CIs, which btw are too wide to exclude even an almost irrelevant effect at week 4?
- Would the overall conclusion here be that the trial was largely underpowered and therefore inconclusive?
Thanks!
You pre-specified a reasonable model and should not use p-values to drive any changes in the model. The contrasts take the uncertainties about interactions into account, i.e., confidence intervals are properly wider since you didn’t know beforehand that interaction was absent.
Many thanks for the reply, @f2harrell!
But if there’s no interaction and there seems to be a treatment effect at week 4, shouldn’t the two CIs from the contrasts be similar? Indicating a homogeneous effect at weeks 4 and 16.
My understanding is that if there’s no interaction, then the estimated treatment effect should be (roughly) the same across time. But this understanding is incorrect apparently…
Without looking in detail I think you’re confusing “no impressive evidence of interaction” with “no interaction”.
Ok, I think I got it…
Let’s suppose we have a different scenario:
Model
m <- Gls(Response ~ Group*Time + Response_0 + Age, B = 500, data = data, correlation = corCompSymm(form = ~ Time | ID))
Tests of association
Group: \chi^2 = 17.4, df = 2, P = 0.0002
Group x Time: \chi^2 = 13.8, df = 1, P = 0.0002
Contrasts
Week 4: -7 (95% CI -10.5 to -3), P = 0.0003
Week 16: 2.3 (95% CI -2.2 to 6.8), P = 0.3
Questions
Now the anova gives a clearer picture, providing evidence for both a Group effect and a Group x Time interaction.
Accordingly, the contrasts show that the treatment effect is not homogeneous across time, being present at week 4 but waning at week 16.
My point is that in both scenarios (this one and the one from the post above), the contrasts are telling roughly the same story, i.e., that the Group effect varies with Time. However, the anova results differ considerably between the two scenarios.
At the end of the day, should I give more importance to the contrasts than to the anova when interpreting the results?
Thank you!
They are both valuable, although a Bayesian model would be of more direct value for the inference part.
I would emphasize the time x group interaction test, the overall group test (beautiful test of group difference at any time), and the two compatibility intervals, plus some minor emphasis on point estimates.
Perfect. Many thanks.
I suppose the order would be the following:
- Multiple df test of Group effect;
- Test of Interaction (Group x Time) effect;
- Compatibility intervals from the contrasts;
- Point estimates from the contrasts.
@f2harrell, I tried using the two time variables I described, so that the slope is allowed to change as a subject enters a different follow-up phase. However, the slope change is abrupt rather than smooth. I’m wondering whether it’s possible to encode this using, e.g., your gTrans function to make the fitted curve smooth?
That should work, whether smooth or abrupt.
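For illustration, here is a minimal base-R sketch on simulated (hypothetical) data contrasting the two encodings: a linear-spline term (an abrupt slope change at the phase boundary, as with rms::lsp) versus a smooth cubic-spline term (as one could obtain with rms::rcs or gTrans); `splines::ns` is used here so the example runs without rms:

```r
# Simulated data with a true slope change at week 8
set.seed(2)
time <- runif(300, 0, 20)
k    <- 8                                  # follow-up phase starts at week 8
y    <- 2 + 0.5*time + 1.5*pmax(time - k, 0) + rnorm(300)

# Abrupt: broken-line (linear spline) term; slope changes sharply at k
m_abrupt <- lm(y ~ time + pmax(time - k, 0))

# Smooth: natural cubic spline; slope changes gradually around the knots
library(splines)
m_smooth <- lm(y ~ ns(time, knots = c(5, 8, 12)))
```

The third coefficient of `m_abrupt` estimates the slope change at the boundary; the smooth model trades that directly interpretable jump for a gradual transition.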
The statistical analysis of the article below is intriguing to me, and I would greatly appreciate some guidance on this.
In brief:
- N=160;
- 12 weeks intervention;
- 2 Groups (Tai Chi vs Control);
- Assessments at T0(baseline), T1(week 1), T2(week 8), T3(week 12), and T4(week 16);
- Analysis method: GEE (Response ~ Group*Time);
- Time modeled as categorical, instead of as continuous (the latter being the preferred way, according to RMS, Chapter 7);
- Baseline assessment modeled as a response, instead of as a covariate (the latter being the preferred way, according to RMS, Chapter 7).
The authors assessed the intervention effect at each time point by looking at the \beta of the respective Group*Time interaction. This is what is intriguing to me. Shouldn’t we assess treatment effects in longitudinal studies by calculating contrasts?
After doing a few simulations, I realized that when the baseline assessment is modeled as a response to treatment (as in the study above), the coefficients of the Group*Time interaction terms are practically the same as the respective group contrasts. This is not the case, however, when the baseline assessment is modeled as a covariate. Also, this happens with both GEE and Generalized Least Squares.
Why is that?
Shouldn’t the authors of this study have calculated and reported the group contrasts?
Thank you!
Hi Leo
When time is modeled as categorical and T0 is the reference, the model will look like
y = a*treatment + b*time + c*treatment*time + intercept
Where a is the between-group difference at time zero, and b is the effect of time in the reference group. Of course, you’d have a different beta per time point, since time was modeled categorically.
We can reshape this and get
y = treatment*(a + c*time) + b*time + intercept
The contrast between treatment (1) and control (0), thus, will be
y1 - y0 = a + c*time
However, in a randomized trial a will likely approximate zero, so c (the interaction coefficient) will approximate the contrast.
Once you use the baseline variable as a covariate and not as an outcome, your time reference switches to time = T1 (not T0). The interactions, therefore, refer to the difference in treatment effects across time points, e.g., for T2, c would represent the difference between the effect when time = T2 and the effect when time = T1.
Now you have
y = a*treatment + b*time + c*treatment*time + d*baseline_value + intercept
However, a does not represent the baseline difference anymore, but the difference at time = T1.
Again,
y1 - y0 = a + c*time
When time = T1 (new reference), the contrast you are looking for will be a.
When time = T2, it will be a + c, and c would assume a different value for each time point.
Thus, the contrasts you are looking for would emerge from the sum of a and c at the moments of interest in both scenarios. However, when the baseline value is included in the outcome, a + c becomes very close to c.
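This can be checked with a small base-R simulation on hypothetical data: with categorical time, T0 as reference, and the baseline included in the response vector, the Group x Time coefficients land very close to the group contrasts, because the Group main effect (the T0 difference) is near zero under randomization:

```r
# Simulated RCT: baseline (T0) treated as a response, categorical time
set.seed(3)
n     <- 500
time  <- factor(rep(c("T0", "T1", "T2"), n), levels = c("T0", "T1", "T2"))
group <- rep(rbinom(n, 1, 0.5), each = 3)
eff   <- ifelse(time == "T1", -3, ifelse(time == "T2", -5, 0))  # true contrasts
y     <- 10 + 2*(time == "T1") + 3*(time == "T2") + group*eff + rnorm(3*n)

m <- lm(y ~ group*time)
coef(m)["group"]          # a: ~0 (baseline difference, near zero by randomization)
coef(m)["group:timeT1"]   # c1: close to the T1 contrast a + c1 (~ -3)
coef(m)["group:timeT2"]   # c2: close to the T2 contrast a + c2 (~ -5)
```

Since a is only approximately zero, the proper contrast remains a + c; the interaction coefficient alone merely happens to be a good approximation in this parameterization.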
Nicely done. This approach is more trouble than treating baseline as baseline, and it runs into @Stephen’s issue that baseline cannot be a response to treatment. You’ve shown that in linear models there are tricks to force the alternative approach to work, but in general, with nonlinear and ordinal models, I think it’s asking for more trouble.
Hi @martinspn,
thank you so much for your thorough reply, it was really helpful.
In that case, then, the authors should indeed have used the contrasts for the treatment effect at each time point, since a only approximates zero; the contrasts would thus take into account subtle differences between the groups at baseline. Correct?
On the other hand, it’s not a valid approach to model the baseline as a response. Correct @f2harrell?
Correct, that’s not valid in my view and just adds complexity to boot. Contrasts within a model that has baseline will automatically adjust for baseline differences (not much of an issue in randomized trials) and for strong baseline effects (helps power in all cases). Simultaneous confidence intervals would be nice, and usually emphasize the last time point contrast. With an ordinal Markov model you can get something else very valuable: the difference in mean times in certain states.