Reporting both baseline-adjusted effect and change-from-baseline group estimates

I recently analysed a randomised controlled trial in which I adjusted the effect estimates for their respective baseline measures in a regression model, a practice I usually follow when analysing trial data (there is a good deal of literature on this, including the CONSORT statement).

When writing up the clinical study report I generally provide the point and interval estimates of the effect adjusted for the baseline measure (and other covariates if prespecified) and the point and interval estimates of the endpoints for each group. Furthermore, I plot the descriptive endpoint statistics per group over the various visits of the trial.

However, the investigators of the recent trial have asked me to provide, besides the effect estimate, the change-from-baseline estimates per trial arm.

I find this a little problematic, as the difference between the two arm point estimates for change-from-baseline is often not the same as the point estimate of the effect adjusted for baseline (for reference see Twisk, J. et al. Different ways to estimate treatment effects in randomised controlled trials. Contemporary Clinical Trials Communications 10, 80–85 (2018), specifically Table 5 Eq 1a vs 3a).
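
One way to see where the discrepancy comes from is a simple pre-post sketch. The notation is my own and assumes a linear ANCOVA with a common baseline slope in both arms; it is not the longitudinal parameterisation used by Twisk et al. Write $\bar{X}_g$ and $\bar{Y}_g$ for the baseline and follow-up means in arm $g \in \{T, C\}$, and $\hat{\beta}$ for the pooled within-arm slope of follow-up on baseline:

```latex
\begin{aligned}
\hat{\Delta}_{\text{adj}}    &= (\bar{Y}_T - \bar{Y}_C) - \hat{\beta}\,(\bar{X}_T - \bar{X}_C) \\
\hat{\Delta}_{\text{change}} &= (\bar{Y}_T - \bar{X}_T) - (\bar{Y}_C - \bar{X}_C) \\
\hat{\Delta}_{\text{change}} - \hat{\Delta}_{\text{adj}} &= (\hat{\beta} - 1)\,(\bar{X}_T - \bar{X}_C)
\end{aligned}
```

Under these assumptions the two estimates coincide only if the baseline means happen to be exactly balanced or the slope equals 1. With a noisy baseline measure the slope is usually below 1, so any chance imbalance at baseline shows up as a gap between the two numbers.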

I’m a bit stuck on this, as I think readers might be confused by what they may see as a discrepancy between the effect estimate adjusted for baseline and the difference between the two change-from-baseline group estimates. The investigators’ justification is that they would like to show the change from baseline for each arm.

I’m sympathetic to their request, as the trial has an active comparator and a side-by-side demonstration of change from baseline may be interesting. However, this is ultimately not the research question of the trial; the difference between the groups is. I also think that showing these changes in a graph can be more informative.

I’d be very grateful for any advice on this, and on how you would normally report these findings.


Do not report the change from baseline. For the reasons laid out in detail here (section 14.4), change from baseline is misleading. Regression to the mean is one of the prime issues.
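
A minimal simulation sketch of regression to the mean. Everything here is made up for illustration (the sample size, the `reliability` value, the quartile cut-offs); nobody's true score changes, yet subjects who start low appear to improve and subjects who start high appear to decline.

```python
import random
import statistics


def simulate_pre_post(n=20000, reliability=0.6, seed=1):
    """Simulate pre/post scores with NO true change for anyone.

    Each subject has a stable true score; both measurements add
    independent noise. `reliability` is the share of observed variance
    due to the true score (an illustrative assumption).
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    noise_sd = (1.0 - reliability) ** 0.5
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(0.0, true_sd)
        pre = true_score + rng.gauss(0.0, noise_sd)
        post = true_score + rng.gauss(0.0, noise_sd)
        pairs.append((pre, post))
    return pairs


pairs = simulate_pre_post()
pre_sorted = sorted(pre for pre, _ in pairs)
low_cut = pre_sorted[len(pairs) // 4]       # about the 25th percentile of baseline
high_cut = pre_sorted[3 * len(pairs) // 4]  # about the 75th percentile of baseline

# Mean change (post - pre) overall and within the extreme baseline quartiles.
change_all = statistics.mean(post - pre for pre, post in pairs)
change_low = statistics.mean(post - pre for pre, post in pairs if pre <= low_cut)
change_high = statistics.mean(post - pre for pre, post in pairs if pre >= high_cut)

print(f"mean change, everyone:          {change_all:+.3f}")
print(f"mean change, lowest baselines:  {change_low:+.3f}")
print(f"mean change, highest baselines: {change_high:+.3f}")
```

The overall mean change is close to zero, while the apparent changes in the extreme baseline groups are produced entirely by measurement noise.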


I generally analyze and report RCT results much as you describe, but you could also analyze change scores while still adjusting for baseline scores. This is just a re-parameterization of the model that analyzes post-treatment scores adjusted for baseline, and so the results will be the same, but expressed in terms of change scores. In Table 5 of the paper you reference, these correspond to models 1a and 3b, which show identical treatment effect estimates and standard errors.
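
A small sketch of this re-parameterisation on simulated data, assuming a linear model with a common baseline slope in both arms. The data-generating numbers and the helper name `arm_effect_adjusted_for_baseline` are made up for illustration; the helper uses the closed-form least-squares solution for a two-arm model rather than any particular library, and it only compares point estimates.

```python
import random


def mean(values):
    return sum(values) / len(values)


def arm_effect_adjusted_for_baseline(baseline, outcome, arm):
    """Least-squares fit of outcome ~ arm + baseline (common slope).

    Returns (arm coefficient, baseline slope). For a two-arm model the
    slope is the pooled within-arm slope, and the arm coefficient is the
    difference in outcome means minus slope times the difference in
    baseline means.
    """
    sxx = sxy = 0.0
    arm_means = {}
    for g in (0, 1):
        x = [b for b, a in zip(baseline, arm) if a == g]
        y = [o for o, a in zip(outcome, arm) if a == g]
        mx, my = mean(x), mean(y)
        arm_means[g] = (mx, my)
        sxx += sum((xi - mx) ** 2 for xi in x)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    effect = (arm_means[1][1] - arm_means[0][1]) - slope * (
        arm_means[1][0] - arm_means[0][0]
    )
    return effect, slope


# Simulated trial (all numbers are illustrative assumptions).
rng = random.Random(42)
n = 200
arm = [i % 2 for i in range(n)]  # alternating allocation, for simplicity
baseline = [rng.gauss(50.0, 10.0) for _ in range(n)]
post = [20.0 + 0.6 * b + 5.0 * a + rng.gauss(0.0, 8.0) for b, a in zip(baseline, arm)]
change = [p - b for p, b in zip(post, baseline)]

effect_post, slope_post = arm_effect_adjusted_for_baseline(baseline, post, arm)
effect_change, slope_change = arm_effect_adjusted_for_baseline(baseline, change, arm)

# Unadjusted difference in mean change scores, for contrast.
unadjusted_change_diff = mean([c for c, a in zip(change, arm) if a == 1]) - mean(
    [c for c, a in zip(change, arm) if a == 0]
)

print(f"post ~ arm + baseline:   effect = {effect_post:.4f}, slope = {slope_post:.4f}")
print(f"change ~ arm + baseline: effect = {effect_change:.4f}, slope = {slope_change:.4f}")
print(f"unadjusted difference in mean change: {unadjusted_change_diff:.4f}")
```

The arm coefficient is the same in both fits and the baseline slope simply shifts by 1, whereas the unadjusted difference in mean change scores generally differs unless the baseline means are exactly balanced.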

No, that only works in the special case of a linear model with interval-scaled variables, and most importantly, readers will take your results out of context, quote them at medical conferences, draw false conclusions, … Just say no.


Got it. Thanks. Yes, I was assuming a linear model; but your point about inviting misinterpretation is a good one.

Many thanks for your responses and advice. @f2harrell this is indeed in line with what Twisk et al. observe. I’ll get back to the investigators about this.

Sorry for re-opening an “old” topic and for doing it with only questions and no answers. However, I would be very grateful for any advice from the expert Datamethods community.

Like many other non-experts, in my poor stats courses I was taught to invariably analyze serial data using repeated measures ANOVAs or t-tests comparing difference (post-pre) scores. From discussions in this forum and from the BBS course (which I am now following on YouTube) I am fully aware that nobody should use any of those flawed strategies. I am also aware of your recommendations for randomised controlled trials (i.e., results should be analyzed with a non-linear regression method that incorporates “baseline” scores as covariates of post-intervention scores).

However, I have two questions:

1.- How should one proceed in an observational study in which the performance of a single group of participants on a cognitive test is measured repeatedly? (I can provide more detailed information if relevant.)
2.- How could individual “change” scores be obtained in the previous scenario (and, if possible, also when there is more than one group, e.g. randomized controlled trials)? I ask because I feel it could be interesting to have such scores for additional analyses (e.g. to see their relationship with other variables collected at baseline or at any other relevant time point, to describe their distribution, etc.).

Thanks in advance for any possible answer.

This article discusses ANCOVA in the context of observational studies.


From that paper, and others, one sees that the considerations in observational studies are completely different from the considerations in a randomized study.


Thank you very much for your answers and for the reference (from which I’ve learned new things). However, this article deals with scenarios that focus on between-group comparisons, while my main question was about which measures would be adequate to describe performance changes in a single group that is repeatedly tested. More specifically, I would like to estimate how much the group (as a whole) and each individual improve performance (e.g. number of correct responses) when repeating a specific experimental task for N consecutive days.

In this regard, in recent weeks I have been considering some options, but I do not know which one (if any) would be adequate. Therefore, any feedback would be very welcome.

1.- Because one main concern about difference scores is regression to the mean, would it help to calculate difference scores from “extended” baseline and end periods (e.g. the difference between the average of the first 2 or 3 tests and the average of the last 2 or 3 tests)? If sound, this simple “strategy” could potentially be applied to both individual- and group-level measures.

2.- When trying to summarize change at the group level, would it help to use summary statistics that are less sensitive to extreme values? For example, using the median instead of the mean to summarize “post - pre” differences. (PROBLEM: even if this “strategy” helped at the group level, it would not provide an adequate index of change at the individual level.)

3.- I have also considered evaluating individual change in terms of whether performance shows a monotonic, not necessarily linear, relationship with the number of tests (e.g. calculating Spearman’s or Kendall’s correlation between scores and session number). These individual measures would be relatively robust to extreme values and would allow group-level indexes to be calculated (e.g. the mean or, preferably, the median and other quantiles of the distribution of correlations). PROBLEM: this approach would assign the same “improvement score” to subjects with different learning rates (e.g. subjects gaining 1, 2, …, n correct responses at each session).
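
The problem noted under option 3 can be seen in a tiny made-up example: two hypothetical subjects who improve at every session, one by 1 correct response and one by 4, receive the same Spearman correlation. The helper functions and the scores are my own illustration; the least-squares slope is shown only for contrast with the rank correlation, not as a recommended index.

```python
def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ols_slope(x, y):
    """Least-squares slope of y on x (score gained per session)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


sessions = [1, 2, 3, 4, 5]
slow_learner = [10, 11, 12, 13, 14]  # +1 correct response per session
fast_learner = [10, 14, 18, 22, 26]  # +4 correct responses per session

for name, scores in (("slow", slow_learner), ("fast", fast_learner)):
    rho = spearman(sessions, scores)
    slope = ols_slope(sessions, scores)
    print(f"{name}: Spearman = {rho:.2f}, slope = {slope:.2f} per session")
```

Both subjects get the maximal rank correlation because it only records that scores rose at every session, not by how much.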

In summary, I still do not know how I should proceed. Therefore, I would very much appreciate your expert opinions in order to find a solution (or, at least, to discard some inadequate “strategies”).