leonardof, the response was hemoglobin; apologies for not clarifying that.

What I ended up doing was as follows:

1. Breaking pre hemoglobin into quartiles (in preparation for a pre-post comparison)

2. Fitting a mixed-effects model with time (pre/post) as a factor:

```
lmer(hemoglobin ~ time + (1|subject))
```
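For anyone wanting to reproduce the next step, here is a minimal sketch of pulling the two standard deviations out of a fit like the one above. The data frame `dat`, its simulated values, and the "true" SDs of 1.2 and 0.6 are all made up for illustration; only `lmer()` and `VarCorr()` from lme4 are real calls.

```r
library(lme4)

set.seed(1)
n <- 200                                   # hypothetical number of subjects
subject_effect <- rnorm(n, 0, 1.2)         # assumed between-subject SD of 1.2

# Fake long-format data: one row per subject per time point
dat <- data.frame(
  subject    = factor(rep(seq_len(n), times = 2)),
  time       = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  hemoglobin = 13 + rep(subject_effect, times = 2) + rnorm(2 * n, 0, 0.6)
)

fit <- lmer(hemoglobin ~ time + (1 | subject), data = dat)

# Variance components as a data frame; sdcor holds the standard deviations
vc <- as.data.frame(VarCorr(fit))
sd_btw    <- vc$sdcor[vc$grp == "subject"]    # between-subject SD
sd_within <- vc$sdcor[vc$grp == "Residual"]   # within-subject (residual) SD

c(sd_btw = sd_btw, sd_within = sd_within)
```

These are the `sd_btw` and `sd_within` that feed the simulation; `n` there would be the number of subjects in the real data.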

3. Using the variance components from (2) to run a simulation that repeatedly divides individuals into hemoglobin quartiles based on their pre values. The simulation estimates the expected change due to regression to the mean alone, here for the lowest quartile.

```
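# n, sd_btw, and sd_within must already be defined: the number of subjects and
# the between-subject and residual SDs from the model in (2).
sim <- function() {
  individuals <- rnorm(n, 0, sd_btw)      # true subject-level deviations
  errors_pre  <- rnorm(n, 0, sd_within)   # within-subject error, pre
  errors_post <- rnorm(n, 0, sd_within)   # within-subject error, post
  pre  <- individuals + errors_pre
  post <- individuals + errors_post
  ix <- order(pre)[1:(n / 4)]             # subjects in the lowest pre quartile
  mean(post[ix]) - mean(pre[ix])          # change with no true pre/post effect
}
# Expected change from regression to the mean alone
mean(replicate(1000, sim()))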
```
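As a sanity check on the simulation, the expected regression to the mean for the lowest quartile has a closed form under the same normal model, so the two numbers should roughly agree. This is only a sketch: the values of `n`, `sd_btw`, and `sd_within` below are invented for illustration, not estimates from the hemoglobin data.

```r
set.seed(42)
# Illustrative values only (assumptions, not estimates from real data)
n <- 400
sd_btw <- 1.2
sd_within <- 0.6

sim <- function() {
  individuals <- rnorm(n, 0, sd_btw)
  pre  <- individuals + rnorm(n, 0, sd_within)
  post <- individuals + rnorm(n, 0, sd_within)
  ix <- order(pre)[1:(n / 4)]             # lowest pre quartile
  mean(post[ix]) - mean(pre[ix])
}
rtm_sim <- mean(replicate(2000, sim()))

# Closed form: with means centred at 0, E[post - pre | pre] = -(1 - icc) * pre,
# and the mean of the lowest quartile of N(0, sd_total^2) is
# -sd_total * dnorm(qnorm(0.25)) / 0.25.
sd_total  <- sqrt(sd_btw^2 + sd_within^2)
icc       <- sd_btw^2 / sd_total^2
rtm_exact <- (1 - icc) * sd_total * dnorm(qnorm(0.25)) / 0.25

c(simulated = rtm_sim, closed_form = rtm_exact)
```

The closed form makes the dependence explicit: the lower the intraclass correlation, the larger the apparent "improvement" in the lowest quartile.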

4. Obtaining a t-based CI for the pre-post change after subtracting the estimated regression to the mean effect
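Putting the steps together, a minimal sketch of the adjusted interval for the lowest quartile might look like this. Everything here is simulated: `true_shift`, the sample size, and the two SDs are assumptions chosen for illustration, and in practice `pre`, `post`, `sd_btw`, and `sd_within` would come from the data and the fitted model.

```r
set.seed(7)
# Hypothetical inputs (assumptions for illustration only)
n <- 400
sd_btw <- 1.2
sd_within <- 0.6
true_shift <- 0.5                          # pre/post effect built into the fake data

individuals <- rnorm(n, 13, sd_btw)
pre  <- individuals + rnorm(n, 0, sd_within)
post <- individuals + true_shift + rnorm(n, 0, sd_within)

# Observed change in the lowest pre quartile
ix <- order(pre)[1:(n / 4)]
change <- post[ix] - pre[ix]

# Expected change from regression to the mean alone, by simulation
sim <- function() {
  ind <- rnorm(n, 0, sd_btw)
  p0 <- ind + rnorm(n, 0, sd_within)
  p1 <- ind + rnorm(n, 0, sd_within)
  i <- order(p0)[1:(n / 4)]
  mean(p1[i]) - mean(p0[i])
}
rtm <- mean(replicate(1000, sim()))

ci_raw <- t.test(change)$conf.int          # unadjusted t-based CI
ci_adj <- t.test(change - rtm)$conf.int    # CI after removing the RTM estimate

rbind(raw = ci_raw, adjusted = ci_adj)
```

Note that the adjusted interval is just the raw one shifted by `rtm`, which is exactly why it ignores the uncertainty in the regression to the mean estimate.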

Limitations that I see with this analysis:

1. Arbitrary discretization

2. Doesn't incorporate uncertainty in the regression to the mean effect

3. Yes, the effect is not causal

However, I think 1 and 2 are not too hard to fix. For the first, run the simulation as a linear regression rather than breaking into quartiles, and then fit

```
lmer(hemoglobin ~ rcs(time,3) + (1|subject))
```

and take the difference between the actual spline and the line estimated from regression to the mean to judge the "treatment effect" (not a causal treatment effect). To fix the second issue, make use of the full distribution from the simulation rather than just its mean.
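A rough sketch of both fixes in the simplest (linear) case: regress the change on the pre value inside the simulation, and keep the whole distribution of simulated slopes instead of only their mean. The values of `n`, `sd_btw`, and `sd_within` are invented for illustration, and the helper name `null_slope` is mine.

```r
set.seed(11)
# Illustrative values only (assumptions, not estimates from real data)
n <- 400
sd_btw <- 1.2
sd_within <- 0.6

# Slope of (post - pre) on pre when only regression to the mean is at work
null_slope <- function() {
  individuals <- rnorm(n, 0, sd_btw)
  pre  <- individuals + rnorm(n, 0, sd_within)
  post <- individuals + rnorm(n, 0, sd_within)
  unname(coef(lm(I(post - pre) ~ pre))[2])
}
slopes <- replicate(1000, null_slope())

icc <- sd_btw^2 / (sd_btw^2 + sd_within^2)
mean(slopes)                               # should sit near -(1 - icc)
rtm_band <- quantile(slopes, c(0.025, 0.975))
rtm_band                                   # simulated spread, not just the mean
```

An observed slope from the real data could then be compared against `rtm_band`. This only captures sampling variability for fixed `sd_btw` and `sd_within`; uncertainty in the variance components themselves would need another layer, for example redrawing them on each replicate.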

Overall, I would argue that there is at least some value in getting rid of the regression to the mean effect even if you have a design in which potentially everything is confounded. The benefits of this approach are as follows. Consider the model: y_ij = mu + alpha_i + beta_j + epsilon_ij

alpha is the pre/post effect, beta is the subject effect, and epsilon is the error. When we focus on estimating the causal treatment effect for pre/post as a function of the pre values, two confounders come into play: a) the epsilons, which matter only because we want the treatment effect as a function of pre, and b) systematic population-level confounders. Removing the known issue of regression to the mean lets someone focus on the likely magnitude of the population-level confounders and consider whether it is reasonable to argue that their effects are substantially smaller than the estimated pre/post effect (which would give greater evidence of a causal effect).

Obviously this is not nearly the same ability to rule out confounders, but it is better than not addressing the magnitude of regression to the mean at all.