leonardof, the response was hemoglobin; apologies for not clarifying that.
What I ended up doing was as follows:
1. Breaking pre hemoglobin into quartiles (in preparation for a pre-post comparison)
2. Fitting a mixed effects model with time (pre/post) as a factor:
lmer(hemoglobin ~ time + (1|subject))
3. Using the variance components from (2) to conduct a simulation that involved repeatedly dividing individuals into hemoglobin quartiles based on their pre values. The simulation is used to estimate the expected change due to regression to the mean.
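# sd_btw and sd_within are the variance components (as SDs) from step 2;
# n is the number of individuals
sim <- function() {
  individuals <- rnorm(n, 0, sd_btw)       # true subject-level values
  errors_pre  <- rnorm(n, 0, sd_within)    # within-subject noise, pre
  errors_post <- rnorm(n, 0, sd_within)    # within-subject noise, post
  pre  <- individuals + errors_pre
  post <- individuals + errors_post
  ix <- order(pre)[seq_len(n %/% 4)]       # lowest quartile of pre
  mean(post[ix]) - mean(pre[ix])           # change when there is no true effect
}
mean(replicate(1000, sim()))               # expected regression to the mean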
4. Obtaining a t-based CI after subtracting the estimated regression to the mean effect
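Steps 2 to 4 can be sketched end to end as follows. This is only an illustration under stated assumptions: the data are simulated, and the sample size, SDs and the 0.3 shift are made-up values, not from my analysis. The names long, vc, rtm and ci_adj are just illustrative. It assumes lme4 is installed.

```r
# Sketch only: all data and parameter values below are hypothetical.
library(lme4)

set.seed(1)
n <- 200                                          # hypothetical number of subjects
subject_effect <- rnorm(n, 0, 1.2)                # hypothetical between-subject SD
pre  <- 13.0 + subject_effect + rnorm(n, 0, 0.8)  # hypothetical within-subject SD
post <- 13.3 + subject_effect + rnorm(n, 0, 0.8)  # hypothetical +0.3 overall shift

long <- data.frame(
  subject    = factor(rep(seq_len(n), times = 2)),
  time       = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  hemoglobin = c(pre, post)
)

# Step 2: mixed model with a random intercept per subject
fit <- lmer(hemoglobin ~ time + (1 | subject), data = long)

# Extract the variance components as standard deviations
vc <- as.data.frame(VarCorr(fit))
sd_btw    <- vc$sdcor[vc$grp == "subject"]
sd_within <- vc$sdcor[vc$grp == "Residual"]

# Step 3: expected change in the lowest pre quartile with no true change
sim <- function() {
  individuals <- rnorm(n, 0, sd_btw)
  pre_sim  <- individuals + rnorm(n, 0, sd_within)
  post_sim <- individuals + rnorm(n, 0, sd_within)
  ix <- order(pre_sim)[seq_len(n %/% 4)]
  mean(post_sim[ix]) - mean(pre_sim[ix])
}
rtm <- mean(replicate(1000, sim()))

# Step 4: t-based CI for the observed change in the lowest quartile,
# shifted by the estimated regression to the mean effect
ix_obs <- order(pre)[seq_len(n %/% 4)]
change <- post[ix_obs] - pre[ix_obs]
ci_adj <- t.test(change)$conf.int - rtm
ci_adj
```

Note that ci_adj treats rtm as a fixed number, so the interval does not reflect the uncertainty in the regression to the mean estimate.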
Limitations that I see with this analysis:
1. Arbitrary discretization
2. Doesn’t incorporate uncertainty in the estimated regression to the mean effect
3. Yes, the effect is not causal
However, I think 1 and 2 are not too hard to fix. For 1, run the simulation as a linear regression rather than breaking into quartiles, then fit
lmer(hemoglobin ~ rcs(time,3) + (1|subject))
and take the difference between the fitted spline and the line estimated from regression to the mean to judge the “treatment effect” (not a causal treatment effect). For 2, use the full distribution from the simulation rather than just its mean.
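A minimal sketch of the continuous version of the simulation, assuming the change (post minus pre) is regressed on pre and the slope is what gets carried forward. The values of n, sd_btw and sd_within are hypothetical placeholders for the quantities from the lmer fit, and sim_slope is an illustrative name.

```r
# Sketch only: parameter values are hypothetical stand-ins for the fitted ones.
set.seed(2)
n         <- 200   # hypothetical number of subjects
sd_btw    <- 1.2   # hypothetical between-subject SD
sd_within <- 0.8   # hypothetical within-subject SD

# One simulated data set with no true pre/post effect; returns the slope of
# change on pre, i.e. the regression to the mean line
sim_slope <- function() {
  individuals <- rnorm(n, 0, sd_btw)
  pre    <- individuals + rnorm(n, 0, sd_within)
  post   <- individuals + rnorm(n, 0, sd_within)
  change <- post - pre
  unname(coef(lm(change ~ pre))["pre"])
}

slopes <- replicate(2000, sim_slope())

# Use the whole simulated distribution, not just its mean
mean(slopes)
rtm_interval <- quantile(slopes, c(0.025, 0.975))
rtm_interval

# Under this normal model the expected slope is -(1 - ICC)
icc <- sd_btw^2 / (sd_btw^2 + sd_within^2)
-(1 - icc)
```

The interval from quantile() is one way to carry the simulation uncertainty into the comparison with the fitted curve. It still conditions on the estimated variance components.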
Overall, I would argue that there is at least some value in getting rid of the regression to the mean effect even if you have a design for which potentially everything is confounded. The benefits of this approach are as follows. Consider the model:
y_ij = mu + alpha_i + beta_j + epsilon_ij
where alpha is the pre/post effect, beta is the subject effect, and epsilon is the error. When we focus on estimating the causal treatment effect for pre/post as a function of the pre values, two confounders come into play: a) the epsilons, which matter only because we want the treatment effect as a function of pre, and b) systematic population-level confounders. Removing the known issue of regression to the mean allows someone to focus on the likely magnitude of the population-level confounders and to consider whether it is reasonable to argue that the known confounders will have effects substantially below the magnitude of the estimated pre/post effect (giving greater evidence of a causal effect).
Obviously this gives not nearly the same ability to rule out confounders, but it is better than not addressing the magnitude of regression to the mean.
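To make point a) concrete, here is the expected change given the pre value under the model above. It assumes the beta_j and epsilon_ij are independent and normal with variances sigma_b^2 and sigma_e^2 (symbols I am introducing here for illustration):

$$
E\left[\, y_{\mathrm{post},j} - y_{\mathrm{pre},j} \mid y_{\mathrm{pre},j} \,\right]
= (\alpha_{\mathrm{post}} - \alpha_{\mathrm{pre}})
- (1 - \rho)\,\left( y_{\mathrm{pre},j} - \mu - \alpha_{\mathrm{pre}} \right),
\qquad
\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}
$$

The first term is the pre/post effect. The second term is the regression to the mean part: it vanishes only when sigma_e^2 = 0, and it is the piece the simulation is meant to remove.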