Outcome-based subgroups

Thanks Paul. As an update, I came up with a really simple contrived example that I think will suit my purposes, posting here in case anyone is interested or realizes a mistake I’ve made:

I assumed a null DAG in which treatment doesn’t affect the target outcome at all and (for simplicity) neither outcome affects the other. Treatment does affect the response outcome. Sex affects both outcomes but is not associated with treatment because of randomization.

The expectation is that conditioning on the response outcome, which is a collider (trt -> response <- sex), introduces collider bias (selection bias) and opens a non-causal path from trt to the outcome via sex. The code below walks through a super toy example: including only a treatment term gives no treatment effect; conditioning on response results in a statistically significant treatment effect; if you then add sex as a covariate to block the path opened through the collider, the effect goes away.
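As a rough sanity check on that expectation, the size of the induced imbalance can be worked out directly from the parameters used in the simulation in this post (baseline response probability 0.2, odds ratios of 5 for both sex and treatment, and a sex effect of 2 on the outcome). This is only a sketch under those assumptions; the helper names `p_resp` and `p_sex_given_resp` are made up for illustration.

```r
# Response probability for a given sex and treatment (logistic model, OR = 5 each)
p_resp <- function(sex, trt) plogis(qlogis(0.2) + log(5) * sex + log(5) * trt)

# P(sex = 1 | responder, trt), using P(sex = 1) = 0.5 in both arms (randomization)
p_sex_given_resp <- function(trt) {
  p_resp(1, trt) / (p_resp(0, trt) + p_resp(1, trt))
}

p_sex_ctrl <- p_sex_given_resp(0) # share of sex = 1 among control responders
p_sex_trt  <- p_sex_given_resp(1) # share of sex = 1 among treated responders

# Sex shifts the outcome mean by 2, so the spurious treatment difference
# among responders is 2 * (difference in sex shares)
expected_bias <- 2 * (p_sex_trt - p_sex_ctrl)
round(c(ctrl = p_sex_ctrl, trt = p_sex_trt, bias = expected_bias), 3)
```

Under these assumptions, about 74% of control responders but only about 61% of treated responders have sex = 1, so a responder-only fit of y on trt should give a coefficient near -0.25 rather than 0.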

library(dplyr) # needed for %>% and filter() used below
set.seed(1) # arbitrary seed so the results are reproducible
n <- 10000 # Big numbers to make things easy
trt <- rbinom(n, 1, 0.5) # Treatment is random
sex <- rbinom(n, 1, 0.5) # Sex is random
out1_theta <- 2 + sex * 2 # Mean of outcome 1: influenced by sex but not treatment
out2_p <- qlogis(0.2) + log(5) * sex + log(5) * trt # Log-odds of response: affected by sex and treatment

y <- rnorm(n, out1_theta, 1) # outcome 1 is continuous
r <- rbinom(n, 1, plogis(out2_p)) # outcome 2 (response) is binary

dat <- data.frame(y, r, trt, sex) # make a data frame
lm1 <- lm(y ~ trt, data = dat) # Unconditional model: no treatment effect
summary(lm1)

lm2 <- lm(y ~ trt + r, data = dat) # Treatment is stat sig when you condition on response
summary(lm2)


lm3 <- lm(y ~ trt, data = dat %>% filter(r == 1)) # same idea, conditioning by subsetting to responders instead
summary(lm3)
lm4 <- lm(y ~ trt + sex, data = dat %>% filter(r == 1)) # block the path opened through the collider by conditioning on sex as well; trt no longer stat sig
summary(lm4)
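
One way to see where the conditional "effect" comes from is to compare the sex balance across arms before and after restricting to responders. The sketch below re-simulates the same data-generating process so it runs on its own; the seed is arbitrary and the `sex_*` variable names are only illustrative.

```r
set.seed(1) # arbitrary seed, chosen only for reproducibility
n <- 10000
trt <- rbinom(n, 1, 0.5)
sex <- rbinom(n, 1, 0.5)
r <- rbinom(n, 1, plogis(qlogis(0.2) + log(5) * sex + log(5) * trt))

# Everyone: sex should be balanced across arms because of randomization
sex_all_ctrl <- mean(sex[trt == 0])
sex_all_trt  <- mean(sex[trt == 1])

# Responders only: selection on the collider unbalances sex across arms
sex_resp_ctrl <- mean(sex[r == 1 & trt == 0])
sex_resp_trt  <- mean(sex[r == 1 & trt == 1])

rbind(all        = c(ctrl = sex_all_ctrl,  trt = sex_all_trt),
      responders = c(ctrl = sex_resp_ctrl, trt = sex_resp_trt))
```

If the reasoning holds, the first row should be roughly equal across arms, while among responders the control arm should carry a noticeably higher share of sex = 1. Because sex raises y, that imbalance is what shows up as a treatment coefficient until sex is added to the model.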