Collider in RCT Subgroup Analysis

I came across this interesting article: A proposal for capturing interaction and effect modification using DAGs

Interaction: they propose that this term be used specifically when both the treatment and the “subgroup” have a direct causal effect on the outcome.

Effect modifier: they suggest reserving this strictly for situations where the second factor (the modifier) does not have a direct causal effect on the outcome itself.

I want to extend this framework to subgroup analysis in randomized trials, where one would focus on the parameter related to T \times S to answer the question:

Is the treatment effect different between subgroups?
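One common way to write this question down, assuming a binary treatment indicator T, a binary subgroup indicator S, and a linear model on the chosen scale, is sketched below. The coefficient names \beta_0 to \beta_3 are notation introduced here for illustration, not taken from the article.

```latex
\begin{aligned}
E[Y \mid T, S] &= \beta_0 + \beta_1 T + \beta_2 S + \beta_3 (T \times S) \\
E[Y \mid T=1, S=0] - E[Y \mid T=0, S=0] &= \beta_1 \\
E[Y \mid T=1, S=1] - E[Y \mid T=0, S=1] &= \beta_1 + \beta_3
\end{aligned}
```

Under these assumptions, asking whether the treatment effect differs between subgroups amounts to asking whether \beta_3 differs from zero.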

I noticed that stratifying by subgroup and including the T \times S parameter could open a backdoor path in either the interaction or the effect-modifier framework if an unobserved variable influences both T \times S and the outcome:

i.e., T \times S would act as a collider.

Has this phenomenon in subgroup analyses been described before in RCTs? How should one interpret such analyses if there is a risk of opening backdoor paths through unobserved variables?

Yup, this remarkably came out at the exact same time we described the same thing using similar DAGs (a major difference is that we adapted them towards selection diagrams and described the do-calculus considerations). See also here.

The potential collider bias you point out is what @Stephen eloquently refers to when cautioning that in subgroup analyses we can assign individuals their treatment levels but not their covariates.

This was extremely helpful! I have a clarification question: in the example you give, if EGFR mutation were the only cause of oncogenic EGFR signaling, would we still get a biased estimate in a subgroup of patients with oncogenic EGFR signaling? I would think not, since in this instance it would be equivalent to an RCT in patients with EGFR mutation, would it not?

I’m also somewhat confused by the treatment indicator causing EGFR signaling. Temporally, I would assume EGFR signaling to already be present or absent before randomization?

This is exactly the kind of question that will help me in figuring this out. It seems to me that if you could have done a meaningful and easily interpretable clinical trial on the subgroup in question, you should be able to figure out interaction effects involving that factor in a larger trial.

I’m still a bit confused.

People who are part of the subgroup also have a higher or lower risk for the outcome than people who are not part of this subgroup, even if they are not given the treatment. That’s @Pavlos_Msaouel’s EGFR example. People from the subgroup also have a different relative treatment effect (on the scale of interest) when given treatment than people not in the subgroup.

The second DAG:


Being part of the subgroup tells us nothing about the risk of the outcome if no treatment is given. Under treatment it does.
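The contrast between the two descriptions can be sketched with simulated cell means. This is only an illustration under assumptions that are not in the papers: a continuous outcome on an additive scale, 1:1 randomization, and hypothetical effect sizes (`beta_s`, `beta_ts`) chosen purely for the example.

```python
import numpy as np

def cell_means(beta_s, beta_ts, n=200_000, seed=0):
    """Simulate a 1:1 randomized trial and return the mean outcome per (T, S) cell.

    beta_s  : hypothetical direct effect of subgroup S on the outcome
    beta_ts : hypothetical extra treatment effect inside the subgroup
    """
    rng = np.random.default_rng(seed)
    t = rng.integers(0, 2, n)  # randomized treatment
    s = rng.integers(0, 2, n)  # subgroup membership
    y = 1.0 * t + beta_s * s + beta_ts * t * s + rng.normal(0.0, 1.0, n)
    return {(ti, si): y[(t == ti) & (s == si)].mean()
            for ti in (0, 1) for si in (0, 1)}

# First description: S shifts risk even without treatment.
first = cell_means(beta_s=0.5, beta_ts=1.0)
# Second description: S is uninformative without treatment.
second = cell_means(beta_s=0.0, beta_ts=1.0)

for name, m in (("first DAG", first), ("second DAG", second)):
    untreated_gap = m[(0, 1)] - m[(0, 0)]  # S=1 vs S=0 among untreated
    treated_gap = m[(1, 1)] - m[(1, 0)]    # S=1 vs S=0 among treated
    print(f"{name}: untreated gap {untreated_gap:.2f}, treated gap {treated_gap:.2f}")
```

In this sketch the untreated gap between subgroups is nonzero only in the first scenario, while the treated gap is nonzero in both.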

This DAG is confusing me:

Why is there an arrow \text{Unobserved} \rightarrow \text{T x S}? We are not modeling the \text{T x S} interaction using U, are we?
I think this corresponds to this graph from Pavlos’s publication. But the above DAG would somehow lead me to think that people who are part of both U and \text{subgroup} have a different relative treatment effect than people who are part of \text{subgroup} but not U?

Quote from Epi paper which helps a bit:

It is possible that there are factors that influence the likelihood of both exposures occurring concurrently (marked by an arrow into the interaction E G node) and also influence disease risk (marked by an arrow into the disease D node); such a factor would create a back-door path and this would be explicitly visualized in the DAG (see Figure 2c). Continuing with our example of smoking (E), asbestos (G) and lung cancer (D), a potential confounder (C) would be a factor that increases the likelihood of both smoking and asbestosis exposure, such as socio-economic status. Although this back-door path could also be captured by arrows from C to both E and G if the E G node were omitted, the interaction node prompts the researcher to think about factors that affect both exposures simultaneously.

I’ll try to simulate some data to wrap my head around this.
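A minimal sketch of such a simulation follows. It is not taken from either paper; it assumes a binary unobserved factor U that raises the probability of subgroup membership S and also affects the outcome, a randomized T, and no causal effect of S itself. The function name, the probabilities, and the parameter `gamma_tu` (how much U changes the treatment effect) are all hypothetical.

```python
import numpy as np

def fit_subgroup_model(gamma_tu, n=200_000, seed=1):
    """Return OLS coefficients for Y ~ 1 + T + S + T*S from a simulated trial.

    gamma_tu is a hypothetical amount by which the unobserved U changes
    the treatment effect. S has no effect on Y in this simulation.
    """
    rng = np.random.default_rng(seed)
    u = rng.binomial(1, 0.5, n)         # unobserved baseline factor
    s = rng.binomial(1, 0.2 + 0.6 * u)  # subgroup membership depends on U
    t = rng.binomial(1, 0.5, n)         # randomized, independent of U and S
    y = 1.0 * t + 1.0 * u + gamma_tu * t * u + rng.normal(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), t, s, t * s])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "T", "S", "TxS"], coef))

no_mod = fit_subgroup_model(gamma_tu=0.0)    # U is only prognostic
with_mod = fit_subgroup_model(gamma_tu=1.0)  # U also modifies the effect of T

print("U prognostic only:   TxS =", round(no_mod["TxS"], 2))
print("U modifies T effect: TxS =", round(with_mod["TxS"], 2))
```

Under these assumed probabilities, the T \times S coefficient is close to zero when U is only prognostic, and roughly 0.6 in expectation when U also modifies the treatment effect, even though S plays no causal role. The treatment contrast within each subgroup is still randomized; what is not randomized is who belongs to the subgroup.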

Yeah, this is why I chose to model the confounding influence of U on the baseline variables that we know at time 0. While simpler than modeling the interaction, focusing on this has a very high payoff in practice.

Correct. This is also related to @f2harrell’s comment:

Indeed, we discuss and formalize exactly this point in Section 3.4 here using the example of HER2. Notice that contextual knowledge from correlative and functional lab research is needed to choose the subgroup and develop the therapy for it. Hence the focus of that paper on transporting such knowledge across domains.

Notice the qualifier oncogenic EGFR signaling (not just all EGFR signaling which exists in normal cells). The oncogenic mutations on the tyrosine kinase domain of EGFR induce oncogenic EGFR signaling that can then be targeted (causally modified) by EGFR tyrosine kinase inhibitors.

I had forgotten that the Impervious to Randomness paper focused on teasing out oncogenic EGFR signaling. The DAG was drawn in my head during a day hike with my then soon-to-be wife around Santorini on 7/18/2021. I drew it on the piece of paper below and then wrote that manuscript as a way of not forgetting this concept. But as shown here (from 1:34:00 onwards) that EGFR pathway mental dissection allowed us subsequently (in May 2022) to come up with the most powerful therapy developed to date for renal medullary carcinoma – the deadliest kidney cancer in adolescents and adults. There are patients alive today (some even cancer free) that would otherwise no longer be with us if not for this.

Once we started thinking in a structured way about randomizing a patient’s covariates to remove these confounders, this led us to sampling theory. Then we spent a lot of time thinking about the implications of random sampling versus random treatment assignment and wrote this very long paper to summarize these points.

Depressingly, this line of thinking then allowed me to recognize the oxymoronic nature of randomized non-comparative trials (RNCTs). To this day, I struggle to convince some biostatisticians why RNCTs are such a bad idea. These DAGs are one method of communicating these concepts but they still need attention and may not work for everyone. Different tools may be a better fit for at least some people.

The DAG encodes our own (not other people’s) assumptions, and I do not really agree with these authors. Assuming SG is a categorical variable with two levels (present, absent), the DAG on the left below would imply effect modification when the presence of SG mediates the effect, in which case the direct effect when SG is absent could be zero or partial, and U then is important. The DAG on the right is the far more common scenario (99%), where SG is a collider (an artefact of the sample), in which case U is just prognostic for the outcome.

Addendum: What the conventional framework calls mediation is really induced mediation, where T activates the mechanism and the mechanism exists because of T. What I am describing is better called conditional mediation or a facilitated mechanism: the mechanism exists independently, but T’s effect is entirely channeled through it. Both are legitimately mediation conceptually. The difference is entirely about whether the decomposition arithmetic works, not about whether SG deserves to be called a mechanism. The field has perhaps been too quick to let the limitations of its estimation tools define its concepts.

This was an opinion paper in IJE. I have decided to write a counter-opinion given so many (in my view ambiguous) attempts at inserting effect modification into DAGs. I will post the pre-print link here when done. Of note, a drug may work through a receptor to mediate a target effect, e.g. a GIP analogue acting through the GIPR, but the drug does not “cause” the receptor, as the receptor exists whether or not the drug is present. Even though the drug doesn’t cause the receptor, the receptor is unambiguously the mechanism of action, and thus a mediator of its effect.
