Collider in RCT Subgroup Analysis

Food for thought. I understand your rationale and I see what you mean. Obviously, the subgroup designation as an effect modifier has to be justified by prior data or knowledge – and if you disagree with the rationale then obviously you disagree with the conclusion.

But this is a different argument from your post-randomisation, biomarker-driven collider argument presented in the paper, is it not?

Well, I roll the rationale into the conclusion as follows:

a) if one decides the third variable is prognostic, then it can neither be a post-randomization variable nor gate a post-randomization variable

b) a mediator is clearly a post-randomization variable - induced by the treatment (I call this an induced mediator)

c) an effect modifier is NOT a post-randomization variable but clearly gates such a variable (I call this a conditional mediator aka proxy for an induced mediator)

Now all three are prognostic for the outcome so they only differ by one of the above. My point is that each needs a specific analysis and that if a variable either gates a post-randomization variable or is a post-randomization variable then in both cases it is a mediator in some sense. So the first conclusion is that a variable can only be an effect modifier if it influences at least one mediator in a causally sufficient representation of all pathways from treatment to outcome. The second conclusion is that the mediator collides on sample selection with the treatment so mediators should not enter the outcome model to avoid bias.
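
A toy simulation may make the second conclusion concrete. This is only a sketch under stated assumptions: `u` is a hypothetical unmeasured prognostic factor, the mediator `m` is a binary post-randomization state that treatment makes less likely and `u` makes more likely, and all effect sizes are arbitrary. It checks one thing only: whether arms that are balanced on `u` at randomization remain balanced on `u` within a mediator stratum.

```python
import random
import statistics

def simulate(n=50_000, seed=1):
    """Return (treated, u, m) rows from a toy randomized trial."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5      # randomization
        u = rng.gauss(0.0, 1.0)           # hypothetical unmeasured prognostic factor
        # Post-randomization mediator: more likely with high u, less likely if treated.
        m = (u - 1.0 * treated + rng.gauss(0.0, 1.0)) > 0.0
        rows.append((treated, u, m))
    return rows

def u_gap(rows, stratum=None):
    """Mean of u in treated minus control, optionally within one mediator stratum."""
    keep = [r for r in rows if stratum is None or r[2] == stratum]
    treated_u = [u for t, u, _ in keep if t]
    control_u = [u for t, u, _ in keep if not t]
    return statistics.fmean(treated_u) - statistics.fmean(control_u)

rows = simulate()
print("u gap, all randomized:", round(u_gap(rows), 3))
print("u gap, mediator present:", round(u_gap(rows, stratum=True), 3))
print("u gap, mediator absent:", round(u_gap(rows, stratum=False), 3))
```

In this setup the gap should be close to zero among all randomized patients and clearly positive within each mediator stratum, so any effect of `u` on the outcome would be mixed into a within-stratum treatment comparison.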

IMO the perceived need for oxygen or mechanical ventilation (MV) was a proxy for the virus/immune-system-induced hyperinflammatory state of a cause-specific pulmonary disease. I think this was the general view. This creates a causal integrity RCT (CIR) of a pragmatic biologic data generating process (BDGP).

The conclusion of this trial seems quite well supported from a clinical perspective. However, I appreciate the depth of your thinking about this trial.

I fully agree with this. If dexamethasone works by:

Dexamethasone → ↓Hyperinflammation → ↓Respiratory failure → ↓Death

then oxygen need is downstream of inflammation and is endogenous to the treatment mechanism.

Are you saying that the BDGP framing is vindicated in RECOVERY’s design because it achieves causal identification, with randomization dissolving the mediator problem? Do you mean that the CIR is clean not because oxygen status is a perfect biological proxy, but because randomization makes the control arm a valid counterfactual for each oxygen stratum in the treatment arm, so that the causal integrity comes from randomization preserving exchangeability rather than from the proxy being causally pure?

Let’s start with the key issue which is: what defines a CIR versus a CAR/SDGP design?

I agree that randomization preserves exchangeability within the oxygen strata and therefore supports internal causal identification within the trial. But from my perspective, exchangeability alone is not sufficient to define a CIR.

A CIR is not defined merely by successful randomization. It is defined by whether the enrollment gate selects a coherent biological data-generating process such that the trial estimand is E2 and corresponds to a functionally causal system rather than a synthetic mixture. This is also true of the subgroup.

In other words, the defining question is what kind of causal system (S=1) actually represents at the main gate and (S=1)* at the subgroup gate.

First we look at the main gate and it’s one disease, a valid BDGP.

Second we look at the subgroup gate and it’s a proxy for an enrichment phenotype that, while not pure, is strongly supported by respiratory pathophysiology.

In that framework, the control arm serves as the counterfactual for the treated arm within each oxygen category.

Conversely, if the gate is an SDGP, such as the ARDS criteria, that primarily aggregates a mixture of diseases and divergent treatment-response mechanisms, or if the subgroup gate is an SDGP, then the trial shifts toward a CAR/SDGP structure even if the randomized comparison itself remains internally valid. The primary gate was an SDGP for the COVID ventilator guidelines, which failed and have been abandoned.

When the gate is an SDGP (that is not rescued by a narrowing coherent physiologic target in substantially all the participants), the estimand has moved to layer three (E3) and is dependent on the instant mixture of causal systems.

So for me the central issue is not whether randomization solved a mediator problem, but whether the main gate and subgroup gate selected a coherent causal system capable of generating transportable biologic knowledge.

This is a preprint in review. Note that it does not address subgroups. Your presentation has made me realize that the CIR to CAR conversion can occur at the subgroup level even if the main gate (S=1) was an acceptable BDGP. But it is my argument that the RECOVERY design remained a solid Fisher/Hill CIR and grade A evidence.

I would tend to agree with you that the enrollment gate selects a coherent biological data generating process (BDGP) because COVID-19 is one disease, and that the subgroup gate preserves causal integrity randomized controlled trial (CIR) status because oxygen requirement proxies the underlying inflammatory state.

The third question, which is what I raised, is whether conditioning on this proxy introduces collider bias. This is not addressed in your framework. Exchangeability within oxygen strata supports internal causal identification, meaning that within each oxygen group, the dexamethasone and no-dexamethasone patients are comparable. But internal causal identification within a stratum defined by a proxy for the treatment’s own mechanism is not the same as clean causal identification of the treatment effect itself.

The most important gap I feel is that you don’t consider what I would call the treatment pathway problem. If dexamethasone works by suppressing inflammation, and oxygen requirement reflects how far that inflammation has already progressed, then the greatest causal contribution of dexamethasone should be in patients not yet requiring oxygen, where the drug intercepts the inflammatory process earliest, before irreversible lung injury has occurred. Yet RECOVERY’s subgroup analysis shows no benefit in that group. This means the subgroup gate may have systematically misidentified where the biological process actually generates the treatment response. This is not a cause agnostic randomized trial (CAR) problem at the main enrollment gate, rather it is a causal inversion at the subgroup gate that your CIR framework as currently presented can probably not detect.

Regarding your statement that RECOVERY is Grade A evidence, the overall intention-to-treat estimate, which is that dexamethasone reduces mortality in hospitalized COVID-19 patients, is probably Grade A and reflects a solid CIR finding. But grading the subgroup finding as Grade A requires accepting three things: first, that oxygen status is causally independent of dexamethasone’s mechanism; second, that conditioning on it doesn’t introduce collider bias; and third, that the subgroup treatment effect is transportable to other settings and populations. None of these are established by the trial design itself.

Your framework is a genuine advance on standard thinking about trial design and how treatment effect estimates should be interpreted. The CIR versus CAR distinction, and the concept of a synthetic data generating process (SDGP), where the enrollment gate artificially aggregates patients from distinct biological disease processes rather than selecting a single coherent one, are valuable and clarifying contributions. Your cause-mixture paradox, where treatment effects reverse or disappear across trials simply because the mixture of underlying diseases at the enrollment gate has shifted, is particularly important for understanding why trials of the same drug sometimes contradict each other.
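
Here is a small arithmetic sketch of the cause-mixture paradox as I understand it. The two diseases, their shares at the enrollment gate, and the disease-specific risk differences are made-up numbers, not taken from any trial. The pooled effect is simply the share-weighted average of the disease-specific effects.

```python
def pooled_risk_difference(mixture, effects):
    """Weighted average of disease-specific risk differences.

    mixture: share of each disease at the enrollment gate (sums to 1)
    effects: treatment risk difference within each disease (negative = benefit)
    """
    if abs(sum(mixture.values()) - 1.0) > 1e-9:
        raise ValueError("mixture shares must sum to 1")
    return sum(share * effects[disease] for disease, share in mixture.items())

# Hypothetical disease-specific effects: benefit in disease A, harm in disease B.
effects = {"A": -0.10, "B": 0.05}

trial_1 = pooled_risk_difference({"A": 0.8, "B": 0.2}, effects)
trial_2 = pooled_risk_difference({"A": 0.2, "B": 0.8}, effects)
print(f"trial 1 pooled risk difference: {trial_1:+.3f}")
print(f"trial 2 pooled risk difference: {trial_2:+.3f}")
```

With the same drug and the same disease-specific effects, the pooled estimate changes sign purely because the mixture at the gate shifted.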

However, the framework may benefit from the discussion in this thread, and I do not want us to shift focus back to your framework, because this thread started with a different question: even within a coherent single-disease biological process, can a subgroup gate defined by a variable that proxies the mediator of the treatment effect (the biological pathway through which the treatment works) guarantee CIR status at the subgroup level? My answer is that it cannot, regardless of how physiologically plausible that proxy appears. This is because conditioning on a mediator proxy may artificially partition what is actually a single continuous biological process into apparent subgroups that do not correspond to distinct natural disease states. In other words, a coherent biological process at the main gate can generate what amounts to a synthetic data generating process (SDGP) at the subgroup level, not through disease mixture as you describe, but through causal pathway mixture induced by the subgroup gate itself. The cause-mixture paradox you identify across trials may have a precise analogue within a single trial when subgroup gates induce causal pathway mixture.

This is not a contradiction of your framework. It is a natural extension of your own estimand logic applied one level deeper, to the subgroup gate rather than the main enrollment gate. However I would propose defining such a third variable (e.g. O2 status in RECOVERY) as an effect modifier if it can be thought to be a pre-randomization proxy for the mediator’s baseline state (hyperinflammatory state), a variable that appears to be a standard pre-randomization effect modifier but is in fact a proxy for the baseline state of the biological mediator through which treatment operates. This distinction matters because standard effect modifiers that are causally exogenous to the treatment mechanism can be safely used to define subgroups within a CIR (I do not know if these truly exist and that is another question for you!). Mediator proxy effect modifiers cannot because conditioning on them may induce collider bias and artificially partition what is actually a single continuous biological process into apparent subgroups that do not correspond to distinct causal states.

The treatment pathway approach I proposed, comparing each treatment-defined pathway against a single undivided control arm, with balance achieved through randomization, is intended as a practical solution to this problem, avoiding conditioning on the mediator proxy entirely. However there are clearly more thoughts needed on this, and I would be very keen to hear your views on whether treatment pathway analysis could serve as a viable alternative to subgroup analysis in RCTs where mediator proxy effect modification is suspected. The broader question of where to draw the line between a safely exogenous prognostic variable and a mediator proxy effect modifier remains open and is perhaps the most important unresolved issue this discussion has surfaced.

Addendum

The assumption that a subgroup-defining variable is causally exogenous to the treatment mechanism is required for safe subgroup analysis within a CIR, but this assumption cannot be guaranteed for any strongly prognostic biological variable and should be treated as a testable hypothesis rather than a default. Indeed, if a variable is truly exogenous to the treatment mechanism, it is unlikely to be a strong effect modifier. If it is a strong effect modifier, it is unlikely to be truly exogenous.

There are biological and mathematical reasons not to draw this conclusion. COVID virus infection only uncommonly produces the potentially fatal hyperimmune response, so treating early could be dilutional and potentially harmful. I discussed the potential for corticosteroid-induced harm when only a weak antiviral safety net is available in my discussion of the CAP trials.

I agree with you, and this is a very important contribution because it was not part of my original framework regarding SDGP trials.

I will acknowledge that grade A may be overgenerous because it rests on a subset and on biologic assumptions.

I like your pathway analysis but the question remains. How does one enrich for the biologic treatment target and at the same time mitigate the potential for generating a synthetic pathway partition?

Well, the pathway approach was the presumed answer, unless something better comes along or someone points out a flaw. What I am not sure about is whether comparable weights through randomization would be better than equal weights for the control group, given that the pathway approach imposes equal weights on the treated.

A general conclusion also emerges regarding effect modification and subgroup analysis that was started in this thread. The standard defense of subgroup analysis in RCTs rests on the assumption that the subgroup-defining variable modifies the treatment effect from outside the causal pathway, that is that it is exogenous to the treatment mechanism. This is the implicit foundation of precision medicine approaches to treatment heterogeneity. What this discussion has demonstrated is that this assumption is self-defeating: for a variable to be a strong effect modifier it must interact with the biological pathway through which treatment operates, but that interaction is precisely what makes it a mediator proxy rather than a causally exogenous modifier. A variable that is truly exogenous is unlikely to be a strong effect modifier; a variable that is a strong effect modifier is unlikely to be truly exogenous. Conditioning on such a variable in the outcome model therefore introduces the risk of collider bias, potentially distorting rather than clarifying the picture of who benefits from treatment. This means that the standard subgroup analysis approach cannot safely identify treatment effect heterogeneity in biological systems, and alternative strategies, such as the treatment pathway approach, are needed to assess effect modification in clinical trials without conditioning on causally entangled variables. Finally to the question of the DAG by Attia that started this thread: Is it a correct representation of effect modification? The answer is likely to be no.

Addendum

I have significantly updated the preprint linked above to reflect this discussion because now we have both upstream (P) and downstream (B) variables of the mechanism (M).

This thread raises a problem for researchers working on predictive versus prognostic biomarkers. The decision of which statistical model to use requires researchers to commit to a mechanistic position before touching the data. The field largely doesn’t do this: it fits models, finds significance, and then constructs the biological narrative post hoc. In practice, this means the additive vs. product term choice is often made on statistical grounds (whichever fits better), when it should be made on ontological grounds before the analysis begins. In short, a prognostic biomarker must be orthogonal to the treatment mechanism while a predictive biomarker must be entangled with the treatment mechanism. This mechanistic determination alone guides the analytical decision, since neither tests of interaction nor replicability are reliable. The interaction test cannot confirm genuine pathway entanglement; replication confirms consistency, not mechanism; and predictive biomarkers are always also prognostic, so additive signal is always present and the analytic result is ontologically inert with respect to the distinction. So the choice of model, additive vs. product term, cannot be derived from the data itself. The data will fit both to varying degrees and won’t tell us which is correct.
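
One related way to see why a product term alone cannot settle the question is scale dependence. The sketch below is not the argument above, just a familiar arithmetic illustration: the four cell risks are hypothetical, the biomarker only raises baseline risk, and treatment multiplies risk by the same factor in both biomarker groups. The function name and the dictionary layout are my own choices for illustration.

```python
def interaction_contrasts(risk):
    """risk[(biomarker, treated)] -> outcome risk in that cell.

    Returns the additive-scale contrast (difference of risk differences)
    and the multiplicative-scale contrast (ratio of risk ratios).
    """
    rd_neg = risk[(0, 1)] - risk[(0, 0)]
    rd_pos = risk[(1, 1)] - risk[(1, 0)]
    rr_neg = risk[(0, 1)] / risk[(0, 0)]
    rr_pos = risk[(1, 1)] / risk[(1, 0)]
    return rd_pos - rd_neg, rr_pos / rr_neg

# Hypothetical cell risks: the biomarker only raises baseline risk,
# and treatment multiplies risk by 0.8 in both biomarker groups.
risk = {(0, 0): 0.10, (0, 1): 0.08, (1, 0): 0.40, (1, 1): 0.32}
additive, multiplicative = interaction_contrasts(risk)
print(f"difference of risk differences: {additive:+.3f}")
print(f"ratio of risk ratios: {multiplicative:.3f}")
```

The same four numbers show an interaction on the risk-difference scale and none on the risk-ratio scale, so whether a product term appears "needed" depends on the scale chosen rather than on any mechanistic fact about the biomarker.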

Yes, I love this framework because it is teachable to clinicians.

However, I wonder whether orthogonality is always absolute once a “prognostic” marker is deliberately used for pathway enrichment.

A predictive biomarker establishes the mechanistic pathway. A prognostic biomarker may then be used to enrich for patients in whom that pathway is sufficiently active to avoid dilution of the treatment effect. In that setting, the prognostic biomarker does not define the disease or identify the causal pathway. Rather, it measures the intensity or expression of an already established pathway.

As a result, such a marker may become associated with treatment responsiveness, not because it is itself mechanistically entangled with the treatment, but because it reflects the degree to which the targeted biological process is active.

Perhaps your definition would classify such a marker as predictive once that association exists. However, I wonder whether there is value in distinguishing a third category: an “enrichment biomarker.” This would be a marker that neither establishes the pathway nor serves as a purely orthogonal prognostic marker, but instead quantifies pathway intensity within a biologically defined population.

For example, a pathogen-specific marker may establish the disease pathway, while a physiological variable such as oxygen requirement or P/F ratio may enrich for patients in whom that pathway is most active and therefore most likely to reveal a treatment effect.

Would you consider such pathway-intensity enrichment markers to remain strictly prognostic, or do they occupy an intermediate position between prognostic and predictive biomarkers?

I am not sure enrichment can be classified as prognostic because you might agree that both predictive markers and enrichment markers are entangled with the treatment mechanism (M). The distinction rests on whether the marker establishes pathway operability or merely quantifies intensity within an already-operative pathway. Biologically, that line may be continuous (a dose-response relationship with treatment effect conditional on biomarker-positive status) rather than discrete. Pathway operability may itself be a matter of degree, in which case predictive and enrichment collapse back into a single continuous construct and the distinction dissolves.

Good point but even if pathway operability and pathway intensity form a biological continuum, enrichment remains a distinct causal category because it presupposes an already-established mechanistic pathway. The enrichment marker modifies the expected magnitude of response within a pathway-positive population, whereas the predictive marker establishes that the pathway is available to the intervention in the first place. Thus the distinction is not necessarily biological but causal and hierarchical.

Predictive marker → Pathway operative

Enrichment marker → Expected treatment effect size
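
The hierarchy can be written as a toy function. This is a sketch only: `pathway_operative`, `intensity`, and `max_effect` are hypothetical names, and the linear scaling of effect with intensity is an assumption made for simplicity, not a biological claim.

```python
def expected_effect(pathway_operative, intensity, max_effect=-0.10):
    """Toy hierarchy: the predictive marker gates the effect, the enrichment marker scales it.

    pathway_operative: bool, hypothetical predictive-marker status
    intensity: float in [0, 1], hypothetical enrichment-marker reading
    max_effect: assumed risk difference at full pathway intensity (negative = benefit)
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    if not pathway_operative:
        return 0.0          # no pathway, so intensity is irrelevant
    return max_effect * intensity

for operative in (False, True):
    for intensity in (0.2, 0.9):
        print(operative, intensity, expected_effect(operative, intensity))
```

The enrichment reading changes the expected effect only once the predictive marker has established that the pathway is operative, which is the sense in which the distinction is hierarchical.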

Moved from discussion of the 2020 HCQ trial.

This incidentally is an excellent example for your pathway approach. Here we see how a proper enrichment marker requires a physiologic basis.

HCQ hypothesis

  • Reduce viral entry or replication.
  • Most plausible early, when viral load is rising.
  • If effective, one would expect benefit before significant lung injury develops.
  • The study requires a large n because dilution is anticipated: only a small portion of the control population will develop the fatal hyperinflammatory state.

Corticosteroid hypothesis

  • Suppress host inflammatory injury.
  • Most plausible later, in the portion of the population where immune-mediated lung damage is evolving.
  • Benefit enrichment would be expected in patients requiring oxygen or ventilatory support (but still very early in those).
  • Early for all with positive PCR might be harmful by suppressing immune limitation of viral replication.

So we see that this HCQ trial was not properly enriched; in fact, it was potentially diluted by restriction to a potentially “too late” set of the infected.
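
To give a feel for what dilution does to the required n, here is a rough sketch using the standard normal-approximation sample size formula for two proportions. Every number is hypothetical: `target_share` is the fraction of enrollees who actually carry the treatable pathway, and the risks and risk ratio are invented defaults. Treatment is assumed to do nothing outside the pathway.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Standard normal-approximation sample size for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)

def pooled_risks(target_share, risk_target=0.30, risk_other=0.02, risk_ratio=0.70):
    """Pooled arm risks when only `target_share` of enrollees carry the treatable pathway.

    All default values are hypothetical; treatment is assumed inert outside the pathway.
    """
    control = target_share * risk_target + (1 - target_share) * risk_other
    treated = target_share * risk_target * risk_ratio + (1 - target_share) * risk_other
    return control, treated

for share in (1.0, 0.5, 0.1):
    control, treated = pooled_risks(share)
    print(f"target share {share:.1f}: risk difference {treated - control:+.3f}, "
          f"n per arm ~ {n_per_arm(control, treated)}")
```

Under these assumptions the pooled risk difference shrinks in proportion to the target share, and the required n per arm grows much faster than the share falls.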

So under pathophysiological (pathway) guidance:

  • Trials of HCQ should be aimed at an earlier E₂ population defined by active viral replication. This was not the case here.
  • Steroid trials such as RECOVERY (in its oxygen-requiring patients) targeted a later E₂ population defined by inflammatory lung injury.

Both of these are CIRs (actually the HCQ trial is a CIT - a “causal integrity trial”) but based on the perceived MOA, the gating is dilutional in the HCQ trial and enriching in the steroid trial.

Of course, later and better HCQ trials, including RCTs, dealt with this time window issue. The point here is simply to discuss the interesting intersection of timing, dilution, and enrichment of pathways in the two therapies under test.

I think this is quite logical, but I am sure there will be push-back on both gating and the concept of prognostic vs entanglement with the treatment mechanism, since the entire current view of subgroup analyses is purely external to the pathway. Perhaps your paper will begin to change this view, but I must confess that it is a difficult area to grasp, and until I began to question mediation / subgroups I did not fully appreciate the concepts you have been raising for some time. Now it all finally begins to make sense.

Your analysis, thinking mechanistically about subgroup gates, makes it difficult for your reader to justify treating enrollment gates as if they were merely administrative criteria. They too become biological claims about what causal system is being randomized.

At both the enrollment gate and the subgroup gate, the central question is the relationship of the gating variable to the underlying causal pathway. For subgroups this appears as the prognostic-versus-predictive distinction. For enrollment criteria it appears as the mechanism-defining-versus-severity-defining distinction. These are closely related manifestations of the same causal problem.

I agree that the statistical concerns differ between post-randomization and pre-randomization gates. The point is not that they generate the same bias. Rather, Doi’s mechanistic framework forces us to ask what biological role the gating variable plays. Once that question is asked for subgroup gates, it becomes difficult to avoid asking the same question of enrollment gates. The issue is no longer randomization integrity but causal-system integrity. Both involve interpreting a gate in relation to the underlying biological pathway, even though they occur at different points in the trial.