The growing interest in integrating causal inference and Design Theory

llynn · January 3, 2026, 11:32pm

There is rapidly growing interest in integrating a simpler, clinician-facing version of structural causal modeling (SCM), called causal symbolic modeling (cSM), into critical care trial design. I am working on a paper that presents cSM as a bridge between the clinician, causal inference, and design theory.

The need for better critical care trials, and for better explication of cause-agnostic RCTs, is driving this interest (see link). But the epistemic gaps between (1) design theory, (2) causal inference, and (3) clinical science are substantial.

SCM itself can generate so many nodes that it overwhelms, and much of this is addressed by randomization. On the other hand, as proven in critical care syndrome science, randomization cannot solve some hidden pathological design features that are readily detected by cSM.

In many trials, iatrogenic (design-generated) heterogeneity has been introduced before randomization. Treatment assignment is randomized across a mixed causal population, not within a coherent disease or causal entity.

Without cSM, SCM can still be correctly applied: DAGs can be drawn, estimands identified, RCTs performed, and effects estimated. But if the prior structure (e.g., the gate and the outcome) is invalid, the estimand represents an average over incompatible causal mechanisms. It is mathematically valid yet causally non-transportable.

I hope to have the paper out soon. I would welcome any comments.

Just as a reminder of the need, here is a December Petty-Bone sepsis RCT from JAMA. It uses SOFA at the gate, plus some thresholds, and a “SOFA baseline to multiple endpoint differences” at the outcome.

Remember that this was published in JAMA and expected to be the state of the art. But I labored over Supplement 2, and I advise everyone interested in critical care science to do the same. It’s data salad.

I liked the move to precision biologic targeting. This is an important gate limitation, but otherwise the gate was a wide-open triage. It was so wide, and survival so low (what killed these people is unknown), that the study is not interpretable. This is sad, because they have a target, but both the gate and the outcome are based on the SOFA composite guessed in 1996.

This type of RCT needs to be designed with cSM.

https://jamanetwork.com/journals/jama/fullarticle/2842634

Here is the 90-day mortality.

Yet this study was reported as positive based on a 1.7-point difference in SOFA at 9 days, but the authors do not show a daily time series of SOFA.

Much of this is buried in Supplement 2. The point is that these trials are increasingly wasteful. This was a massive multicenter trial, and one has to have the highest respect for these ethically motivated workers, but the trial lacked causal explication at the gate and outcome and was therefore doomed. Only a massive treatment effect could have bailed out that widely gated design.

Come join this discussion on X or let’s begin a discussion of cSM here.

https://x.com/soboleffspaces/status/2005739600904610040?s=46

Elias_Eythorsson · January 4, 2026, 11:14am

The only references I can find to causal symbolic modeling are written by you. In your recently published 25-page review of your “Petty-Bone RCT” hypothesis, you attribute causal symbolic modeling to Wright and Pearl but, as far as I can tell, do not provide a reference. Could you provide one here?

f2harrell · January 4, 2026, 1:43pm

The majority of “causal inference” solutions to the problem involve oversimplification by pretending that every patient has the same expected outcome, within each treatment group. So I’m failing to see how the causal inference approach is going to help with the problems you’ve identified, as opposed to just using robust experimental design and robust covariate-adjusted outcome modeling.

llynn · January 4, 2026, 7:46pm

Thank you, Elias. The paper to which you refer was written for clinicians and clinical trialists. Although it uses the Petty-Bone mistake as an index case, and certainly proves that that particular design is flawed, the paper has a deeper purpose: to use that decades-old design error to highlight that, in the present state, fatally flawed trials can readily pass through CONSORT and statistician review, yet these flaws would be immediately exposed when the design structure is interrogated by cSM, as I did in the paper. So the paper is a call for reform of the process of design in clinical science, not of design itself.

Causal symbolic modeling (cSM) is a design-level causal framework that interrogates the causal integrity of symbolic trial components prior to formal structural causal modeling (SCM) and/or design.

In the paper I demonstrate that CONSORT and conventional thinking about internal RCT validity (for example, by @Stephen) are incomplete, because valid transportability is primary to the bedside decision-making of the clinician.

So if you read the paper closely (sorry about its length), you will see what cSM is. Pearl’s SCM is designed for broad structural design. What I discovered was that in clinical medicine we evolved a symbolic, heuristic-based language that translates poorly to design (and to SCM). The paper shows this clearly. It explains how statisticians were fooled by the symbolic language of clinicians. This could also fool CI practitioners as they venture more deeply into clinical science.

However, the problem is also social. There is an epistemic interface gap that statisticians do not cross in CONSORT. This is discussed in detail in the section of the paper titled:

Why did RCT mimicry persist unnoticed until now?

Operationally, the specifics of cSM can be summarized as:

Applying Pearl’s structural causal modeling logic as far upstream as feasible in the trial lifecycle to explicate and interrogate the causal integrity of design assumptions. These assumptions include:

Disease labels
Eligibility criteria and cohort gates
Interventions
Outcomes
Protocol rules and co-interventions

cSM determines whether these symbolic objects can legitimately function as nodes in a causal model or as part of the design structure.

cSM:

1. does not introduce new mathematics (it’s Pearl’s stuff)

2. does not replace randomization or SCM

3. does not estimate effects

4. does not adjudicate causal truth from data

Instead, cSM determines whether a causal question is well posed.

As the paper demonstrates, design presupposes that the entities placed into the model correspond to real causal objects. Design can represent causal coherence, but it does not validate it. CONSORT checks the internal validity of the design but not transportability. The logical ordering is therefore:

cSM determines whether a causal estimand can meaningfully exist.
Design and/or SCM is applied after (or with) cSM analysis.

The paper makes clear that when symbolic assumptions are incorrect, a design may clear CONSORT and still yield mathematically valid results that are clinically non-transportable. cSM is introduced to prevent this decades-old failure mode and to introduce the clinician to SCM and design explication. It also allows deeper (below-deck) failure-mode analysis beyond the typical CONSORT questions regarding power, compliance, etc., which are all pivotal but clearly not enough.

cSM is not really new; it’s simply the language of Pearl placed upstream so that it applies to clinical science (pathophysiology), which forces explication of the components of the causal model itself before defining its structure in design, SCM, or both.

There is no reference for cSM yet, as the paper is in progress, but you can grasp it from the linked paper alone if you see that the linked paper is less about the Petty-Bone RCT mistake than about how to align design with a valid, fully explicated causal model. The basic teachings of Pearl are all that’s required to understand cSM.

The place to start is “The Book of Why”. It’s a good basic source of the concepts embodied in cSM.

llynn · January 4, 2026, 11:15pm

Both design theory (DT) and CI make assumptions (pretendings): internal assumptions for CI and external assumptions for DT. These assumptions are quite synergistic in their mitigations, since each can be addressed by the other discipline when the two are combined.

Sepsis trialists and statisticians have assumed, for 34 years, that the effects they calculate are transportable to everyone who enters the gate. That has proven false, with much loss. Likewise, when DT “pretended” that the RCT-derived ventilator protocols for ARDS were transportable as EBM to severe COVID pneumonia, that was wrong at global scale and produced a bedside clinician revolt. This design-based assumption error cannot be allowed to pass without corrective action.

As I explained to Elias, cSM is not CI but rather the use of Pearl’s language to ensure that a design (or a causal-inference-based study) can reasonably generate transportable results, since CONSORT-cleared designs that are falsely attributed a transportability function have proven to be associated with adverse outcomes.

I agree that design directed by a DT expert working with a pathophysiology clinician, with no epistemic gap in communication, COULD probably solve most of these problems alone, but history has proven that they won’t. Virtually no statistician will even acknowledge the problem of iatrogenic non-transportability as a function of the task-force-defined gate. Without intervention, the task forces are not going to change their approach.

As the paper describes in detail, cSM is the synergistic connection between design and CI needed to begin bridging the epistemic interface gap. In a broader sense, I’m trying to facilitate synergistic work between you and Pearl by finding the portion of DT and CI where you will both find agreement if you look.

llynn · January 6, 2026, 5:19pm

Elias, this is the simplest way to illustrate what cSM is:

Below is the standard CONSORT approved path for an RCT. “Inclusion” is the selection gate (S=1) which is set by a task force.

This looks valid. Naive observational-study (OS) and CI-based SCM could also start here. This might be called “trust-based design theory” because the design statistician is trusting the task force to understand what constitutes a valid measurement for defining a valid cohort. But many task forces do not understand that.

So this “standard RCT” is transportable only if the task force created an “RCT-valid” inclusion (selection) rule. However, since many task forces have not been taught by statisticians how to do that, and CONSORT does not require it, many standard (CONSORT-approved) RCTs will not transport, as a consequence of standard (pathological, consensus-based) design error. The false view that they are transportable is a major problem.

In contrast, below is the cSM model (although T and Y would be further modeled). Note that here cSM uses causal modeling to examine the task-force-derived cohort itself. That’s the key:

cSM forces modeling of the components of the model.

cSM is required because statisticians and clinicians need a language in which to build the components of the model. “Words” have proven inadequate. They need a way to include modeling of the components in the CONSORT checklist to ensure a valid cohort.

cSM is simply “the causal model of the components” of an RCT or OS. A CI expert might trust a node offered by a clinical expert, as DT experts have, so cSM forces the question for CI experts too. Because DT experts do not recognize that this is necessary, cSM forces the question.

Since I have had no substantive effect on correcting this problem by teaching statisticians and trialists over the past years, I am advancing cSM to clinicians and funding bodies (e.g., the NIH), because this is subtle, anchor-bias-entangled stuff. Words are ineffective tools of change when dealing with a Lakatosian scientific research programme.

So cSM is just SCM taken as far upstream as possible, designed to guide DT. DT-directed RCTs and OS, as well as CI, might assume the expert task-force selections are valid. cSM says: prove it to me.

stephenrho · January 6, 2026, 7:21pm

I have been trying to follow the various threads on here where this has been raised and still don’t understand how graphs like those above illuminate the real problem you have identified in critical care beyond a general (word-based) statement that “average treatment effects are of little use, or are misleading, when there is (probably) substantial treatment effect heterogeneity”. The real contribution is in the strength of the argument that there is meaningful HTE in a particular scenario. What is the main benefit of these symbolic graphs? Especially given that these kinds of causal graphs are agnostic on the presence of HTE/treatment effect interactions.

Relatedly, it also isn’t clear how strong the connection is here to Pearl’s DAGs, as in these graphs the arrows appear to have a different meaning. In a randomized experiment it is not possible for X to cause treatment selection, which I would take from X –> T in a conventional DAG. If this changing of the meaning was intentional, what is the proposed benefit?

llynn · January 6, 2026, 8:03pm

Thanks. Yes, it is hard to understand, because it’s hard to believe this design would pass CONSORT with such a simple, obvious error: conditioning on a disease-agnostic X generated by hundreds of different diseases. Who thinks that would work? Answer: virtually everyone in critical care science, despite five decades of failed transportability.

This is not classic biology-associated HTE; this refers to “synthetic HTE”: HTE generated by symbolic aggregation, such as by a task-force-generated triage threshold (X). More specifically, this synthetic HTE is caused by the absence of SCM during trial design, which allows a mismatch to persist between the causal structure of the trial actually implemented and that of the trial the investigators intend to run.

This is difficult to articulate because the design is pathological, so the representative DAG is pathological.

The best place to start is this article, which describes the development, evolution, and consequences of the streamlined (high-n-generating), threshold-triage-based RCT.

llynn · April 26, 2026, 1:39pm

Elias, I wanted to get back to you to discuss this. I hope you will share your thoughts.

The issue raised by the proposed cSM framework is that current trial design imposes no structural requirements on the selection rule S (nor, more broadly, on the covariate set). Indeed, there is no requirement for S to correspond to a biologic entity, disease, or mechanistic target. Instead, it may be defined by a synthetic data-generating process (SDGP), that is, a constructed gate based on thresholds or consensus criteria, much like the original formulation of SIRS. The potential for SDGP-gated RCTs is therefore unlimited, but their output does not generate buildable knowledge, as is the case with 39 years of sepsis RCTs and the tragedy of false RCT transport of the ARDS meta-analysis to guidelines for early ventilator treatment of severe COVID pneumonia.

When S is synthetic in this way, it can aggregate distinct causal systems under a single enrollment criterion. The resulting trial estimand,

$\mathbb{E}[Y_1 - Y_0 \mid S = 1]$,

remains internally well-defined, but its meaning is conditioned on a gate that does not correspond to a coherent biologic data-generating process.

This gives rise to a third-layer estimand: one that is mathematically precise yet dependent on the structure of S rather than anchored to a stable causal target. As a consequence, transportability may be systematically unsafe because the estimand reflects the composition induced by the selection rule, not a reproducible underlying mechanism.
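As a minimal sketch, assuming the gate admits a finite set of K distinct causal systems labeled by a hypothetical disease index D (an illustrative label, not part of the original notation), the law of total expectation within the enrolled population makes the dependence on the gate explicit:

$$
\mathbb{E}[Y_1 - Y_0 \mid S = 1]
  = \sum_{k=1}^{K} \underbrace{P(D = k \mid S = 1)}_{\pi_k}\;
    \underbrace{\mathbb{E}[Y_1 - Y_0 \mid D = k,\ S = 1]}_{\tau_k}
  = \sum_{k=1}^{K} \pi_k\, \tau_k
$$

Here π_k is the share of enrolled patients with disease k and τ_k is the average effect within that disease. Randomization identifies the left-hand side, but the weights π_k are set by the gate and the local case mix rather than by biology, so identical τ_k can yield different pooled estimands under the same S.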

The prevention of this pathological third layer is the responsibility of the statistician, who understands RCT structure when the clinical trialist may not. This is why cSM, or other means for statisticians to formally dissect S and define the rules for an acceptable S, is required. Figure 1, from a preprint under review, shows the third layer.

llynn · April 27, 2026, 12:47am

Stephen, I wanted to follow up with you on the difference between HTE and gate-induced loss of causal coherence, now that I have completed the third paper in my analysis of RCT structure.

If you note the discussion with Elias above, HTE exists as a function of a covariate vector within the second-layer estimand, and there it is manageable.

At the third layer, HTE induced by the covariate vector of each included causal system becomes a function of a “covariate mass”. In such a mass, a covariate may be a severity indicator in one disease, normal compensation in another, and irrelevant in a third, all in one trial.
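A minimal sketch of the “covariate mass”, assuming three hypothetical diseases that share one gate covariate x (for instance a heart-rate threshold). The functions and coefficients are invented for illustration; the only point is that the conditional effect τ_k(x) has a different slope, and so a different causal meaning, in each disease.

```python
# Hypothetical "covariate mass": one gate covariate x with three causal roles.
# All coefficients are invented for illustration, not estimated from data.
# Effects are risk differences (treated minus control); negative = benefit.

def tau_severity(x):
    """Disease A: higher x marks severity, so benefit grows with x."""
    return -0.002 * x

def tau_compensation(x):
    """Disease B: higher x reflects normal compensation that treatment blunts."""
    return 0.001 * x

def tau_irrelevant(x):
    """Disease C: x carries no information about the treatment effect."""
    return 0.0

TAU_BY_DISEASE = {"A": tau_severity, "B": tau_compensation, "C": tau_irrelevant}

def slope(f, x0=100.0, h=1.0):
    """Finite-difference slope of a conditional effect tau_k(x) at x0."""
    return (f(x0 + h) - f(x0)) / h

for name, f in TAU_BY_DISEASE.items():
    print(name, round(slope(f), 4))
```

Within any single disease, x is an ordinary effect modifier; pooled across the gate, "effect modification by x" has no single interpretation.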

So here you see the comparison DAG at estimand Layer 2 vs. Layer 3 (from the article linked above). Yet many of these trials in which S is synthetic can include 10 or more diseases (parallel causal systems), each with its own parallel covariate vector.

Questions like yours were the reason I formulated the third estimand proof. Teaching the difference between a covariate vector and a covariate mass is quite difficult without the formula layers.

A third-layer estimand is not tied to a biologic DGP; it is tied to the gate, and it can reverse polarity when the disease mix changes in the next trial, despite the same selection criteria and the same hospital.
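A toy calculation shows how a gate estimand written as a weighted sum of per-disease effects, with weights set by the case mix, can change sign when only the mix changes. The disease labels, effects, and mix proportions below are hypothetical and chosen for illustration, not taken from any trial.

```python
# Hypothetical per-disease effects (risk difference in mortality, treated minus
# control; negative = benefit). Invented numbers for illustration only.
TAU = {"disease_A": -0.10, "disease_B": 0.05}

def gate_estimand(mix):
    """Third-layer estimand: sum over diseases of pi_k * tau_k for a case mix."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "mix proportions must sum to 1"
    return sum(pi * TAU[k] for k, pi in mix.items())

# Same gate, same per-disease biology; only the admitted mix differs.
trial_1 = {"disease_A": 0.6, "disease_B": 0.4}
trial_2 = {"disease_A": 0.2, "disease_B": 0.8}
print(round(gate_estimand(trial_1), 3))  # apparent benefit
print(round(gate_estimand(trial_2), 3))  # apparent harm
```

Neither per-disease effect changed between the two trials; the reversal comes entirely from the weights.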

Let me know your thoughts.

llynn · May 4, 2026, 11:48am

Stephen, maybe this will help explain the difference between HTE caused by a covariate vector and the effect of RCT-gate-generated causal system mixing, which I call “synthetic heterogeneity”.

This phenomenon is fundamentally:

Not → Second Layer (classic Fisher-Hill RCT): variation of the treatment effect within a disease → τᵢ(xᵢ)

Instead → Third Layer (RCT with a synthetic gate): the generation of (and variation in) the causal system mix, i.e., which diseases are present → πᵢ(S=1)

This causes the “cause mixture paradox” (below, from the reference above) and the unsafe transport of RCTs to clinical guidelines that we have observed for decades in critical care wherever this third layer is generated. Is the difference clear?

This misinterpretation of Layer 3 as Layer 2 is what caused the RCT (meta-analysis)-derived ventilator guideline disaster during the COVID pandemic. Those guidelines have been abandoned, but the cause of the RCT failure was not investigated. Trialists now have to step up their understanding of trial structure to prevent similar patient harm in the future.

llynn · May 7, 2026, 2:43pm

Stephen, this image, which is part of my campaign to reduce the decades-old RCT-guideline-associated patient harm in critical care, explains the graphs you asked about.

https://x.com/PatientStormDoc/status/2052391103349846381?s=20

llynn · May 13, 2026, 9:45pm

Stephen, is this clear now? It is confusing because classic HTE lies within a single causal space. It is derived from the second estimand as a function of τ(x). This is a natural part of an RCT; we all understand that. As clinicians, we consider outlier status in decision-making.

The problem not understood (or not acknowledged) is the heterogeneity caused by mixing diseases or causes at the gate (as when using a symbolic gate, e.g., a syndrome) that is not specific for a cause or disease. This is the cause-agnostic RCT (CAR), institutionalized about 1 year after I began my critical care practice. Unlike a true RCT, it is not safely transportable.

The term “cause-agnostic” refers to the cause of the outcome targeted by the treatment. One could also use the term “disease-agnostic RCT”.

This modified RCT was never contemplated by Bradford Hill and generates an additional layer of heterogeneity that was not present in standard RCTs prior to 1987 (although there may be earlier usages).

This image demonstrates cause or disease mixture heterogeneity and the cause mixture paradox which caused the COVID ventilator revolt in 2020. Here we can mathematically link clinical failure during the pandemic directly to a benign appearing modification of RCT methodology in 1987. No one will debate that.

Here is a teaching image which I use to teach clinicians. This is a cause (disease) agnostic RCT of apples and oranges triaged by the selection gate of “round fruit”. This is analogous to a selection gate of the syndromes ARDS, Sepsis, or CAP based on triage thresholds.

Note again that this is not really an RCT in the true sense, which requires a targetable cause in all participants (for Hill, this was alveolar TB). This is an RCT mimic: a streamlined, modified RCT that recruits (selects) patients by disease- and mechanism-agnostic triage, making a high n easy but destroying safe transportability.

llynn · May 16, 2026, 1:16pm

I realize I never answered this point. Stating that this is an RCT DAG implies that treatment is assigned by randomization; that step is omitted from the graph as shorthand. X here is the disease-agnostic gate, a threshold of nonspecific severity values. So X, besides being a selection gate, is actually also a covariate for each of the different diseases it selects, which is why it points to both S and the outcome.
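One hypothetical reading of the graph described above, written as parent sets. The node names D (the unmeasured underlying disease) and R (randomization) are assumptions added for illustration. The sketch shows that X points to S and Y but not to T, whose only ancestor is R, while D is an ancestor of S, so selecting on S = 1 shapes the disease mix without touching treatment assignment.

```python
# Hypothetical DAG as {node: set of parents}. Illustrative labels:
# D = underlying disease (unmeasured), X = nonspecific severity thresholds,
# S = selection (S=1 enrolled), R = randomization, T = treatment, Y = outcome.
PARENTS = {
    "D": set(),
    "X": {"D"},
    "S": {"X"},
    "R": set(),
    "T": {"R"},            # randomization is T's only cause: no X -> T edge
    "Y": {"D", "X", "T"},  # X is also a covariate for the outcome
}

def children(node):
    """Nodes that list `node` as a direct parent."""
    return {n for n, ps in PARENTS.items() if node in ps}

def ancestors(node):
    """All nodes with a directed path into `node` (iterative DFS)."""
    seen, stack = set(), list(PARENTS[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(PARENTS[p])
    return seen

print(sorted(children("X")))   # ['S', 'Y']
print(sorted(ancestors("T")))  # ['R']
print(sorted(ancestors("S")))  # ['D', 'X']
```

In this reading the arrows keep their conventional DAG meaning; the omitted randomization step is simply made explicit as R → T.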

There have been about 400 RCTs of sepsis studying immune-modulation treatment alone! These have used this pathological structure since 1987. All failed or were reversed after guidelines were implemented.

https://doi.org/10.3390/life15101517

Note that the author of this article presents SIRS (Sepsis-1 and Sepsis-2) as a failure but leaves out SOFA (his own guessed threshold set, used for Sepsis-3), which has failed for 11 years.

The new approach is “enrichment”. This may reduce the dilution problem, but the extent to which it does so is unpredictable without prior time-series OS and cSM/SCM modeling. The potential for the “cause mixture paradox” and harmful transport remains. Accepting the truth about the pathological methodology (not hopeful enrichment alone) is required for real reform.

This is why I introduced “causal symbolic modeling” (cSM), which is basically causal explication of the symbols used as the gate, etc. It asks: “What is the target of the treatment?” If the answer is “sepsis” (a symbolic gate), then cSM asks: “Is sepsis a valid gate? What disease or mechanism is being targeted? How is it ensured that all participants have that target?”

Now, if cSM (or its equivalent) is not provided, the target may be a synthetic data-generating process (SDGP). For example, a set of consensus thresholds for “round fruit” is a synthetic data-generating process. Each mix of “round fruit” will be unique, and the RCT is testing E3 (the effect of treatment applied to the mix under test).
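A small simulation of the “round fruit” RCT, assuming invented mortality risks in which treatment helps apples and harms oranges. Randomization is done correctly in every simulated trial, yet the estimated risk difference reverses sign when only the apple share of the gated population changes. The risks, sample size, and function name are hypothetical.

```python
import random

# Hypothetical mortality risks by fruit (disease) and arm; invented numbers.
# Apples: treatment lowers risk by 0.10. Oranges: treatment raises it by 0.10.
RISK = {
    "apple":  {"control": 0.30, "treated": 0.20},
    "orange": {"control": 0.20, "treated": 0.30},
}

def run_round_fruit_rct(p_apple, n=40000, seed=0):
    """Simulate a 1:1 randomized trial gated on 'round fruit'.

    p_apple is the share of apples admitted by the gate. Returns the estimated
    risk difference (treated minus control); negative means apparent benefit.
    """
    rng = random.Random(seed)
    deaths = {"control": 0, "treated": 0}
    counts = {"control": 0, "treated": 0}
    for _ in range(n):
        fruit = "apple" if rng.random() < p_apple else "orange"
        arm = "treated" if rng.random() < 0.5 else "control"  # randomization
        counts[arm] += 1
        if rng.random() < RISK[fruit][arm]:
            deaths[arm] += 1
    return deaths["treated"] / counts["treated"] - deaths["control"] / counts["control"]

for share in (0.7, 0.3):
    print(share, round(run_round_fruit_rct(share), 3))
```

With these assumed risks the expected estimand is 0.7 × (−0.10) + 0.3 × (+0.10) = −0.04 for a 70% apple share and +0.04 for a 30% share; the simulated values should land near these, each an internally valid answer about its own mix.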

So this is easy to understand compared to much of statistics. Why don’t trialists embrace gate explication?

The RCT is the ultimate grant generator. You don’t need a research facility; anyone with a grant can do one. The desire to keep gates open at the discretion of trialists is strong because RCTs of SDGPs are unlimited. Grants can flow. Four hundred RCTs can be funded for one aspect of “sepsis” treatment using the consensus thresholds of SIRS and SOFA (both SDGPs) for 36 years, with no gate questions asked by the statistician. Billions can be spent on consensus meetings, trials, guidelines, compliance, reversals… rinse and repeat for decades.

In other words, a cause-agnostic RCT with a “symbol–mechanism substitution fallacy” is the ultimate tool for generating grants. It is 21st-century trialists’ snake oil. The statisticians are, unknowingly, simply along for the ride.

No one dares question my math because even questioning it would open the box the trialists have created for themselves, which functions as the easy 21st-century research replacement for difficult discovery. I’m not saying they are knowingly doing this. We were all indoctrinated in the technique. It’s the cargo cult in which all critical care physicians were trained. The planes never landed for our mentors despite 400 cause-agnostic RCTs (for just one type of treatment), but we keep the runway fires burning, hoping the planes will come someday, and we enrich with more fires. In the meantime, it’s a well-funded life, and, like the study of Blondlot’s N-rays, it commands respect because, despite decades of harmful reversals, including during the pandemic, it all looks like amazingly complex science.