Cause-Agnostic RCT (CAR)

Over on X there is a debate about causal inference versus RCT design. But what is an RCT?

Since Bradford Hill’s landmark study we have seen the RCT morph into two species.

So we can no longer speak of or debate RCTs as if they comprise a unified entity… a single thing. There are CIRs (causal-integrity RCTs), for example the original Bradford Hill causal design, and CARs (cause-agnostic RCTs).

  1. a CIR conditions on a cause AND a covariate vector.

  2. a CAR conditions on one or more thresholds (S=1) and is agnostic to cause.

The RCTs of synthetic syndromes like sepsis and ARDS are CARs.

Slide showing the left side of the DAG of a CIR (top panel) in relation to a CAR, with the emphasis that CAR gates are generally NOT on the causal path, so CARs generally crash.

CIRs are what we all thought RCTs were. They have morphed into the much easier CAR, which uses non-disease (and non-cause) specific threshold sets as triage gates, making case finding and a high n easy. But there is no causal unity, so CARs are internally valid (as a function of randomization) yet externally non-transportable.

The first thing a clinician should ask is: is this study a CIR or a CAR?

This is not to say that CARs (cause-agnostic RCTs) are always invalid. Covariate matching might be valid if the same matching can then be applied to the population external to the CAR, but not if conditioned on a threshold with a few perceived covariates, as in the classic Petty-Bone RCTs, the invalid CARs (RCT mimics) that have devastated the image of the RCT in critical care generally.

This “covariate” matching is potentially accomplishable by computer-based subphenotype relational time-pattern recognition.

Success of such an advanced covariate-based CAR technique would likely mean capturing the unknown cause or mediator, in a sense converting the CAR into a CIR equivalent.

Note that matching “covariate” vectors by prognosis (e.g., AUC for mortality) cannot accomplish this potential CAR-to-CIR conversion to equivalency.

So critical care clinicians have to lead the way, because these causal paths, covariate vectors, and trajectories are complex.

This is a call for CI and design community “synergy.” The distinction between CIR and CAR complements Pearl’s formal identifiability conditions by applying them to the clinical gate itself, and it honors the design-focused axiom that “the integrity of the design determines transportability.”

Yet if the CI school insists that the design world must BEGIN by accepting layered propensity-weighting assumptions as definitive, and the RCT design establishment continues to ignore the categorical difference between a Bradford Hill-style CIR and a modern Petty-Bone CAR, then we will remain trapped in the epistemic interface gap.

If this lack of collaboration persists, it is up to clinicians to break this impasse, because trialists have been unable to engage in such antidogmatic discussions. Witness how all of them ran away here. The present state creates disciplinary dissonance because it challenges five decades of methodological consensus. Indeed, this analysis questions the heart of the decades-old hierarchical, task-force-controlled syndrome science of critical care itself.

Of course the intellectual antipathy between the CI and RCT design communities is palpable. You can cut it with a knife, and it has long been unrelenting. I don’t see any intellectual sympathy coming from either camp. So this time it’s up to the clinicians to take the initial steps to save their patients from the perils of intractably siloed epistemic orthodoxy.

Firstly: by telling the CI community we don’t yet know enough to START with layered assumptions and,

Secondly: by refusing to consume Petty-Bone CAR based research and rejecting the task-force-driven scaffolds upon which CAR “science” rests.

I look forward to further discussions.

1 Like

Notice in the above diagram I’ve provided a ghosted indication of covariates in the CAR diagram. Can anyone see why? If you do, then you already understand what a CAR is. If you don’t, go back to the DAGs and think more deeply before you read below.

———-

Did you get it? Yes, of course the covariates are ghosted because they’re phantoms. They are statistical placeholders without a causal parent. In a cause-agnostic RCT (CAR), there’s no true disease node upstream to anchor them, so their apparent relationships are artifacts of conditioning on the synthetic gate S = 1. They look like covariates, but in causal terms they don’t exist.
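The phantom-covariate point can be illustrated with a small simulation (my sketch, not from the post; the variable names, threshold, and noise scale are all invented). Two causally independent variables become associated once we condition on a synthetic severity gate S = 1, because the gate is a collider:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Causally independent upstream variables: a disease driver D and a
# bystander "covariate" X with no causal parent linking it to D.
D = rng.normal(size=n)
X = rng.normal(size=n)

# Synthetic enrollment gate: S = 1 when a severity score that both
# variables feed crosses a threshold, so S is a collider.
severity = D + X + rng.normal(scale=0.5, size=n)
S = severity > 1.5

print(np.corrcoef(D, X)[0, 1])        # unconditionally ~0: independent
print(np.corrcoef(D[S], X[S])[0, 1])  # inside the gate: strongly negative
```

The association inside the gate is a pure artifact of conditioning on S = 1; that is the sense in which a CAR’s covariates are “phantoms.”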

Now if you understand this, then you can easily see why CARs derive from an apical error. The apical error was the act of replacing the upstream cause with a non-disease, non-cause threshold (S=1).

-Do you agree with the ghosting of the covariates?

-What do you think of the CIR and CAR distinction? Is there any value to a CAR, such as hypothesis generation?

-What should be the disclosure to the participants if enrolled in a CAR?

-Is it true that we can’t generate a valid causal equivalent by matching signals to a prognosis alone?

-How would time series matrix components (relational time patterns) be encapsulated as part of a “causal equivalent vector” based design when the cause is unknown?

Here for example we can see that there are relational time series patterns of combined myeloid perturbation and hemostatic perturbation. How would such a relational time pattern be studied as a potential targetable cause equivalent?

Would exploration by CI assisted OS using EMR data be useful to find the cause equivalents for RCT?

Explaining further, suppose the myeloid/hemostatic vector was identified by CI as a potential targetable cause for aspirin, instead of using Sepsis-3 (SOFA 1.0 plus suspicion). This would still be a CAR, but it might contain the unknown target and be a “CIR equivalent.” Covariates would then potentially be valid.

All of this research opens up when we publicly abandon the “blind CAR” (the CAR with an international task force (central control) defined SOFA or P/F gate) and return the design of the research to “the people.” This means no more intellectual colonization of LMIC-based researchers in the doctrine of the task force, and no more promulgation of the error, so they can move on quickly.

“The people,” the hard-working researchers across the world, are many times more likely to find the “cause equivalents” and to generate better and newer ideas than the anchored, USA-based task force can possibly conceive.

Yet “the people” are presently told to follow the rules, and one of those is a new rule published this month: SOFA 2.0, a new gate (S=1) for another decade of blind CARs.

Development and Validation of the SOFA-2 Score https://jamanetwork.com/journals/jama/fullarticle/2840822?utm_source=twitter&utm_campaign=content-shareicons&utm_content=article_engagement&utm_medium=social&utm_term=110125

We know then what’s coming next: Sepsis-4 replacing Sepsis-3 (which used SOFA 1.0 as the CAR gate). Sepsis-4 will use SOFA 2.0 for another decade of blind, centrally controlled sepsis CARs.

Some will generate phenotypes after the SOFA 2.0 gate, creating blind CARs of “phenotypes” that met S=1, just as was done with SOFA 1.0.

Musical gates, blind CARs, central control of S=1. Titrate the collider, rinse and repeat for 35 years of blind CARs. Let’s fix this for the public.

I can’t stop this unless I can at least precipitate discourse.

1 Like

Illustrative Case: The Pulse Oximetry Trial as a Cause-Agnostic RCT (CAR)

Let us now consider the consequences, both beneficial and detrimental, of a cause-agnostic randomized controlled trial (CAR).

To illustrate, I will reference a landmark perioperative RCT and its subsequent Cochrane review of pulse oximetry in postoperative patients (n = 20,802):

Pedersen T, et al. Pulse oximetry for perioperative monitoring. Cochrane Database Syst Rev. 2014. PMID: 24638894.

The review concluded:

“Routine continuous pulse oximetry monitoring did not reduce transfers to intensive care and did not decrease mortality, and it remains unclear whether any real benefit was derived from the application of this technology for patients recovering from cardiothoracic surgery in a general care area.”

The primary trial underpinning this conclusion was a classic CAR, a randomized design conditioned not on a defined causal pathway but on a generalized postoperative state (S = 1).

The Causal Structure

Consider the simplified DAG in which postoperative patients enter through the gate S = 1, representing the “postoperative” state. The target is the timely detection of early complications, which may reduce the number of complications, since complications have an apical component that cascades into a plurality of complications and sometimes leads to mortality and increased length of stay (LOS).

Upstream, multiple distinct complications (causal pathways) feed into that gate:

  • D₁: Pain-treatment overdosing (iatrogenic respiratory depression)

  • D₂: Congestive heart failure (CHF)

  • D₃: Pulmonary embolism (PE)

Each of these conditions interacts differently with pulse oximetry:

  • For D₁, early detection of hypoventilation may improve survival.

  • For D₂ and D₃, however, continuous oximetry may paradoxically delay recognition of clinical deterioration. Respiratory alkalosis can transiently elevate oxygen saturation and thus blunt the device’s sensitivity to adverse events.

The delay may be shorter in older patients (PE) and longer in younger patients who can generate more profound respiratory alkalosis; conversely, in CHF, the effect size is weighted toward older age strata.

Here, the covariate “age” appears influential, yet it is not grounded in a causal framework.

Implications for the Estimated Treatment Effect

Within this mixture, early detection in D₁ may be offset by delayed detection in D₂ and D₃. The global average treatment effect (ATE) on survival therefore trends negative, even though within one disease stratum (D₁) the treatment is beneficial.

Such aggregation across unrecognized causal heterogeneity produces an apparent null or harmful result and generates covariates that are statistical artifacts rather than causal entities.
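The offsetting-effects argument can be made concrete with back-of-envelope numbers (the shares and mortality rates below are entirely hypothetical, not taken from the Pedersen review):

```python
# Hypothetical mixture behind the gate S = 1 ("postoperative"):
# (share of enrolled patients, control mortality, monitored mortality)
strata = {
    "D1 respiratory depression": (0.30, 0.10, 0.04),  # oximetry helps
    "D2 CHF":                    (0.40, 0.12, 0.15),  # recognition delayed
    "D3 PE":                     (0.30, 0.15, 0.18),  # recognition delayed
}

# Global ATE = share-weighted average of the stratum-level effects
ate = sum(share * (p_mon - p_ctl) for share, p_ctl, p_mon in strata.values())
print(f"aggregate ATE on mortality: {ate:+.3f}")  # prints +0.003
```

A 6-percentage-point absolute benefit within D₁ is fully masked by small harms in D₂ and D₃, reproducing an apparent null (or slightly harmful) aggregate result.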

Policy Consequences

This landmark CAR effectively halted adoption of routine pulse oximetry for decades.

Ironically, from a policy perspective, the conclusion was defensible as the technology’s global ATE was near zero.

Yet from a pathophysiological perspective, the lesson was misleading.

Because the design lacked causal grounding, the field learned nothing about the distinct mechanisms by which pulse oximetry could help or harm. The study was, in effect, a “numbers game in the blind.”

The Epistemic Lesson

This is the defining flaw of a blind CAR: it yields no pathophysiological insight.

Complications in critical care evolve dynamically, and salvage depends on recognizing those mechanisms, not merely on aggregate outcomes.

Thus, we can summarize:

  • The good: The policy outcome, cautious non-adoption, was arguably correct for the aggregate population.

  • The bad: The nuanced causal lessons were lost; incorrect inferences were drawn about the potential value of the technology in specific causal contexts.

A CAR is, therefore, a blind instrument.

There is no justification for remaining blind when causal-symbolic modeling (cSM) can be applied both before trial design, to define valid gating criteria and strata, and after analysis, to interpret heterogeneous effects in the light of causal structure.

In short, cSM opens the eyes of the CAR, transforming blind randomization into informed causal inquiry.

1 Like

Keeping with the CAR theme, I am trying to determine the extent of shrinkage in the REMAP-CAP trial and how that might have affected (reduced) the difference in treatment effects between the corticosteroids-in-influenza stratum and corticosteroids in the other pooled pneumonia types.

IMHO the REMAP-CAP platform represents an evolved form of the Cause-Agnostic RCT (CAR), but the Bayesian hierarchical modeling introduces mathematical conflation of biological (causal) mechanistic differences in the means and the variances.

So IMHO statistical exchangeability and biological coherence may be conflated in this RCT design.

In a purely mathematical (non-biological) sense, exchangeability means that the outcome distributions of treated and untreated subjects can be considered similar up to random noise.

Hierarchical or Bayesian models exploit this by borrowing strength across strata that appear empirically alike.

But biological coherence requires something deeper, a shared causal mechanism through which the intervention acts. In other words mathematical similarity does not assure the coherence that enables acceptable pooling.

Two strata may be exchangeable in a statistical model yet fundamentally incoherent in biological terms, as when pneumococcal pneumonia and influenza pneumonia are partially pooled within the same “CAP” gate.

Borrowing strength across such incoherent strata narrows uncertainty mathematically while widening the agnostic state causally. The means move together, but this is a mathematically, not biologically, induced similarity.

The apparent precision of the posterior becomes a false clarity, concealing the epistemic discontinuity between mathematics and mechanism. Perceived noise elimination is actually the narrowing of the true, biologically driven separation of the means.

A valid cause-informed RCT restores alignment by restricting exchangeability to strata that are mechanistically coherent, where biological and statistical similarity coincide.

Its Bayesian hierarchical framework functions as a statistical patch for the structural weakness that defines all CARs: the absence of a unifying causal anchor within the enrollment gate S=1.

Because the platform randomizes across patients defined by a synthetic syndrome (CAP) rather than a verified disease, the population is causally incoherent.

The hierarchical model shrinks stratum effect estimates toward a common mean but in the presence of primary disease (target) mixtures this may treat biological heterogeneity as if it is statistical noise.

In this design, the hyperpriors on the global mean and the between-stratum variance apparently enforce partial exchangeability among non-exchangeable disease states, but the gamma is unknown, so I could be wrong about that. Please advise.

The shrinkage provides stability: the model smooths random variation across small, dispersed strata to maintain functional adaptive randomization and produce interpretable posteriors.

Yet the smoothing operates only at the statistical level, and that was the intent, given that the dispersion of the centers creates noise to be removed. Yet it does not resolve or mitigate the underlying causal fragmentation; rather, it hides it.

What appears as precision in the posterior is thus a mathematical artifact of enforced similarity, not genuine transportable knowledge. Biological differences are mitigated, and the treatment effects appear similar as a partial function of shrinkage alone.
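The shrinkage mechanism can be sketched with the textbook normal-normal partial-pooling model (all numbers below are invented for illustration; REMAP-CAP's actual hierarchical model is far more elaborate):

```python
import numpy as np

# Raw stratum treatment effects (e.g., log-ORs) that differ for
# biological reasons, say influenza vs pneumococcal strata.
raw  = np.array([-0.20, 0.05, 0.15])
se2  = 0.04   # within-stratum sampling variance (assumed common)
tau2 = 0.01   # between-stratum hypervariance (a tight hyperprior)

# Normal-normal partial pooling: each stratum estimate is pulled toward
# the grand mean by the shrinkage factor tau2 / (tau2 + se2).
grand  = raw.mean()
shrunk = grand + (tau2 / (tau2 + se2)) * (raw - grand)

print("raw spread   :", round(float(np.ptp(raw)), 3))     # 0.35
print("shrunk spread:", round(float(np.ptp(shrunk)), 3))  # 0.07
```

With a tight between-stratum variance the stratum contrasts collapse fivefold toward the grand mean; whether that removed noise or removed biology is exactly the question raised above.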

So in causal-symbolic terms, the REMAP-CAP framework is a Petty-Bone CAR mimic:

it replaces missing causal structure with a mathematically unifying methodology. Borrowing is beneficial within mechanistically coherent strata, but not within biologically incoherent ones. The result is a single marginal estimand, as well as multiple conditional estimands, pulled together mathematically even though their raw differences may have a causal (biological) origin that the math now mitigates.

One wonders how adaptive randomization would be affected by an early chance exposure to a sequence of different mixes of different diseases with different treatment effects.

In my paper I don’t go to this depth but rather present the DAGs and leave the discussion of pooling and adaptive randomization, both of which require sufficient biological coherence, to the statisticians.

I hope there will be clarifications of these musings. I may be wrong on some of these details but not on the fundamental premise which I have researched for my paper.

REMAP-CAP is a major advance, and everyone must be in awe (as I was) of the cool method applied to bring centers across the world into mathematical, noise-reduced harmony. Yet there may still be some bugs inherited from standard 50-year-old critical care RCT methodology. Let’s find them.

Frank, correct me if I misinterpret, because I am learning from you and applying the lessons to the critical care field. Your point about risk magnification contains an implicit critique of marginal estimands and identifies properly performed conditional estimands as potentially superior for some purposes.

In another post there is a discussion of ATE, ATT, ATO, and ATM as often described as targeting different populations.

What your observation makes clear is that the estimands diverge because different groups of patients dominate each weighted average, not because they correspond to distinct biological effects. In other words, they are not distinct causal effects but different weighted averages across heterogeneity, statistical reflections of who is in the sample rather than biologically defined quantities.

ATE, ATT, ATO, and ATM each include different groups of patients in the calculation. So the effect you get depends on who is included, not on one biological disease. Marginal estimates therefore require “averaging of unlikes,” producing sample-bound, fragile estimates. Even with a constant treatment effect, a simple shift in the population captured, and therefore in the covariate distribution (e.g., a younger population), changes the marginal effect, which is already a concern for disease-specific RCTs (CIRs).
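A toy two-stratum calculation (hypothetical shares, stratum effects, and propensities, not from any trial) makes the “different weighted averages” point explicit: the four estimands apply different weights to the same heterogeneous data.

```python
# Two strata with different treatment effects and different propensities
# of receiving treatment (all numbers hypothetical).
shares  = [0.5, 0.5]      # population share of each stratum
effects = [-0.10, +0.05]  # stratum-specific treatment effects
e       = [0.1, 0.6]      # stratum propensity scores

def weighted(w):
    """Weighted average of stratum effects under estimand weights w."""
    num = sum(s * wi * d for s, wi, d in zip(shares, w, effects))
    return num / sum(s * wi for s, wi in zip(shares, w))

ATE = weighted([1.0, 1.0])                     # everyone
ATT = weighted(e)                              # the treated
ATO = weighted([ei * (1 - ei) for ei in e])    # the overlap population
ATM = weighted([min(ei, 1 - ei) for ei in e])  # the matchable population

print(round(ATE, 4), round(ATT, 4), round(ATO, 4), round(ATM, 4))
# -0.025 0.0286 0.0091 0.02 : one dataset, four answers
```

Nothing biological changed between the four numbers; only the weighting of who counts changed, which is precisely the “who is included” point above.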

I would add that for cause-agnostic RCTs (CARs), this critique becomes insurmountable.

Take sepsis as a concrete example. “Sepsis” is not a disease; it is a gate (S=1) that aggregates 10–50 distinct mechanisms of acute illness, including:
-pneumococcal pneumonia
-influenza A viral pneumonia
-aspiration pneumonitis
-pancreatitis
-cholangitis
-pyelonephritis
-meningococcal sepsis
-abdominal catastrophe
-fungal bloodstream infection
-post-operative infection
-etc.

Each of these diseases has its own risk structure and its own covariate–outcome relationships.

Now consider just a few commonly used covariates:

Lactate
-Pneumonia → lactate = shock/oxygen debt
-Influenza → lactate = respiratory muscle fatigue
-Pancreatitis → lactate = third-spacing/hypovolemia
-UTI in elderly → lactate = dehydration/frailty
-Meningococcal disease → lactate = fulminant DIC
Same lab value, five different mechanisms.

Age
-Influenza → age = higher severity (varies with strain)
-Meningococcal sepsis → younger age = higher incidence/severity
-Pancreatitis → middle-aged drinkers = worst outcomes
-Cholangitis → elderly = highest risk
One covariate, opposite effects depending on disease.

WBC / CRP
-Pneumonia → infection markers (but dynamic bimodal risk)
-Pancreatitis → sterile inflammation
-Fungal sepsis → WBC often low or nonspecific
-Viral syndromes → WBC may be normal unless secondary superinfection, CRP variable
Same numbers, different meanings.

These covariates have no unified causal interpretation across the disease mixture called “sepsis.” This is a CAR.

And this is where, IMO, your blog critique becomes decisive, not just as stated but also as it relates to the CAR.

In a CIR (real disease), conditional estimands can reduce variance and clarify effect heterogeneity. (The blog’s conclusion).

But I would add that this logic is incomplete because the conclusion should be “RCT species dependent.” In a CAR (mixed diseases), conditional estimands are worse than marginal ones (though both are invalid) because they impose a single causal interpretation on covariates that do not share one. Conditioning amplifies the collider bias at S=1 and substantially guarantees internal invalidity.

ATE magnifies one blend of diseases.
ATT magnifies a different blend.
ATO magnifies the overlap between blends.
ATM magnifies the matchable subset of blends.

These are not strata of one disease, they are different mixtures of different diseases.

This is the direct generalization of “averaging of unlikes,” except here we are not averaging over unlike risk strata but over unlike diseases, with incompatible causal pathways. There is no unified risk surface to magnify, no single HTE structure to average across, and no stable meaning for the covariates that generate the weights.

Thus internal validity collapses before external validity is even considered. Randomization cannot rescue an invalid gate; it can only balance whatever happened after flow through the gate. If S=1 does not define a coherent causal system, no marginal estimand is biologically interpretable.

In summary: IMHO the logic of the linked blog is RCT species dependent: sound for CIRs, trials of real diseases. But when extended to CARs, the logic beautifully exposes that the structure of the design makes it impossible to generate a valid causal estimand, regardless of how well the trial is executed.

Sepsis (as used in RCTs) contains many mechanisms. Therefore, in a CAR, a marginal estimand is not the marginal estimand of anything biological and a conditional estimand is worse.

I can’t make the link to the blog work but the link is provided in this excellent thread.

https://discourse.datamethods.org/t/propensity-score-weights/28536/7

I like the way you worded the top part, and perhaps what follows. Maybe we can simplify the discussion somehow, e.g., cross-classify covariates by known vs. unknown and convey general risk vs. treatment effect being sensitive to the covariates. You are most interested in the latter, and when you are in the (unknown (covariate not measured), treatment sensitive to covariate) part of the 2×2 classification we perhaps have the most difficulty.

1 Like

I agree 100%. The unknowns have always been the root of a deep (poorly defined) concern of mine since the old days (a decade ago), even though I really did not understand it with any granularity and in any case had no idea how to articulate it until I read your work. I only understand your work at a general (overview) level, not at the level of operationalizing it.

I think we can come a long way with that approach, combining pathophysiology especially with simpler CARs like those testing treatments for CAP. How would that be operationalized?

1 Like

These discussions of RCTs always assume there is one species of RCT. We can see this because the very issue under discussion does not distinguish the RCT types. Interestingly, an RCT of hypertension is a hybrid sitting directly between the two extremes.

So there are two types of RCTs: CARs and CIRs, representing the endpoints of a continuum defined by the size and character of U. This same continuum also applies to OS.

One way to think about them is that, at the extremes, a CAR is a “Population RCT” and a CIR is a “Mechanism RCT.”


A Cause-Agnostic RCT (CAR) is fundamentally a population-based experiment, not a mechanistic one.

A CAR defines the experimental unit by selecting a population label or a threshold set and treats this enrolled population as if it were a single causal entity (a Bradford Hill disease equivalent).

Thus a CAR is a trial of:

“patients who meet this disease agnostic definition” rather than
“patients who have this specific disease or share this targetable biological mechanism.”

Why does this make a CAR a “population RCT”?
-The “disease” or mechanism under test is actually whatever patients the definition captures.
-If the population composition shifts, the “disease” shifts.
-The causal architecture of the trial is whatever mixture of U the population happens to contain.

In a CAR, the causal system under test is the enrolled population itself, not any particular mechanistic pathway.

This is why CARs look like they are testing treatments,
but in reality they are testing populations.

CARs are fragile because their effects depend on U, but they can be generalized if, and only if, one can recreate the same U by meticulously matching new populations to the original population. This is generalizing the population, not the mechanism.

In Contrast:

  1. A CIR is a “Mechanism RCT” (a Bradford Hill RCT).

A Cause-Integrity RCT (CIR) is designed around a real disease or causal mechanism, not a labeled population.

The experimental unit of a CIR:
-specifies the disease D1 or pathophysiology,
-isolates the mechanistic D1 group,
-and restricts randomization only to patients who share that specific disease (mechanism).

Thus a CIR is a trial of:

“the disease or mechanism X responds to treatment T in this way.”

It does not depend on who happens to show up at the hospital that month meeting the broad community acquired pneumonia criteria (S=1). It depends on whether a predefined disease or mechanism is present.

Why does this make it a mechanism RCT?
-The trial is testing how T interacts with an actual biological process.
-The result is less dependent on population shifts.
-The estimate is a function of the mechanism, not the mixture.

In a CIR, the causal system under test is the mechanism itself (even if not completely understood) so the effect is stable and population-independent.

Now consider how U distinguishes population RCTs from mechanism RCTs:

In a CAR (Population RCT):
D1, D2, D3, …, Di → S=1

The gate S=1 aggregates multiple causes, which are not modeled, so:
U = {all the collapsed synthetic heterogeneity}

The treatment effect becomes:
ATE-car = f(U)

Thus:
-Change population → change U
-Change U → change the effect
Therefore the effect is highly population-dependent.

“A CAR is population-dependent because it is highly U-dependent.”
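This U-dependence can be sketched in a few lines (the per-disease effects and case mixes below are invented for illustration):

```python
# Same treatment, same per-disease effects; only the case mix U behind
# the gate S = 1 changes between sites (all numbers hypothetical).
effects = {"D1": -0.06, "D2": +0.03, "D3": +0.03}

def ate_car(mix):
    """ATE-car = f(U): share-weighted sum over the disease mixture."""
    return sum(mix[d] * effects[d] for d in effects)

site_A = {"D1": 0.7, "D2": 0.2, "D3": 0.1}   # D1-heavy season
site_B = {"D1": 0.1, "D2": 0.5, "D3": 0.4}   # D1-light season
print(ate_car(site_A), ate_car(site_B))      # opposite signs
```

Nothing about the treatment or the diseases changed between the two sites; only U changed, and the trial-level effect flipped sign.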

In a CIR (Mechanism RCT):
S=1 selects a single mechanism Di

U shrinks to a small, structured residual, so:
ATE-cir = f(mechanism)

Thus:
Mechanisms do not change across hospitals, seasons, or regions so the effect is much more population-independent. (This is a Bradford Hill RCT.)

A CIR is more population-independent because it is mechanism-dependent. (Streptomycin kills Mycobacterium tuberculosis regardless of the population.)

Because the true disease is often unknown at the bedside, some trials will inevitably be CARs, but clarity only comes when we recognize that a CAR is not a CIR. Distinguishing the two is essential for aligning design theory with causal inference and enabling real scientific progress.

So “generalization of RCTs” cannot be scientifically discussed or defined without first considering what species of RCT is under discussion, or more specifically where the trial sits on the size-of-U spectrum (i.e., where the trial design placed the trial on the CIR-CAR spectrum). For example:

  1. An RCT testing treatment of pneumococcal pneumonia with an antibiotic, with resolution of the pneumococcal pneumonia as the endpoint, is a pure CIR (a Bradford Hill RCT). U is low. Generalization is excellent, with little population dependence.

  2. An RCT testing treatment with corticosteroids for the set of diseases meeting the consensus criteria for community-acquired pneumonia, with a survival endpoint, is a pure CAR (a Petty-Bone RCT). U is massive. The effect size and polarity are entirely population dependent, and the population is a trial artifact that cannot be reproduced. Generalization has great potential to cause harm.

  3. An RCT testing treatment of a well-defined population with idiopathic hypertension, using antihypertensives with a blood pressure target, is a mid-level hybrid. U is moderate. Generalization is moderately population dependent, but the population is clinically definable.

One might ask why the RCT species (CAR and CIR) framing is necessary given the much more detailed and extensive review in @Pavlos_Msaouel’s article https://www.mdpi.com/2072-6694/14/16/3923. The answer is that it highlights that some CARs are irredeemable by design and represent pathological science, with more potential to cause harm than benefit. (I cited the opposing treatment effects of two large back-to-back CARs in my earlier post.)

The second reason is that the DAGs and do-calculus that @Pavlos_Msaouel presents require a substantial learning curve. So clinicians and trialists need to learn WHY they need to learn to dissect RCT design at a deeper level, beyond the standard statistical operating characteristics defined by CONSORT.

1 Like

The ultimate generalization is the insertion of the results of an RCT into guidelines. This decision to generalize to guidelines has the potential to improve the health of, or harm, hundreds of thousands (or millions) of patients. The statistical and design-theory factors that determine the decision to include a trial’s results as “clinical guideline defining” are poorly defined.

Here is an example to consider for your paper: two large RCTs (CARs), two years apart. One was generalized and the other ignored.

These two CARs tested the same treatment (hydrocortisone) applied to the same symbolic gate of community-acquired pneumonia (CAP).

The results:

CAPE COD: relative risk of death reduced by 47%.

REMAP-CAP: relative risk of death increased by 53%.

CAPE COD:
“By day 28, death had occurred in 25 of 400 patients (6.2%; 95% confidence interval [CI], 3.9 to 8.6) in the hydrocortisone group and in 47 of 395 patients (11.9%; 95% CI, 8.7 to 15.1) in the placebo group (absolute difference, −5.6 percentage points; 95% CI, −9.6 to −1.7; P=0.006).”

REMAP CAP:

“By day 90, 78 (15%) of 521 patients assigned to hydrocortisone and 12 (9.8%) of 122 patients assigned to control had died”

Note the difference in the ratio of patients assigned to treatment vs placebo (roughly 4:1 in REMAP-CAP vs 1:1 in CAPE COD, per the counts above). This modifies the risk conclusions relevant to REMAP-CAP.
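The two headline relative risks can be recomputed directly from the counts quoted in the excerpts above (a quick arithmetic check, using only the numbers in the quotes):

```python
# CAPE COD, day 28: 25/400 hydrocortisone vs 47/395 placebo
# REMAP-CAP, day 90: 78/521 hydrocortisone vs 12/122 control
rr_cape  = (25 / 400) / (47 / 395)
rr_remap = (78 / 521) / (12 / 122)

print(f"CAPE COD RR  = {rr_cape:.2f}")   # 0.53, roughly a 47% reduction
print(f"REMAP-CAP RR = {rr_remap:.2f}")  # 1.52, roughly a 52% increase
```

Both recomputed ratios match the headline figures to within rounding, confirming they come straight from the quoted event counts.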

Now, CAPE COD was first (2023), was considered generalizable, and set the guidelines. REMAP-CAP was second (2025), was not considered generalizable, and was instead largely ignored. But if REMAP-CAP is right and CAPE COD (which was limited to France) is wrong, then the guidelines are presently prescribing harmful (deadly) care for very many patients. This matters because all humans are vulnerable to death from pneumonia. The number needed to kill (NNK) could easily be 20.

There is reason to believe that hydrocortisone causes benefit by reducing inflammatory damage, but it could also cause harm because it suppresses the immune system (for example, inhaled corticosteroids increase the risk of pneumonia), creating a risk of increased secondary infections. So there are reasons why both harm and benefit could occur.

So this is a great real-world comparison for you to make in your paper. There is a pivotal need for a definitive, evidence-based “checklist” for generalizability.

Why, based on the hard statistics and design theory, is CAPE COD generalizable while REMAP-CAP is not?