The Petty/Bone RCT

I periodically provide a window into critical care research which is generally (but not always) based on the PettyBone RCT paradigm. This is a retrospective study of the synthetic syndrome of sepsis.

Here a lumped set of many different diseases captured by a prior diagnosis of sepsis is then separated into clusters.

They define five clusters of sepsis

1 Cluster respiratory
2 Cluster severe
3 Cluster liver
4 Cluster old
5 Cluster fit

The conclusion is that “cluster severe” and “cluster respiratory” benefit from Thiamine (vitamin B1). They also find that “cluster old” is harmed by vitamin C plus thiamine.

Thiamine has long been given in this population by some critical care docs. I have given it for decades, based on plausible physiologic mechanisms in severe critical illness and because thiamine deficiency, which is well reported, is often cryptic in this population.

This is clearly a PettyBone study but can anyone explain the methods here?

I read the methods multiple times and cannot understand exactly how this was done & interpreted.

There is a real and emerging risk of a severe influenza pandemic this year.

In response, a member of the lay public, clearly informed but afraid that a new flu pandemic may be coming, asked me if “steroids” are effective in severe influenza with “cytokine storm”.

This encapsulates my answer.

*“This is a very good question which has been considered for decades. No one knows because the RCTs of corticosteroids for influenza have not been done.*

*Many observational studies have suggested that corticosteroids may lead to worse outcomes in influenza pneumonia by suppressing the immune response to the virus, increasing viral shedding and load, and increasing the rate of secondary infections, but these studies are not definitive.*

*The point of my post was to note that critical care science should have done these RCTs so that we have the answer. Instead, they performed easier RCTs using the PettyBone RCT method, where they do not need to diagnose a disease but simply apply triage thresholds to capture a lumped set of different diseases meeting the threshold criteria for an RCT of the synthetic syndrome of “ARDS”.*

*The grants flow to the easier PettyBone RCT even though the rate of reproducibility of single center critical care RCTs is only 6%. This diverts money away from the real RCTs that need to be done to answer questions like yours.*

*I am sorry we have not done these studies to answer your question but the science is trapped in the PettyBone paradigm…*

So in summary we owe you and the public an apology for failing to do these RCT. I am embarrassed to say that we don’t know the answer and this fixation on the easy PettyBone RCT is why we cannot answer your question, even though your question has been well known to be pivotal for public health for decades.

I think this is a really interesting thread. I’ve not seen the vernacular of “petty and bone” used before, but I get the idea. I was first introduced to this concept in general via this paper from Hallie Prescott https://pmc.ncbi.nlm.nih.gov/articles/PMC5003218/

My view is that the pragmatic large RCT approach in sepsis hasn’t been that successful overall. There are plenty of prominent researchers in this space who agree (Merv Singer, JL Vincent etc.). We’ve now had myriad studies with neutral end points. My lab has often discussed this issue, and our prevailing view is that the heterogeneity found within sepsis is too high for therapeutics to be realistically discovered by this approach. The net result is a profound reduction in statistical power for these studies, so their size becomes somewhat misleading. If there is a study of 10,000 patients, but only 5% were ever going to benefit from the therapy, that becomes a problem. Not least because it causes us to close the door on potential therapeutics since they “don’t work”.
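To put rough numbers on the 10,000-patient example, here is a minimal sketch of how a benefit confined to 5% of enrolled patients dilutes the trial-wide effect and the power. The 30% control-arm mortality and the 20-point absolute risk reduction in responders are illustrative assumptions, not figures from any trial, and the function names are hypothetical. Power uses the usual normal approximation for comparing two proportions.

```python
from math import sqrt
from statistics import NormalDist


def diluted_arr(responder_fraction, arr_in_responders):
    """Average absolute risk reduction across everyone enrolled,
    assuming non-responders get no benefit and no harm."""
    return responder_fraction * arr_in_responders


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    return z.cdf(abs(p_control - p_treated) / se - z_crit)


# Illustrative assumptions (not taken from the thread or from any trial)
baseline_mortality = 0.30   # assumed control-arm mortality
arr_responders = 0.20       # assumed benefit in the 5% who can respond

# All-comers trial: 10,000 patients, 5,000 per arm, 5% responders
arr_all = diluted_arr(0.05, arr_responders)
power_all = power_two_proportions(
    baseline_mortality, baseline_mortality - arr_all, n_per_arm=5_000)

# Hypothetical trial enrolling only the 500 responders, 250 per arm
power_responders = power_two_proportions(
    baseline_mortality, baseline_mortality - arr_responders, n_per_arm=250)

print(f"trial-wide ARR: {arr_all:.3f}")
print(f"power, all-comers (n=10,000): {power_all:.2f}")
print(f"power, responders only (n=500): {power_responders:.2f}")
```

Under these assumed inputs the trial-wide effect shrinks to a one-point risk reduction, and the 10,000-patient trial has far less power than a 500-patient trial restricted to the patients who can respond. The catch, of course, is that the second trial requires knowing who the responders are.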

Contrast this with the rapid therapeutic discovery in COVID pneumonitis (a singular biological disease with a fairly typical onset time and trajectory). Also in the lab, where we standardise “septic” inoculation in rats (often cecal ligation and puncture, or faecal slurry injection).

By contrast, there are universal aspects of healthcare that may be of benefit to study in this way. For example, fluids, oxygen etc. Some therapeutics are so ubiquitous, that I’m genuinely interested to see if they are harmful or beneficial at a group level to all-comers. Mega-ROX is about as big as it gets, and I think the answers there will be informative, but that’s largely because I care about how oxygen interacts with all humans, and not just a specific disease.

A major research priority in sepsis has been the discovery of so-called “endotypes”. I’m all in favour of this, but I actually think we are trying to find novel disease features before we have the basics right. Surely it would be more beneficial to start with rapid microbiological assays (like point of care PCR, that can give me the probable organism on patient presentation)? Currently I get blood cultures back after 48 hours, and so up to then I am largely relying on empiric therapies.

I would like to see the next 10-15 years of sepsis research characterised by more specific RCTs that are identifying a more specific patient group.

Apologies if slightly rambling. I haven’t finished my morning coffee yet!


This is the most intuitive solution to the sepsis problem I have seen and I cannot for the life of me understand why this cannot be accomplished with technology we’ve had for a decade.


Searching for endotypes to split what was inappropriately lumped is a fatal mistake. We should phenotype infected patients before they meet the Sepsis-3 triage criteria.
Please take a look.

Thanks all for the excellent input. There is no more important contemporary topic in critical care science than the one discussed in this thread.

The ATS reference cited by @Doc_Ed shows that elite critical care scientists falsely think that lumping is discretionary (while conceding that it potentially adds heterogeneity). The experts do not appear to be aware that the Bradford Hill RCT cannot be applied to a “lumping derived synthetic syndrome” where there is no common target. So they fail to warn the statisticians that these syndromes are not disease equivalents and that the RCT are therefore not reproducible.

IMO the statisticians would have realized that a Bradford Hill RCT cannot be applied to a set of many different diseases defined by a set of triage thresholds if the true nature and variability of the disease mix from one RCT to the next was disclosed to the statisticians.

The PettyBone RCT is clearly not a Bradford Hill RCT, but is rather a different type of RCT which is NOT applied to a single specific disease or to a specific pathophysiologic mechanism as required for the Bradford Hill RCT. Statisticians must determine if the PettyBoneRCT are fundamentally reproducible. The PettyBone RCT is a shortcut derived from 20th century critical care science. There seems to be no evidence that they are reproducible.

The PettyBone RCT has two layers of heterogeneity for randomization: a first layer comprising the heterogeneity of the mix of diseases captured by the triage thresholds, and a second layer comprising the basic heterogeneity of the patients themselves. Imagine trying to pre-specify adjustments given these two layers when the mix of different diseases which will be captured by the triage set of thresholds is unknown and varies from one RCT to the next.
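A small worked example of the first layer: if the treatment effect differs by disease, the expected trial-wide effect is just the mix-weighted average, so two trials using the same triage thresholds can expect different answers purely because they captured a different disease mix. The disease labels, effects and mixes below are hypothetical numbers chosen for illustration only.

```python
def pooled_arr(mix, arr_by_disease):
    """Expected absolute risk reduction for a trial, given the fraction of
    enrolled patients with each disease and a disease-specific effect."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(fraction * arr_by_disease[disease]
               for disease, fraction in mix.items())


# Hypothetical disease-specific effects (positive = benefit, negative = harm)
arr_by_disease = {"disease_A": 0.10, "disease_B": 0.00, "disease_C": -0.05}

# Same triage thresholds, two different captured mixes (both hypothetical)
trial_1_mix = {"disease_A": 0.60, "disease_B": 0.30, "disease_C": 0.10}
trial_2_mix = {"disease_A": 0.10, "disease_B": 0.30, "disease_C": 0.60}

arr_1 = pooled_arr(trial_1_mix, arr_by_disease)
arr_2 = pooled_arr(trial_2_mix, arr_by_disease)

print(f"trial 1 expected ARR: {arr_1:+.3f}")
print(f"trial 2 expected ARR: {arr_2:+.3f}")
```

With these made-up inputs the first trial expects a net benefit and the second a net harm, before the second layer (patient-level variability) is even considered.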

Reliable randomization and balance for a Bradford Hill RCT is hard enough, as evidenced by @ESMD’s parallel thread.

The fact that these are actually NOT Bradford Hill RCT provides an explanation for the 6% reproducibility of critical care RCT. So critical care scientists must be taught that there are two types of RCT performed in critical care.

For example, a Bradford Hill RCT was applied in the RECOVERY trial testing corticosteroid treatment of the real disease, severe COVID pneumonia.

In contrast, a massive number of conflicting PettyBone RCT have been applied testing corticosteroid treatment of Petty’s guessed synthetic syndrome called ARDS which is not a real disease or a disease equivalent.

The fact that critical care scientists did not know the difference between these two types of RCT and the need to perform real Bradford Hill RCT explains why no substantive Bradford Hill RCT have been done testing corticosteroid treatment for severe influenza pneumonia. Instead influenza pneumonia has simply been lumped into the mix of different diseases for the PettyBone RCT of ARDS because it meets the consensus triage threshold criteria.

**Here is the catastrophic consequence of substituting PettyBone RCT for Bradford Hill RCT.**
Facing an influenza pandemic the world has no idea whether corticosteroids would save lives or cause harm in severe influenza pneumonia.

As the cited paper shows, the scientists think they have discretion to lump different diseases for RCT using triage thresholds. In other words they think they have discretion to test treatments using PettyBone RCT. Therefore, we need a position statement from statisticians to provide guidance about the PettyBone RCT.

As noted, some leaders have called for ending the RCT in critical care given the lack of reproducibility.


However, they are really asking for the end of the PettyBone RCT but call for eliminating all RCT because they simply do not know that there are two types of RCT.

Anyone can wind up in the ICU so getting these statistical methodologies right is pivotal. We see the problem but only the statisticians can provide the guidelines for the RCT required to reform critical care science.

The fact that in 2025 we do not know if corticosteroids should be given to save lives during the present wave of severe influenza pneumonia shows how dysfunctional critical care science has become with its penchant for lumping and synthetic syndromes.

The world needs those guidelines ASAP.


The statisticians I have spoken with do understand that sepsis is a highly heterogeneous syndrome and that even within that, the attributable mortality to sepsis itself isn’t 100%. I think there is a general appetite to develop the research machinery needed to recruit enough patients to have adequately powered RCTs looking at specific diseases. REMAP CAP is a good example, as they are now recruiting into non-pandemic influenza. So I think the corticosteroid answers are on the horizon.

A bigger problem with regard to steroids in flu is the abundance of poor quality retrospective research that has generated advice against steroids in acute viral pneumonias, despite the overt confounders.

I’m hopeful of change being near. But I don’t think it will come as some grand revolution. Small steps and incremental change.


Hi @Doc_Ed and @Lawrence_Lynn.

I think change is long overdue. I mean it’s time to abandon the ARDS and sepsis constructs because they are based on a failed conjecture - the Petty and Bone lumping error. We are on top of 6 decades of failed research in ARDS and 3 decades of failure in sepsis. It is not about understanding. It seems everyone understands what is going on but people find it difficult to stand against the tide.

I will tell you an indoctrination story. My first contact with critical care was in 1996 as a Med student in a small ICU of a community hospital in a small Brazilian town. Bone’s sepsis and ARDS were not usually mentioned, because almost no one in our ICU spoke English. Anyway, at the time it was hard to access foreign medical information. Then I decided to become an intensivist and moved to my state capital, where I “entered” the world of critical care for the first time. It was around the turn of the century. I learned to lump different diseases for a generic treatment. In fact, I was indoctrinated to it. That was a matter of belonging to a group. Professing belief that ARDS and Sepsis are real nosologic entities is your way in.

There is a dynamic that mitigates any advance in our field: the fact that a few people determine what is fundable, researchable, and publishable. Those are the ones who write the ARDS and sepsis definitions and make people accept that you can, for instance, diagnose a disease by the prognosis using qSOFA, without any biological reasoning. This is not Medicine nor Science.

Yet, after decades of normalizing such extravagant ideas, the entire American and European research enterprise is contaminated. People acknowledge the mistake and keep silent because they need to publish. They sold their minds for this dogma. It is even worse in the corners of the world where people don’t speak English. Some locals partake with Americans and Europeans in the phony research industry. They promulgate the dogma to the non-English speaking locals, those professing their faith to feel they belong to the group. Expect no innovation from this state of affairs anywhere in the world.

Decades have passed. Only the older are aware of what happened and what was argued against the absurd ideas of Petty and Bone. Back in the small town ICU, 30 years ago, the doctors were not indoctrinated. They knew “sepsis” patients were dying of infection and treated infection and gave supportive care as needed. The same we do today after 30 years of wasteful research. However, they were not indoctrinated to pretend there is a disease named sepsis that is different from infection. I am an admirer of them.

Today, I can see the difference between the indoctrinated and those free to think with their minds. And I also see there is no way to change if young American intensivists don’t change. It is time to let go of the Petty and Bone RCT.

I apologize for the long and a bit emotional text. I am getting older and maybe it makes me think we should hurry to save our specialty.

I have written about it on Substack. Please check it out.


Exactly, so we must investigate why the critical care research community and the statisticians (if they knew the extent of the PettyBoneRCT) failed the 13 year old girl in this article, and all the rest of patients with severe influenza pneumonia in the last 4 decades who died or nearly died from severe influenza pneumonia.

image

As @Doc_Ed points out there are no substantive Bradford Hill RCT to guide the physicians in these cases (relevant steroid treatment), despite the fact that this very inexpensive generic drug is the treatment most likely to work. That these RCT are coming is not mitigating. It’s 2025! Where have they been?

The lack of these RCT is almost impossible to believe! How did that happen?

The reason the pivotal Bradford Hill RCT were not done is absolutely surprising, but to find this answer requires a deep dive into the history of critical care science.

This history review reveals that in 1975 Thomas Petty included “viral pneumonia” in his lumped set of ARDS. For this reason, for the next 50 years, based on the words of this esteemed thought leader, influenza pneumonia was simply lumped with ~20 other diseases in PettyBone RCT testing corticosteroids for ARDS. These “steroid for ARDS studies” date back to the 1990s and have, as expected, rendered results all over the map.

So the treatment of influenza pneumonia with steroids was never tested as a single disease by a valid Bradford Hill RCT, because the leaders were indoctrinated by Thomas Petty to substitute, and trust, the PettyBone RCT for ARDS!

For this simple reason, in 2025, with a potential pandemic looming, the bedside physician does not know whether steroids save lives or cause harm in severe influenza pneumonia.

It was once said, in the vernacular, that these critical care leaders are “dug in like an Alabama tick”.

That seems to be true. Therefore, defunding of the PettyBone RCT by the new NIH and requiring a shift to Bradford Hill RCT is probably the only solution. Given the present state, this actually might not be as difficult as one might expect.

I brought this discussion to this special place of mathematical excellence so many years ago because “Mathematic excellence is irrelevant if the fundamental function is invalid”.

I guess we are past the time for talk. It is time for action.


**Statistical Embellishment of Langmuir Pathological Science and the PettyBone RCT**

Here is a review of examples of the many PettyBone RCT testing steroids for ARDS, complete of course with the ubiquitous meta-analysis and Cochrane blessing, showing that they were all intellectually colonized by US critical care science and indoctrinated in the pathological PettyBone methodology.

So this is a beautiful example of statistician and gate keeper (institutional) embellishment of pathological science. The public pays when everybody participates in pathological science.


In summary critical care science has a deep methodological failure problem. So WE must do root cause, failure mode analysis. (There is no backup coming).

So we turn to study the method and 70 year history to detect any deviations.

Here we see Hill in 1948 presents the rationale and important considerations for designing and conducting a medical RCT following the teachings of the polymath genius, Fisher.

Reading this paper is very useful as Hill explains his “single disease, testing a single treatment” parameters.

We see that critical care in the late 20th century deviated from the teachings of Hill creating a new type of RCT.

So, first, determination of the type of RCT should now be added as part of any review of a critical care RCT.

1 Bradford Hill RCT
Applied to a single disease, testing a single treatment.

Bradford Hill Bundled RCT
A Bradford Hill RCT testing a bundled set of treatments.

2 PettyBone RCT
Applied to a set of different diseases lumped by non-disease-specific triage thresholds, testing a single treatment.

PettyBone Bundled RCT
A PettyBone RCT testing a bundled set of treatments.
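The four-way split above can be written down as a small lookup that a reviewer could apply to a trial's inclusion criteria and intervention. This is only a sketch of the taxonomy as described in this post; the names `RCTType` and `classify_rct` are hypothetical, and deciding whether a trial really targets a single disease remains a clinical judgment that the code takes as an input.

```python
from enum import Enum


class RCTType(Enum):
    BRADFORD_HILL = "Bradford Hill RCT"
    BRADFORD_HILL_BUNDLED = "Bradford Hill Bundled RCT"
    PETTYBONE = "PettyBone RCT"
    PETTYBONE_BUNDLED = "PettyBone Bundled RCT"


def classify_rct(single_disease: bool, n_treatments: int) -> RCTType:
    """Classify a trial using the two axes described above.

    single_disease: True if enrollment requires diagnosis of one specific
        disease, False if patients are captured by non-disease-specific
        triage thresholds (a lumped set of different diseases).
    n_treatments: number of treatments tested together (1 = single,
        more than 1 = bundle).
    """
    if n_treatments < 1:
        raise ValueError("a trial must test at least one treatment")
    bundled = n_treatments > 1
    if single_disease:
        return RCTType.BRADFORD_HILL_BUNDLED if bundled else RCTType.BRADFORD_HILL
    return RCTType.PETTYBONE_BUNDLED if bundled else RCTType.PETTYBONE


# Example: one drug tested in patients enrolled by threshold criteria only
print(classify_rct(single_disease=False, n_treatments=1).value)
```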


Here is clinical background for this RCT failure and linked editorial discussion.

Briefly, the ARDS one size fits all ventilator protocol derived from a massive PettyBone RCT failed during the COVID pandemic.

This caused the critical care physicians in the trenches to revolt, rejecting the ARDS protocol.

The scientists then sought to find what caused the failure and, holding to their PettyBone dogma, concluded it was the triage threshold chosen to lump the different diseases for PettyBone RCT rather than realizing it was the PettyBone RCT itself which caused the failure.

So this is an amazing editorial as you see them try to argue this was a simple mistake, an anomaly, rather than a counter instance indicative of pathological science based statistical methodology.

Here you can see how dogma defines the direction of science. This is a must read for those interested in learning how anchor bias defines methodology and perpetuates pathological science.

So this editorial provides a near perfect example of dogma perpetuation, from the opening line lauding the origin of the dogma and its creators to the finding of a flaw, far distal to the origin, which they argue explains the failure. Finding the perceived “flaw” they see no need to go deeper. In fact it is obvious that they were always going to find a flaw distal to the true apical pathology in the RCT methodology.

After dutifully citing the 1967 Petty paper teaching the PettyBone method of lumping many different diseases together, the authors of the editorial argue that the gold standard threshold, the ratio of blood oxygen (PaO2) to the fraction of oxygen administered (FIO2), called the “P/F”, was not a valid triage threshold to capture the different diseases for the PettyBone RCT. So they argue that the use of this threshold measure, which they mandated under quality guidelines, was “the mistake”.

Of course only someone who did not understand pathophysiology and the Bradford Hill method would think they could lump profoundly different diseases by a threshold ratio of P/F. This is a multifactorial and volatile lab value signal.

Yet dogma reduces the functional intellect. Rational thinking gives way to indoctrination and trust in the mentors teachings.

This acknowledgment of such a major mistake might seem like a huge step but it is not. In fact, it perpetuates their pathological science by finding a false cause distal to the fundamental pathology.

So instead of accepting or even acknowledging the possibility that Petty’s and Bone’s lumping shortcut (of using triage thresholds to capture and lump different diseases for RCT) is pathological science, they decide to stay with the PettyBone shortcut method and now propose that they first lump different diseases into sub phenotypes by ML/AI and then do PettyBone RCT on the sub phenotypes.

Note that the most important fact about the PettyBone RCT is that it is disease agnostic. It is an RCT applied to a 20th century Synthetic Syndrome (in this case ARDS, an idea of Petty) which is now construed by those holding the PettyBone dogma, as a disease equivalent for RCT.

So in this article they never discuss performance of the Bradford Hill RCT because they never mention a single disease, much less the diagnosis of any disease.

They do not talk about diagnosis of any specific disease because it is well known that COVID pneumonia (a specific disease) precipitated the recognition that the PettyBone lumping science which produced the ARDS one size fits all protocols (based on P/F) was not valid and failed during the pandemic.

So of course something has to be blamed and they have chosen their old standard threshold of P/F as the fall girl, the downstream measurement culprit.

What else can they do? Generating a bad measurement as a guideline for RCT and initiation of mechanical ventilation is bad enough, and I suspect they feel a courageous catharsis has occurred. However, accepting that the fundamental PettyBone methodology was the cause of the failure is impossible for the indoctrinated to even consider, much less comprehend.

So they acknowledge they made a “little” mistake but they are really perhaps unknowingly arguing that the mistake was not fundamental, not apical. Rather they argue the mistake was something anyone could make, (using the wrong measurement).

Specifically they argue the simple mistake was the use of a threshold of gas exchange not the ground breaking mistake of using the PettyBone shortcut itself.

Here we see NO root cause analysis. The citation of Ashbaugh (Petty) sets the stage. They don’t question Petty’s idea as a potential cause of the failure but rather laud his idea and then move forward to find the mistake was simply the choice of lumping diseases by the P/F threshold as the cause.

Having discovered this simple mistake they perceive the solution is easy. All they need to do is find other measurements for lumping diseases for RCT under the PettyBone RCT methodology. So they turn to ML/AI for this task. The RCTs can flow forever with ML generated triage capture of the different diseases lumped for PettyBone RCT.

This is perfect for increasing the number of RCT since the bottleneck of diagnosing a specific disease under Bradford Hill is bypassed and replaced by simple triage measures produced by ML.

Testing the next drug for ARDS will remain as easy as it has been for everyone, with new standard disease lumping measurements. Not a single disease diagnosis is required. Of course all of these RCT failed but they think they have found the reason.

All of this, rather than accepting they need to do Bradford Hill RCT on each disease separately (such as influenza A) perhaps later dividing the disease into sub phenotypes.

Yet they can’t even mention the diagnosis of a disease because it violates their PettyBone Synthetic Syndrome dogma.

Note this editorial once again teaches away from the Bradford Hill method of testing a treatment using RCT for a specific disease.

They need help from statisticians.


There are perfectly valid reasons to conduct large pragmatic trials of syndromic conditions, particularly where the biology isn’t fully understood at a mechanistic level.

I’ll talk with respect to sepsis (not ARDS) because that’s more directly in my domain of expertise academically speaking. Sepsis is a heterogeneous syndrome. A very heterogeneous syndrome. A very very heterogeneous syndrome. But, we share the world with abundant micro-organisms who have co-evolved with us. Some have evolved to become “predators” as Lawrence puts it (I actually really like this and will be using it in future teaching with attribution if that’s ok) while others are more out to try and co-exist, but can’t help but cause damage along the way (a lot of which is “friendly fire” from our own immune system anyway). Some pathways in this process are extremely well conserved. LPS for example. The mechanistic pathways of LPS are so well conserved that injecting sterile LPS will cause multi-organ failure and “sepsis” without even needing bacteria to be present. So there is a rationale for trying to study even a heterogeneous syndrome like sepsis as a single group. The question is, where do you draw the line in terms of experimental design, and where do you try to tackle any heterogeneity of treatment effect through analysis, accepting that the latter will inherently have lower power?

Current sepsis RCTs recruit patients that look like this: “All comers to the ICU with shock and proven or presumed infection”. It’s highly pragmatic, but if you were to find a signal, it would be incredibly broad in its generalisability. You can also do a big study fast. Great. Only this includes 30 year old men with pneumonia caused by pneumococcus, 90 year old women with a UTI and shock from E. coli and everything else in between. My view is that this is just too heterogeneous a group, and my evidence for this has been the rather overt inability to discover effective treatments in this field over the past 20 years. We have tried immunomodulation of specific and well conserved pathways that tend to be activated by all bacteria (IL-x etc.) but generally this hasn’t worked. Probably because we either encounter patients too late and the drugs would almost need to be used in a prophylactic context, or because the pathways are just more complicated than we presently understand and more lab/bench work is needed.

So could we go narrower? Sure. I think an optimal RCT in sepsis might look like this: “All comers to the ICU with shock thought secondary to proven gram negative bacteria”. This is a little more refined. You can still do a large RCT and the results would still be broadly applicable. Patients in this group will still have a variety of different organ systems as their point of inoculation, but we are designing the study against a particular host-organism interaction (principally LPS in this case). There will still be heterogeneity. E. coli is not the same as Klebsiella pneumoniae as an example. They do cause clinically distinct presentations. However, there is still quite a lot of conservation in the host interaction. Even with this arrangement, I have concerns. Pseudomonas is quite a different animal with a lot of intrinsic resistance factors. What about organisms with inducible AmpC? The list could go on.

Should we go narrower still? I don’t think so, at least not for now. What would this look like? Recruit: “Male patients, over 65 years old, who are previously independent, with proven E. coli infection from a urinary source, in shock, but not DIC, in the UK”… etc etc. Where do you draw the line? With any infectious ailment, the clinical manifestation is an interaction between the host (a highly complex organism in its own right) and the microorganism. The variability in human hosts is vast. This is not a lab with all male Wistar rats of the same weight and age. Further, by demanding such strict entry criteria, it will be very challenging to recruit any patients. And by the time we recruit a single patient, they will probably be so late on in their clinical pathway, that our proposed treatment will have long since missed its chance to work. Could you do it with a massive global research network? Maybe, but the results will have such limited generalisability that I almost think it would be a worthless endeavour.

All this Petty/Bone chat misses the point. It isn’t a binary state between a “good” trial and a “bad” trial based on who we include in the trial. Everything sits on a continuum and we need to decide where is optimal. I fully accept that the current approach for lots of things is too pragmatic and too broad. But that doesn’t mean the solution is so straightforward as you suggest it is. It’s still incredibly difficult to know where to stop slicing.

There are also treatments where the host interaction is likely to be very well conserved and so it’s reasonable to do a large and inclusive trial. The mechanisms for how humans handle and respond to oxygen are effectively universal. ICU-ROX and Mega-ROX are good examples (who incidentally both have pre-defined “sepsis” subgroups, as they are expecting there to be some heterogeneity of treatment effect).

This isn’t some global scientific conspiracy. The thought that there could be such a thing when most people in a single department don’t even agree on a single trial interpretation does make me laugh a little! I’m a little wary when people tell me they’ve stumbled upon something that a global scientific community has missed, because honestly, we haven’t. I can’t tell you how many times my lab group has had the syndrome/heterogeneity conversation in sepsis/ARDS/critical care etc… It’s incidentally also a theme of my PhD thesis (i.e. how we should best address it).

But maybe these ideas need publishing in the mainstream journals again if you are concerned that we aren’t talking about it enough? If there is appetite from a journal editor I’d happily write on the issue.


Dr. Palmer, a couple of questions:

  • In your view, what proportion of critical care trials published in the last 10 years have had inclusion criteria that are too broad?

  • If you feel that overly broad lists of trial inclusion criteria are still commonplace, what factors do you feel drive this practice?

Apologies for asking questions way outside my area of expertise, but I’m not sure I understand why you feel such an endeavour would be worthless (?)

Do you think it would be feasible to design an international trial to test the relative value of variably granular methods of subdividing patients for critical care RCTs? For example, maybe infected patients could be stratified (?the right word), at the point of admission, into one of several possible groups. For example:

  • “Infected” (culture positive);
  • “Infected; gram negative rods”;
  • “Infected; gram positive cocci”
  • Etc…

Following this stratification, patients within each group would be treated with antibiotics which seem (at least on initial presentation) appropriate to their particular infection (antibiotic choice would subsequently be tailored to the specific organism’s sensitivities when culture results became available), but would ALSO be randomized, within their respective arms, to additional treatment with either placebo or other non-antibiotic treatments that hold promise (e.g., IL-x, others…), in the event that they were to deteriorate clinically (to be defined…) (?)
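The allocation step in the proposal above (randomizing to placebo or an adjunct within each infection stratum) could be sketched with permuted blocks per stratum, which keeps the arms balanced inside every group. This is a simplified illustration, not a trial-ready scheme: the class name, arm labels, block size and seed are all hypothetical, and the conditional trigger (randomizing only on clinical deterioration) is not modeled.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomizer:
    """Permuted-block randomization carried out separately in each stratum."""

    def __init__(self, arms=("placebo", "adjunct"), block_size=4, seed=0):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum -> assignments still unused in that stratum's current block
        self.pending = defaultdict(list)

    def assign(self, stratum):
        """Return the arm for the next patient enrolled in this stratum."""
        if not self.pending[stratum]:
            block = list(self.arms) * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            self.pending[stratum] = block
        return self.pending[stratum].pop()


randomizer = StratifiedBlockRandomizer(seed=2025)
for stratum in ["gram negative rods", "gram positive cocci", "gram negative rods"]:
    print(stratum, "->", randomizer.assign(stratum))
```

Because each stratum draws from its own shuffled block, every completed block of four patients in a stratum contains two per arm, regardless of how enrollment is spread across the other strata.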

In order to optimize power, outcomes of interest would also need to be defined as granularly as possible.

Ideally, patient stratification would include even more granular patient subdivisions (e.g., “Infected; gram negative; E. coli”). But power would obviously be lower with each increasingly granular subdivision and this might be, logistically speaking, a bridge too far. This approach would also preclude application of non-antibiotic randomized therapies at the time of admission, given (?insurmountable) delays in identifying the specific organism.

Thanks for your interesting post !


I don’t necessarily think any of them were too broad. I think it was reasonable to try this approach. But they haven’t been very successful in the discovery of effective treatments, and so I think a different approach is now justified. It all depends on how conserved you think the treatment pathway is across the syndrome group. Some treatments are absolutely ubiquitous and are targeting a common pathway. For example, dopamine vs. noradrenaline. This is the kind of treatment that for me has much higher validity when applied to the whole group of sepsis, since hypotension and shock are common end points of sepsis and they are likely mediated through similar biochemical pathways regardless of the underlying organism (this isn’t actually strictly true, but it’s true enough here for the sake of making the point). Clinically, we treat all shock the same way (with vasopressors/inotropes). Something that targets a more specific part of the inflammatory cascade, like steroids, is probably less suitable to study across the whole group in my view. I think the authors in that field know this to be true, as they’ve tried various ways over the years to isolate the right group, with varying degrees of success (those with relative steroid deficiency being the classic approach).

I suspect in general it’s a desire to have a “large” trial (with the perceived high power) with enrolment completing in a reasonable timeframe and findings that can be broadly generalised across our patients. I mean, who wouldn’t want that!

Power calculations are all a bit silly in our field anyway. The number of times I’ve seen a trial “powered” for a 15% ARR in mortality (which would be basically unheard of in critical care) when they in reality should be looking for 2-3%, but couldn’t realistically secure enough funding to support a trial of that size. Power calculations in intensive care are somewhat a work of fiction and tend to better reflect the level of funding, rather than disease traits.
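To put rough numbers on that gap, here is a minimal sketch using the usual normal-approximation sample size formula for comparing two proportions. The 30% control-arm mortality, two-sided alpha of 0.05 and 80% power are assumptions chosen for illustration and are not taken from any particular trial.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, arr, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect an absolute risk
    reduction `arr` from control mortality `p_control` (two-sided test)."""
    p_treated = p_control - arr
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / arr ** 2)

baseline = 0.30  # assumed control-arm mortality, for illustration only
for arr in (0.15, 0.03, 0.02):
    print(f"ARR {arr:.0%}: about {n_per_arm(baseline, arr)} patients per arm")
```

Under these assumptions a trial sized for a 15% ARR needs on the order of a hundred patients per arm, while one sized for 2-3% needs several thousand per arm, which is the funding gap described above.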

Because it would be a huge undertaking with results that don’t generalise to anyone and so largely non-actionable.

Let’s take a very distinct disease and compare. Spinal muscular atrophy (SMA) is a rare and often fatal genetic disease with a well defined clinical presentation and a natural disease timecourse that allows for intervention. Because it has a very specific (and genetic) diagnosis, research infrastructure was able to identify (and translate) disease modifying agents, even though the condition is rare. These therapeutics are now used clinically. So here we have an example of a rare disease, for which an appropriate treatment was developed and tested and now deployed, in part because it was very clear how the results generalise; in short, they don’t generalise at all, they apply to this very specific genetic diagnosis.

Contrast this with sepsis. If I want to identify a specific disease subgroup it isn’t immediately obvious how I should do that. I could start with a single bacterium, but even within a single species there is variation. There are also practical issues here. It often takes 48-72 hours to culture any bacteria from a patient, which is well outside of a reasonable treatment window for most proposed therapeutics in sepsis. Not to mention that about 50% of sepsis is culture negative (i.e. we never grow any bacteria). We could perhaps employ modern point-of-care PCR techniques for faster results. But my experience is that these often flag positive for a number of different organisms simultaneously, and it isn’t immediately clear if one is a contaminant or not. I probably need to also account for the host response in some way. So let’s just enrol patients with DIC, since that’s quite a specific phenotypic presentation of sepsis. So even if I carve out a specific group of patients to study, based on precise enrolment criteria, I’ve created a trial infrastructure that is going to make it very difficult to implement a timely therapy. Even if we do find a positive result, I still don’t really know how to action that information. I dare not generalise beyond this extremely narrow population of patients. I might see one patient every few years with this cluster of features. We need a certain degree of pragmatism in both the conduct of trials, and the application of their findings.

The larger, more pragmatic and inclusive a trial is, the bigger the requirement is to explain how the treatment effect would be conserved across the group.

I think it is very reasonable to stratify by groups of micro-organism as a starting point. There are literally millions of potential subdivisions beyond that, and so I can’t see it as practical, as you’d never randomise anyone. In effect you’d just be sorting them by the microbiology.

Agreed. I would like to see more ordinal outcome measures used in intensive care: with death on one end, and survival with normal function on the other.

I’ll close by highlighting a study where I think they got the balance right. CAPE-COD investigated steroids in community acquired pneumonia, with clinical and radiological evidence of a pneumonia and an oxygen requirement. In practice this turned out to be mostly pneumococcus and so picked out a pretty clean signal. We also saw a nice signal for benefit dependent upon your inflammatory profile, so there is excellent biological plausibility.

I would love to see more medium size trials taking this approach.


Thanks for your thoughtful response. Your reference to the CAPE-COD trial is really interesting. When hydrocortisone was given to patients admitted to ICU with severe CAP, but prior to evidence of shock, there was a clear mortality benefit.

Interestingly, a prior study (Meduri et al: https://link.springer.com/article/10.1007/s00134-022-06684-3) had a design very similar to that of CAPE-COD (as far as I can tell). Enrolment was slightly lower and methylprednisolone was used instead of hydrocortisone. In contrast to CAPE-COD, this trial did NOT show a mortality benefit signal.

In the Discussion section of the CAPE-COD trial, the authors proposed possible explanations for the between-trial discrepancy. But they didn’t comment on the fairly marked difference, between the two trials, in the nature of the bacterial isolates. In both trials, pathogens were identified in only about half of patients. But in the CAPE-COD trial, among patients with isolated pathogens, Pneumococcus was by far the most common isolate (see Table S2 in the Supplement: Pneumococcus was isolated in approx. 20-24% of patients; Legionella was isolated in 5-7% of patients and other pathogens were isolated in approx. 5% of patients or fewer). This distribution is quite different from the Meduri trial (where no single pathogen dominated, among culture positive patients):

From the Meduri trial:

“The most common pathogens isolated were Staphylococcus aureus (10%), Streptococcus pneumoniae (9%), Pseudomonas aeruginosa (3%), and Escherichia coli (3%). Initial antibiotic treatment was deemed adequate in 96% of the participants based on ATS/IDSA guideline recommendations (Fig. 2).”

Is it plausible that the greater microbiologic homogeneity for patients in the CAPE COD trial might explain why the trial was able to identify a mortality efficacy signal (?) As you know, a mortality efficacy signal was also identified for steroids in the RECOVERY trial, in which all patients were infected by the same pathogen (COVID).

I also wonder, given the extent to which Pneumococcus infections dominated the CAPE-COD trial, about the distribution of bacterial isolates among patients in this trial who died. For example, if all those who died were “culture positive” and infected with Pneumococcus, then this would mean that the mortality efficacy signal was being driven exclusively by this patient subset. Unfortunately, these details aren’t presented in the publication…Without this information, it’s hard to know whether it’s valid to extrapolate the mortality efficacy signal for hydrocortisone to “all comers” diagnosed with severe CAP, regardless of bacterial isolate (?) Maybe the critical care community knows the answer to these questions (?)

Finally, the other feature of these trials that stands out to me is the fact that these were patients with severe respiratory disease, but without evidence of multi organ involvement (yet). Maybe the patients in CAPE-COD were “ideal” in the sense that a large portion were infected with a particularly nasty pathogen (Pneumococcus) that had the potential to cause abrupt multi-organ deterioration [perhaps with higher probability than severe CAP caused by other pathogens (?)], but who were “caught” at a point in their clinical trajectory just before they developed multisystem failure [at which point the ability of steroids to impact outcome (measurably) might have been lost] (?) In other words, maybe the “key” similarities between RECOVERY and CAPE COD (i.e., those factors most essential to their success) were:

  1. Enrolling patients at exactly the right time point in their clinical trajectory- when their condition was severe enough for their prognoses to be meaningfully impacted by steroids but still isolated mainly to one organ system (i.e., prior to development of complications that might have expanded the number of possible causal pathways leading to death and added statistical noise);

  2. Enrolling enough patients with a poor enough “untreated” prognosis that a sufficient number of outcomes of interest (i.e., death) could be captured (increasing the chance that intrinsic therapeutic efficacy, if present, would be revealed); AND

  3. Further reducing statistical “noise” by increasing prognostic homogeneity through enrolment of sufficient numbers of patients infected with the same pathogen (?)


Thank you Edward for your thoughtful input. You elegantly presented the standard argument for the PettyBone lumping paradigm of critical care which has been promulgated for many decades. It is great for everyone to see exactly what that argument is. That would be a well received editorial for the journal Critical Care Medicine.

Yet, this is a statistics and data methods forum so we are going to be asking for more specificity than you provided. I hope you will stay and bring that specificity by correcting any perceived errors I have made in the analysis I provide below. You state:

Very well, I know critical care scientists think the term “heterogenous syndrome” is meaningful, but since this is a data methods forum, not a Critical Care Medicine or Blue Journal editorial, we don’t have the luxury of using ambiguous terms of art. For us to understand you, you have to objectively define what you mean by the standard critical care term “heterogenous syndrome”.

The adjective “heterogenous” is ambiguous and the noun “syndrome” is ambiguous. So the phrase “heterogenous syndrome” is, linguistically, the product of two ambiguous words, so in a sense it is “ambiguity squared”. This is not a flippant determination because it is exactly ambiguity squared that critical care scientists ask the statisticians to study using math to render reproducible results.

Here is the answer to the question: What is a “heterogenous syndrome”?

My Answer: A heterogenous syndrome is a disease agnostic first mathematical SET of tens or hundreds of different diseases which fall within the scope of a second mathematical set of non-disease specific thresholds.

(This second set of thresholds can be determined as a best guess, either by one person or by a consensus defining group, or it can be defined by machine learning. This standard second threshold set, for all to use worldwide in RCT, is ONLY amendable by a select consensus group task force, but the amended set of thresholds need not overlap the prior threshold set. In other words, the task force need not build on the past but can guess anew.)
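The two-set definition above can be sketched in a few lines. Everything here is hypothetical: the patients, the signals and the cutoffs are made up for illustration and are deliberately not SIRS, SOFA or any published criteria.

```python
# Hypothetical patients: one underlying disease each, plus a few bedside signals.
patients = [
    {"id": 1, "disease": "pneumococcal pneumonia", "heart_rate": 118, "resp_rate": 28, "lactate": 3.1},
    {"id": 2, "disease": "influenza pneumonia",    "heart_rate": 104, "resp_rate": 30, "lactate": 1.4},
    {"id": 3, "disease": "E. coli urosepsis",      "heart_rate": 125, "resp_rate": 20, "lactate": 4.0},
    {"id": 4, "disease": "necrotizing fasciitis",  "heart_rate": 110, "resp_rate": 24, "lactate": 1.8},
    {"id": 5, "disease": "pneumococcal pneumonia", "heart_rate": 92,  "resp_rate": 20, "lactate": 1.1},
    {"id": 6, "disease": "cellulitis",             "heart_rate": 96,  "resp_rate": 18, "lactate": 1.0},
]

# The "second set": non-disease-specific thresholds (made-up values).
thresholds = {"heart_rate": 100, "resp_rate": 22, "lactate": 2.0}

def meets_criteria(patient, thresholds, required=2):
    """True when the patient exceeds at least `required` of the thresholds."""
    exceeded = sum(patient[signal] > cutoff for signal, cutoff in thresholds.items())
    return exceeded >= required

# The "first set": whatever diseases happen to fall inside the thresholds.
captured = [p for p in patients if meets_criteria(p, thresholds)]
diseases_in_syndrome = {p["disease"] for p in captured}

print(f"{len(captured)} patients enrolled, {len(diseases_in_syndrome)} distinct diseases")
print(sorted(diseases_in_syndrome))
```

Nothing in `meets_criteria` refers to the disease field, so the composition of the enrolled set is decided entirely by the cutoffs, and a different guess at `thresholds` yields a different mixture.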

So now that we have established that critical care scientists have been testing treatment for “heterogenous syndromes” the past 3 decades, we obviously have no idea what they have actually been testing treatment for. Worse, neither do they.

For this reason arguments relating to the testing of heterogenous syndromes are impossible to follow because it is not possible to study, with math, “ambiguity squared” and render actionable, reproducible outcomes. In fact given that, it is not surprising Edward that you say:

Indeed, under the PettyBone paradigm, RCT broadness is undefinable.

Of course the argument that the world is complex and the work is difficult is not a pass to continue PettyBone RCT. The brilliant Bradford Hill shows how to engage complexity by narrowing the method to test to a single phenotype of a single disease wherein the outcome is measurable.

That at least some in critical care science are now narrowing the target is commendable, but the masses of critical care scientists and statisticians of the world have their marching orders for RCT lumping using SOFA and the ARDS threshold sets. That a few elites are finally recognizing that the PettyBone RCT is pathological science, while still letting the rest of the world flagellate using the standard research PettyBone lumping guidelines, without disclosure that this is pathological science, is not mitigating.

A critical care scientist might argue that the heterogenous syndrome is something that “we just know about”. A sociological construct of sorts. But the composition of each second threshold set which has been promulgated as a world standard to capture each first set of diseases of a heterogenous syndrome proves that not even this is true. SIRS (Bone’s original threshold set, which lasted from 1989 to 2015) has no mathematical or biological similarity with either the signals or the thresholds of SOFA, which is presently promulgated as the world standard for research. Furthermore, the present standard ARDS threshold sets have little similarity to the set guessed in Berlin in 2012 (which failed during the 2020 pandemic).

The argument that one can always slice bread thinner, while true, is not an argument for studying lumped ambiguity squared. Indeed, you make the argument that there is no dichotomy between the PettyBone RCT and the Bradford Hill RCT. I disagree. A sepsis RCT using SIRS or SOFA is so far on the PettyBone end of the spectrum as to be defined as completely opposing Bradford Hill’s teachings. It is the study of a heterogenous syndrome, in other words it is an RCT testing a treatment of ambiguity squared. But we don’t need to ask Fisher or Hill how well the PettyBone RCT works, we have over 3 decades of proven failure.

Certainly I do agree with you there is a spectrum once you move outside of using RCT to test treatments for heterogenous syndromes. The CAPE-COD trial is much more on the Bradford Hill side of the spectrum than the PettyBone sepsis and ARDS trials. Indeed, studying bacterial pneumonia is not diagnostically agnostic. However, as Erin points out, there is still a component of PettyBone lumping which should give any informed practitioner considerable pause at the bedside.

When I started this campaign against lumping in social media in 2012, the “thought leaders” were so far down inside the dogma that they were aggressively defending Bone’s guess of SIRS. They were not even close to understanding that they were doing PettyBone RCT testing treatments of ambiguity squared. With terms of art like “heterogenous syndromes” they fooled themselves and indoctrinated the young. We all did this in the past.

That your team and others of critical care science have now come around to see the folly of the PettyBone dogma is indeed progress, but the editorials and discussion should be very much more forthcoming, admitting the past mistake and teaching the truth to the statisticians who they fooled into thinking that a given heterogenous syndrome was a well thought out “disease equivalent” for a Bradford Hill RCT.

I did not stumble upon this problem as you imply. I did a comprehensive root cause analysis of the failure by exploring the history of critical care science and its dogma to understand its failures. Root cause failure analysis of fundamental dogma (such as the PettyBoneRCT methodology) is not something the young critical care scientists are taught. Also, I was not the first to provide warning. Murray, a master pulmonologist, argued in a 1975 editorial that Petty’s lumping idea was not sound.

It is true that some in critical care are now aware they have a problem but they do not examine it in the open by a root cause analysis process. If they did, they all would see we have been practicing pathological science by pathological consensus.

Over a decade ago I once believed the dogma and taught these things. Although I never taught SIRS (Bone was my contemporary), and of course not SOFA as lumping tools for RCT, still, at first, I thought Bone’s methylprednisolone trial was a good idea and meaningful. We were all indoctrinated but now we have to tell the entire truth so that does not happen to the next set of mentees. We have to save the public. The political considerations of critical care science are irrelevant.

To the point of identification of what we need to change, I have found that the standard terms of art, which critical care science thinks are valid, are so ambiguous that any discussion at the level of the mathematical function of the RCT itself is never possible. The discussions, debates and pro-con sessions, while appearing real, remain superficial. I have a term for this. This is “synthetic debate” about “synthetic syndromes”. The statisticians and indeed the debaters themselves are fooled into thinking a heterogenous syndrome is an objective biological entity.

Root cause analysis is not possible as long as “ambiguity squared” is still a valid term of art.

Thank you Edward for your excellent contributions to this discussion/debate. I hope you will stay and continue the dialog. I hope you will send your colleagues. This PettyBone RCT thread has had over 3000 readers so a complete discussion on all sides is important. We also look forward to their comments.


Lawrence

I think Ed’s saying that he and most other practising intensivists get it. If anyone is going to internalize the problems posed by clinical heterogeneity, it will be the people who contend with them day in and day out at the bedside. Ed also acknowledges that the long list of failed trials in critical care science signals a problem.

Your crusade to raise awareness about the pitfalls of syndrome-based RCTs is now many years long. The massive waste of money and time (and patient lives) stemming from ossified suboptimal research methods clearly drives you bonkers- understandably. But transformative change will require credible alternative proposals. To this end, have you ever considered switching tracks from raising awareness to proposing specific research questions that you’d like answered and designs you’d like to see used? As Ed explained so eloquently, these are very challenging clinical problems. If they were simple and everyone knew how best to address them, we wouldn’t be having this discussion…

Possible explanations for why the field hasn’t advanced as quickly as it should:

  • Nobody with access to major funding streams thoroughly understands the statistical implications of lumping patients with disparate pathologies into a single trial;
  • Researchers with access to funding DO understand the pitfalls, but since they can’t figure out how to circumvent them and don’t want to jeopardize career advancement/funding streams or lose face over past failed trials, they just forge ahead with trials they know will be futile, patients-be-damned (I choose to believe that most clinicians are not this cynical…);
  • Some intensivists (?maybe most) DO understand the pitfalls but either don’t perform research or don’t have loud enough voices/big enough names to garner the funding needed to run truly transformative trials;
  • It’s very hard to find effective therapies in medicine, for most conditions, even when trials enrol patients with underlying pathologies/prognoses that are much more homogeneous than is true for critical care RCTs.

I get the sense that you’re waiting for a mass mea culpa from leaders in your field- the ones who have, historically, perpetuated failed designs. But if, as you suggest, these people are hopelessly entrenched (for whatever reasons), then maybe this isn’t a realistic expectation. Instead, maybe you’ll end up seeing slow and steady course correction over a period of several years, as funders start to question why they should continue investing in trial designs that have not, historically, borne fruit. As Ed says, there are signals that at least some research groups are trying to change course (e.g., the pending influenza/steroid trial). And, given the current U.S. research climate, it seems likely that the progress you so desperately want to see will end up coming from countries other than yours…


Thanks Erin. I think we are early in the acquiescence phase of the failed PettyBone paradigm. Ed’s group may be early transitioners but the field is still having task force meetings to determine the next set of thresholds to capture the different diseases for PettyBone RCT.

A task force issued a new worldwide set of thresholds for PettyBone RCT for pediatric sepsis about 6 months ago. Edward’s recognition notwithstanding, that consensus statement made no mention of the PettyBone pitfall, and the lactate threshold which the ML identified as rendering 2 points toward the presence of the heterogeneous syndrome of sepsis was “greater than 10.9”. So the teaching of the PettyBone pitfall has not been promulgated and it is not widely known at the world scale.

So a “heterogenous syndrome” is still a disease agnostic first mathematical SET of tens or hundreds of different diseases which fall within the scope of a second mathematical set of non-disease specific thresholds, just as it was in 1989.

So, accepting your sensitivity to the intensity of the debate, signs of impending victory are not an indication to let up.

The desire for acknowledgment of the PettyBone mistake is not for revenge. It is to relieve the critical care research community from the central control that the well meaning but indoctrinated and paternalistic leaders have asserted (and still assert) when they attempt to harmonize (standardize) the world’s sepsis and ARDS research by controlling the grants to enforce consensus rules like “SEPSIS 3” (SOFA) and the PettyBone RCT paradigm.

As an example, I recall that in one of our grant applications we made the point that “There is no agreement as to what sepsis is”. The grant reviewer snapped back in her denial, stating “There is agreement on what sepsis is and it is SEPSIS 3” (SOFA). (Quotes are close.)

This threshold set, which the grant reviewer is requiring as the triage set to determine the diseases which are included as sepsis, was guessed by one man in 1996 but selected by the task force as the triage set for harmonization of the world’s research of sepsis. So she expected that we would use SEPSIS 3 to get the grant, which of course almost everyone still does, as SEPSIS 3 is still the expected standard measurement today for adult sepsis PettyBone research. So Edward’s group may have changed but the official position of the leaders has not.

That is how tightly the science for most is controlled.

Really, I’m here to disclose the truth to the statisticians about the PettyBone RCT, synthetic syndromes, and fake (guessed) measurements (like SOFA) hoping they will learn the truth about critical care “measurements” and that a heterogeneous syndrome is not a disease equivalent and that these studies are not Bradford Hill RCT.


(Editorial comment): This discussion keeps getting better. I especially like seeing more opposing viewpoints offered.

(General statistical comment): Having prognostic homogeneity is not necessarily desirable in RCTs in general. But what we need is homogeneity in treatment effects across patient types.
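A small arithmetic sketch of that distinction, assuming the treatment acts on the odds scale. The subgroup shares, control-arm mortalities and odds ratios are all hypothetical numbers chosen only to make the contrast visible.

```python
def apply_odds_ratio(p_control, odds_ratio):
    """Treated-arm risk implied by a control risk and an odds ratio."""
    odds = p_control / (1 - p_control) * odds_ratio
    return odds / (1 + odds)

def pooled_arr(subgroups):
    """Enrolment-weighted absolute risk reduction across subgroups given as
    (share of enrolment, control-arm mortality, treatment odds ratio)."""
    return sum(share * (p - apply_odds_ratio(p, odds_ratio))
               for share, p, odds_ratio in subgroups)

# Prognosis differs widely, but the treatment effect is the same everywhere.
same_effect = [(0.5, 0.10, 0.7), (0.5, 0.40, 0.7)]

# Prognosis is identical, but one subgroup benefits and the other is harmed.
mixed_effect = [(0.5, 0.25, 0.7), (0.5, 0.25, 1 / 0.7)]

print(f"shared effect, mixed prognosis: pooled ARR = {pooled_arr(same_effect):.3f}")
print(f"mixed effect, shared prognosis: pooled ARR = {pooled_arr(mixed_effect):.3f}")
```

In the first scenario the pooled estimate still reflects a real benefit despite the prognostic spread. In the second, benefit and harm roughly cancel and the pooled estimate sits near zero even though no patient has a null effect.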
