When Hill published the first successful RCT in 1948 based on the teachings of Fisher, he spent considerable time presenting the need to test a treatment for a single disease.
Pulmonary TB progresses from the early alveolar stage to a chronic cavitary disease. The chronic disease lacks visible evidence of successful treatment. So Hill required all to have upper lobe alveolar disease which could be assessed to determine the treatment response.
Fast forward ~20 years to 1967 and Petty introduced the idea of performing RCT on a lumped set if different diseases and then, in 25 years, after a RCT technique taught by Bone, a consensus group introduced the idea of defining Petty’s idea of lumped set of different diseases by a set of consensus thresholds of nonspecific lab and vitals values. The consensus group thereby established a method of generating a single disease equivalent for RCT which they called a “heterogenous syndrome”. RCT testing a treatment for these syndromes were the most common embodiments of this novel Petty/Bone RCT.
The “Petty/Bone RCT” became the standard RCT of critical care for the past 32 years. This method of lumping by threshold set (criteria) made case finding easy. Large RCT in critical care have abounded.
However I have predicted (including in this forum) that mathematically these must fail and indeed, a study this year found the 94% of single center RCT in critical care were not reproducible and that false RCT results have caused harmful guidelines which required reversal.
Yet it’s not just the single center Petty/Bone RCT which fail. The most famous consensus guidelines (ARDSnet) derived from multi center Petty/Bone RCT failed during the pandemic suggesting that all Petty/Bone RCT do not provide reliable results.
Is the Petty/Bone RCT a valid RCT? This is a pivotal question for the health of the world but no one will answer the question. Certainly there is only silence on twitter.
I have wondered why no one will openly discuss this topic at the fundamental level of the validity of the Petty/Bone RCT. Is the RCT itself too sacrosanct for critical analysis of its widely used derivation? Is the driving force of expediency as a function of grant acquisition and the desire to be part of team RCT too great?
Does anyone have thoughts about the RCT technique of Petty & Bone?
This is the first time I have heard of Petty/Bone RCT and one thing that strikes me immediately as a physician as the definition of disease per se should stringent. We do have emerging concepts of trials driven by molecular profile instead of the traditional anatomical site but these are few and far in between and mostly for highly specific targeted treatments. However even in traditional oncology randomized trials we will have patients with the same “type” of cancer which actually are very heterogeneous in anatomical extent (called as “stage”) and volume. If we consider the genetic / epigenetic makeup then the heterogenity is even more magnified. Would be following this thread to see what comments come from the experts in trial design in this area.
In critical care the problem definition may be different prompting investigators to consider a heterogenous syndrome. For example I imagine these trials would have patients with sepsis (defined by a set of lab parameters) requiring critical care without actually accounting for the underlying disease process (for example a patient with diabetic ketoacidosis and sepsis may require a very different managment from a patient with a hematological malignancy getting bone marrow transplant and developing sepsis). As a clinician I see several problems with this approach but perhaps most importantly that the focus shifts from managing the disease to managing the lab values.
The following is just one of many papers that condemns the use of dichotomization of continuous variables. Nothing good can come from it, as it loses large amounts of information in the best case, to ability to manipulate results at worst.
Dichotomization leads us to overlook the true nature of the relationship between X and Y. According to [6] , “simply dichotomizing continuous variables without previously referring to the original distributions by plotting them and checking consequences of dichotomization is a bad idea and should be discouraged” (p. 3). These two examples show how dichotomization can lead scholars to wrong inferences.
Maybe an oncology analogy could drive home the main obstacles to successful clinical trials involving critical care “syndromes.” Apologies in advance if what I’ve written below doesn’t make any sense- feel free to ignore it in that case Should probably just keep these kinds of thoughts to myself, math equations being very far in my rearview mirror…
In oncology, it seems like a trial’s ability (A) to detect the efficacy of a therapy, when that therapy has intrinsic efficacy (i.e., the trial’s “assay sensitivity”), might be DIRECTLY proportional to the proportion of enrolled patients whose tumours are biologically capable of responding (R) to the treatment in question AND the differential intrinsic efficacy of the new treatment (as compared with the standard of care treatment), and INVERSELY proportional to the (untreated) prognostic variability (V) of patients enrolled in the trial.
A ⍺ RD/V
Consider the likelihood that an RCT will detect the intrinsic therapeutic efficacy of either a new chemotherapy drug (a “nonspecific” treatment) or a new targeted drug (a “specific” treatment), assuming that intrinsic efficacy of the new therapy IS present, given various key trial inclusion criteria. The comparator treatment in these trials will be the “standard of care” chemotherapy.
The table below shows the likelihood that the RCT will be able to detect the intrinsic efficacy of the new therapy, as a function of the types of patients it enrols.
RCT design: New cytotoxic chemotherapy vs “Standard of care” chemotherapy
RCT design: “Targeted” Drug vs “Standard of care” chemotherapy
Trial Inclusion Criterion
“Adenocarcinoma- all comers”(e.g., breast, lung, colon, pancreas)
High V- extreme prognostic heterogeneity (for example, colon adenoCA and pancreatic adenoCA, once detected, might have very different prognoses). Moderate R- chemo effects are plausibly broad enough to benefit patients with differing tumour biology. Efficacy signal for the new chemo drug will likely be obscured by excessive prognostic heterogeneity among enrolled patients (trial will be too “noisy”) UNLESS the new chemo drug is HIGHLY efficacious as compared with the standard of care chemo (i.e., high D).
High V-extreme prognostic heterogeneity. R will PROBABLY be low, as specific driver mutation will likely be present in only a small subset of enrolled patients. Even if the targeted drug has high intrinsic efficacy, the efficacy signal will very likely be obscured by excessive prognostic heterogeneity among enrolled patients (trial will be too “noisy”) UNLESS the driver mutation in question is relatively COMMON among enrolled patients AND the targeted drug is highly efficacious as compared with standard of care chemo (high D).
AdenoCA, all tumours from same organ (e.g., lung)
“It depends.” Moderate V- prognoses of patients with adenoCA involving the same organ might be less variable (?) Moderate R- chemo effects are plausibly broad enough to benefit patients with differing tumour biology. Trial MIGHT detect efficacy of new chemo drug IF the new chemo drug is at least MODERATELY efficacious as compared with he standard of care chemo (i.e., at least moderate D).
“It depends.” Moderate V- less prognostic heterogeneity among patients with tumours involving the same organ. R depends on prevalence of driver mutation among enrolled patients. Trial MIGHT detect efficacy of new chemo drug, but strength of the efficacy signal will be proportional to prevalence of relevant driver mutation among enrolled patients AND the target drug’s intrinsic efficacy.
AdenoCA, any organ, specific driver mutation present
High V, moderate R. Trial UNlikely to detect efficacy of new chemo drug. Chemo effects are plausibly broad enough to benefit patients with differing tumour biology, but efficacy signal will likely be obscured by excessive prognostic heterogeneity among enrolled patients (trial will be too “noisy”), UNLESS the new chemo drug is HIGHLY efficacious as compared with the standard of care chemo (i.e., high D).
“It depends.” High V, high R. Design and track record for “basket” trials (?)
AdenoCA, all tumours from same organ (e.g., lung), specific driver mutation present
“It depends.” Moderate V, moderate R. Trial MIGHT detect efficacy of new chemo drug. Chemo effects are plausibly broad enough to benefit patients with differing tumour biology. Prognoses of patients with adenoCA involving the same organ might be less variable (?) Trial MIGHT detect efficacy of new chemo drug IF the new chemo drug is at least MODERATELY efficacious as compared with he standard of care chemo (i.e., at least moderate D).
RCT design with BEST chance of detecting the intrinsic therapeutic efficacy of the targeted drug, assuming intrinsic efficacy is present. Lowest V, high R. Prognostic and pathophysiologic heterogeneity are minimized. Therapeutic specificity maximized. This trial design should generate results with the least “noise.”
In critical care trials, V is often high or unknown (patients with very different “root diseases” are often lumped together in sepsis trials); R is often/?usually unknown (biologic “drivers” of these syndromes haven’t been elucidated very well and might differ between patients); D will often be low if the therapy being tested affects only one of MANY possible prognosis-determining causal pathways. These limitations would seem to pose significant challenges for RCT design in this field…(?)
Yes. The Petty/Bone RCT is a term used for a unique pathological modification of the RCT which dominates critical care science. Two USA pulmonologists, Thomas Petty (1960s) and Roger Bone (1980s) developed the idea. The PB RCT applies a research shortcut which allows easy case finding and the use of a single RCT to test a treatment on a variable mix of different (but similar appearing) diseases called “heterogenous syndromes” (ARDS, Sepsis, AKI).
This PB RCT shortcut has been the standard research technique in the US critical care science for ~35 years. The PB RCT was exported decades ago and all countries were intellectually colonized to believe this is a valid derivative of the RCT.
In a PB RCT case finding is rendered easy by the use of threshold sets which are used to quickly triage the cases for inclusion. This is very easy as there is no target disease to diagnose whatsoever. The triage technique for inclusion in a PB RCT deploys a set of non-disease specific consensus set of thresholds which are generally best guessed by a consensus group (called a “task force”) and amended (re-guessed) by a consensus group every decade. This is the standard method of research in critical care. This would be an interesting social “cargo cult” it it were not so dangerous for the public.
In a sense this goes beyond pathological science. It is a form of assembly line case finding for “research” with the trappings of Fisher and Hill strapped on later. So the PB RCT is an RCT façade. The PB RCT is a self-perpetuating form of pathological science as described by Langmuir. Therefore, as with all pathological sciences, of course PB RCT are mostly negative but the few that are positive are not reproducible. This unrecognized pathological science has been the standardized “RCT” technique in critical care science with each new generation being taught the technique.
Statisticians were not told that the heterogenous syndrome under test was connected only by triage and only by case finding set of thresholds, and not by any common biological driver target. So statisticians thought they were doing Fisher/Hill RCT and that the “heterogenous syndrome” was a disease equivalent. In a sense the PB RCT is a pseudo-RCT of synthetic syndromes that are 1960-80s social constructs. An idea of a simpler time when we all thought similar appearing diseases must have a common driver. Now its too late to expose without breaching the veil of political expediency. As we all know, but are not allowed to say, “The spice must flow” (Dune 1)
No one will discuss the PB RCT because virtually all of the critical care research is based on these synthetic syndromes and PB RCT methodology. In the US the PB RCT is not subject to deep introspection so no one debates the validity of PB RCT, even on this forum.
You will never see an advocate of PB RCT openly discuss the method anywhere. For those who understand its presence it is a USA critical care science secret. The rest of the indoctrinated have no idea. Yet, after witnessing 35 years of failure we are trying to shine light on this pathological science. We know this is almost impossible to believe.
Yes dichotomization is a common pitfall and the Petty/Bone RCT has that problem but the pathology of the PB RCT is much deeper than that. PB RCT are RCT applied to a variable set of different diseases which are lumped together by pathological consensus. The diseases have different drivers. The mix of diseases changes with each new RCT so the the entire set is a moving target. So dichotomization, as a function of the entry criteria, is only a small part of the problem.
In the most dramatic example, a specific standardized Ventilator treatment was tested by PB RCT on ARDS (as a lumped set) and found to reduce mortality so this was applied, as evidenced based, to COVID pneumonia (which met the standard PB RCT criteria for ARDS). This treatment failed causing much loss. This shows that the unwillingness to deeply and publicly debate critical care research methodology and openly introspect, has real adverse consequences for the public health at the bedside.
So, the pathology of the PB RCT is magnified by the dichotomization problem but beyond that the PB RCT is a unique type of pathological science which is not subject to analysis as a function of expediency.
Yes. Excellent. Also R varies with the mix of different disease captured by the consensus threshold criteria with each new PB RCT. So mathematically, reproducibility should not be expected.
This is not a Fisher/Hill RCT. Its really not an RCT in the sense that Hill described it so no one should wonder why they all fail. Its not based on a valid function, so it is not real science
Despite this, the validity of the PB RCT is the one subject which no one will discuss (except in this forum) which is why almost no one knows about the PB RCT. Everyone inside critical care science knows that that the state of critical care research is in severe crisis but the idea that the fundamental methodology, in which all (including myself, in the 1980s) were indoctrinated, might be pathological is simply too much for the experts to bear so the PB RCT are simply funded and performed and are not subject to scientific self correction.
We have many allies and are hoping ex-US scientists and statisticians will lead the USA out of this harmful abyss. We call ourselves the critical care rebels. We seek open debate to let the young decide for themselves.
We blame no one. We were all indoctrinated and who is not guilty of failing to timely pull up a well barnacled anchor. Yet we don’t want the youth to be indoctrinated as we were. Exposure to open and candid antidogmatic debate is their right as scientists. No one has the right to capture their minds, even unknowingly. Reform will follow when debate is allowed. Yet the debate gate is closed, the young brilliant horses are trapped in the “dogma stall”.
As a test, try to convince one or more of your brightest, most pugnacious, and debate favoring critical care expert scientist here to debate on this thread now. They will decline. Silence is the only defense they have.
I am going to get a little more rebellious here in this post calling for reform of critical care science.
Let’s discuss the findings in this study that only 6% of the positive critical care single center RCT were reproducible. The authors state:
“ Our systematic review found 19 sRCTs with a statistically significant mortality decrease in critically ill adult patients. Most of these were followed by at least one subsequent mRCT. Survival benefits observed in sRCTs were rarely corroborated by mRCTs, with most mRCTs reporting neutral results on mortality, and one mRCT finding a significant mortality increase with intensive glucose control. Treatment recommendations based on the initial citation of sRCTs with survival benefits were included in international guidelines and typically remained unchanged for a decade before any revisions were made based on subsequent relevant mRCTs.”
THAT’S PROFOUND!
THE MANY WRONG RCTs GENERATE MANY WRONG GUIDELINES often “unchanged for a decade”. WOW, imagine wrong (potentially dangerous) drugs lasting that long.
This does not even count the wrong RCT based guidelines for ventilator care and settings which were abandoned due to excess deaths by bedside physicians during the critical care revolt of 2020.
Yet, no one says a word.
No one asks why! No one will debate why? No one questions the PettyBone RCT methodology.
I predicted this and at first I thought they just don’t understand. Now it’s there in their face and still they do not call for PettyBone RCT methodology review and reform.
How is this lack of action explained ?
Well at least, how about the authors investigating why? No. Instead,
the conclusion of the authors is, apparently, that single center PettyBone RCT should continue for “hypothesis generating”.
Subjecting human subjects to the risk of an RCT to generate a hypothesis seems extreme since the hypothesis certainly already exists.
What hypothesis can be generated by RCT with 6% reproducibility?
They selected trials which were published in top journals so they had adequate number of subjects .
Why don’t they work?
I’m going to play devil’s advocate here a bit- not because I’m sure that I’m thinking about all this correctly (I’m probably not…), but because I’m wondering if there might be another way to interpret this lack of reproducibility.
You’re suggesting that failure of multicentre critical care trials to “corroborate” the efficacy suggested by single centre trials must mean that the single centre trials provided “misleading” results. Maybe this is true in other fields, with well-defined diseases and more specific therapeutic interventions. But isn’t it possible that the opposite is actually true for critical care trials, given that the degree of “noise,” in this particular field, might be a more important determinant than intrinsic therapeutic efficacy of a critical care trial’s “positivity” (?)
A trial’s ability (A) to detect any therapeutic efficacy that IS actually present will depend on the “signal to noise” ratio. We can increase “A” either by increasing signal (e.g., by enrolling more patients and observing more outcomes of interest) OR by decreasing noise. In critical care, one way to decrease noise is to optimize the uniformity of supportive care. In many fields, supportive care quality will be fairly homogeneous from centre to centre and will be less of a determinant of the patient’s final outcome. This is not true for critical care trials, where there are many ways for patients to die (including innumerable versions of suboptimal day-to-day supportive care). In critical care, supportive care heterogeneity (noise) will likely increase in proportion to signal intensity. Multicentre trials will accrue more outcomes of interest, but at the expense of a significant increase in supportive care heterogeneity between centres.
Given these considerations, isn’t it possible that the “noise reduction” conferred by restricting a critical care trial to a single site might actually improve assay sensitivity to a greater extent than will increasing “signal” by accruing more patients/observing more outcomes of interest (?)
Excellent point. This is the opposite of the typical view, that the multicenter trial is more reliable. Of course if a therapy only works in one center that might also mean it’s works with only one team.
Firstly a RCT is a mathematical function and one cannot explain away an invalid mathematical function with words of a theory. This is common in the discussion sections. The math of PBRCT does not work.
Secondly, if heterogeneity is the problem, which of course it is, then the PB RCT is artificially inserting variable, uncontrolled and unnecessary heterogeneity by including a variable mix of different diseases which changes with each RCT.
Yes these Petty/Bone multi center RCT are also not valid and the COVID revolt over failed ARDS treatment suggests that at least some of them are dangerous They have the same problem as the single center
If you see the Petty Bone RCT for what it is, this is research generator with no disease.
I should add to this insightful discussion that the PB RCT problem is as pervasive as can be in critical care research.
Dr Lynn described this mistake following ARDS and sepsis definitions which are certainly the precursors, but in 2024 the lumping mistake is everywhere in critical care.
You don’t have to search for obscure small journals. Let’s take a few 2024 examples from the JAMA.
TRAIN Trial.
They mixed traumatic brain injury, subarachnoid hemorrhage and intracerebral hemorrhage.
DEFENDER Trial
This is the most extreme. They lumped patients with several different condition, including sepsis, ARDS, shock of any cause, acute kidney failure of any cause, mechanical ventilation of any cause.
There are probably more. I think the point is made. These are only 2024 JAMA papers.
These trials are irreproducible and won’t provide any useful information because they were made under the PB RCT paradigm.
Yes, but we cannot even get experts here for gentle discussion. I have invited so many critical care scientists, statisticians, SCCM, ERS, etc. No one wants to discuss the Petty/Bone RCT.
I wanted to respond to this after more thought. Of course, all PB RCT (including multicenter PB RCT) are invalid by design but It is possible that the fact that only 6% are reproducible in multicenter retesting is actually also due to the unique pathology of the PB RCT design.
A disease is the same everywhere in the world but a set of non disease specific thresholds will capture different mixes of diseases in different environments.
The mix of different diseases captured by the consensus threshold criteria likely expands when it is multicenter. For example, when physicians in Zambia tried to use the guidelines derived from PB RCT for sepsis the mortality increased, likely due to new diseases in the mix for which the treatment had a adverse treatment effect.
Consider a synthetic syndrome of the “adolescent sore throat”. (we will call it “Septic Throat”). The criteria we make up is a WBC of at least 12 and an erythematous painful throat of less than 5 days in duration.
An RCT testing the average treatment effect (ATE) of oral penicillin for “Septic Throat” would depend on the mix of Epstein Barr Virus (infectious mononucleosis), Group A Streptococcus (GAS) , Influenza, Neisseria , and other pathogens. Each new RCT would render different results because the disease mix (the pathogen mix) would change.
A single center PB RCT might be positive by chance because the mix of diseases might contain a higher percentage of Group A streptococcus which responds to PCN. When the trial is repeated multicenter the percentage of GA streptococcus cases may be diluted due to a more diverse environment and population.
Here, seeing the only 6% (one in 16) of the single center PB RCT were reproducible in the multicenter PB RCT it is reasonable to derive a theory for that (as you did) rather than concluding as the authors did, the the multicenter PB RCT comprise the definitive arbiter.
Generally it is perceived that larger RCT are more valid. In the PB RCT they are both invalid but it is more likely that the small PB RCT will be positive by chance becasue of less dilution of a fortuitously responsive disease in the mix. Therefore the larger PB RCT is almost always going to be show no benefit as a function of dilution. This is exactly what happened.
The point is that the treatment effect of PCN on a GA strep infection is similar all over the world but the treatment effect of sepsis (which is not a disease) is not. So an RCT to determine the treatment effect of PCN on a GA strep infection is valid a PB RCT to determine the treatment effect of anything on a synthetic syndrome defined by a set of non disease specific criteria is not valid.
I remember back in the early 2000s when I argued that simple counting of 10 second events could not work to determine the severity of sleep apnea and they all ran away. Just as here they would not discuss it. I was asked to leave the email forum of the Anesthesia Sleep Society (a Society of which I was a founding member) because I argued that counting does not work in the hospital and the duration of the apneas must be considered. Opposing the 10 second breath pause counting dogma was unacceptable. They would not deviate from their gold standard of simple counting threshold 10+ second breath pauses (the AHI). They still have not even in the hospital where long apneas can be deadly.
Fast forward to the present and patients die in the hospital commonly due to sleep apnea and opioids. Just as I warned them ~20 years ago, counting 10 second breath pauses cannot detect the risk group. Who knows who is at risk? No one.
Also 20 years later when the science failed to show benefit from CPAP they all were so surprised but it did not change anything. They still use the 1970s simple guessed 10+ second counting as their standard severity indicator so profound cases with severe cycling oxygen desaturation are diluted with the mild sleepy cases with a lot of baby apneas. Its too silly to imagine. They are printing money with useless research.
The same is true with the PB RCT as the standard in critical care and I have over a decade of fighting this silliness. At least there the critical care scientists are indoctrinated with a more advanced and complicated error than the sleep apnea scientists. It is a little harder to understand the pitfall of the PB RCT than the futility of counting 10 second breath pauses to detect the risk of a sleep apnea death due to opioid’s.
Still, it is absolutely no surprise to me that critical care scientists will not come here to defend their science. Regardless of whether or not their lack of understanding is volitional they have an obligation as scientists to at least their mentees and the public to debate their fundamental dogma.
The mentees and the public pay the price for this protective silence, but the mentors reap the reward of unmitigated respect because no one even suspects that the dogma they teach so elegantly from the lectern is not valid science.
The mix of different diseases captured by the consensus threshold criteria likely expands when it is multicenter.
Yes. Variable disease mix between centres would confer heterogeneity/noise, as would between-centre variability of supportive care quality (the example I used above).
An RCT testing the average treatment effect (ATE) of oral penicillin for “Septic Throat” would depend on the mix of Epstein Barr Virus (infectious mononucleosis), Group A Streptococcus (GAS) , Influenza, Neisseria, and other pathogens. Each new RCT would render different results because the disease mix (the pathogen mix) would change…
A single center PB RCT might be positive by chance because the mix of diseases might contain a higher percentage of Group A streptococcus which responds to PCN.
The phrase “positive by chance” will confuse some people. They will think that you’re saying that the finding from the single centre trial is not a “true” result.
It’s important to clarify that you’re actually using the word “chance” here in a very different way. Specifically, I think you mean to say that if a high proportion of the patients that we have recruited for our single centre RCT of “septic throat” happen, “by chance,” to have group A strep as their causative organism, then our trial will stand a good chance of identifying the efficacy of Penicillin (with regard to the incidence of acute and delayed complications and household transmission, for example). In this case, our single centre trial has successfully and correctly identified the efficacy that is intrinsic to Penicillin for treating this infection. In contrast, if we were to include multiple centres in our trial, keeping an overly-broad inclusion criterion like “adolescent sore throat” (a criterion that is agnostic to causative organism), then Group A strep might not end up constituting such a high proportion of all enrolled cases. In this situation, the efficacy “signal” for Penicillin might not be detected since it might be “drowned out” by the biological inability of most patients’ causative organism to “respond” to Penicillin.
Your “adolescent sore throat syndrome" example highlights the importance of ensuring at least some minimal degree of “match” between the mechanism of action of the therapy being trialled and the pathophysiology of the condition being treated. When designing an RCT, we want to ensure that as many patients as possible whom we enrol are biologically capable of responding to the therapy being tested. If therapy/disease mismatch is too marked, we won’t be able to detect the therapeutic efficacy signals generated by the small subset of patients who ARE capable of responding. This would be like testing a targeted cancer therapy in a trial that had enrolled patients with all types of adenocarcinoma, agnostic to the presence/absence of driver mutation (see the table in post #4 above).
Generally it is perceived that larger RCT are more valid. In the PB RCT they are both invalid but it is more likely that the small PB RCT will be positive by chance because of less dilution of a fortuitously responsive disease in the mix.
I don’t think I’d use the term “invalid” here. The imaginary single centre RCT of Penicillin, which has, fortuitously, enrolled a lot of patients whose sore throats “happen” to be due to Group A strep, has correctly detected the intrinsic efficacy of Penicillin for treating Group A strep.
Thanks for your clarifications, which I agree with. This does not change the important point of the failure of PB RCT that caused the critical care ARDS revolution of 2020 (which almost no one knows about) and plagues critical care science causing a 6% reproducibility rate.
Relevant the term invalid, I do not think we should use euphemisms. The fact that an invalid function can fortuitously generate a correct result is not a valid test. It could just as easily have produced a negative result with a different mix and no one would know why.
The essence of valid RCT, mathematics, and science in general is reproducibility. That which is not inherently reproducible as a function of design is not science and is invalid (in my use of the term).
Read this. They have now remerged COVID pneumonia with ARDS apparently abandoning non-COVID ARDS.
How do you look at this as a statistician to decide if you will do the statistics for the PI. First you look to see if there is a valid reproducible measurement of the disease. That’s required for a Fisher/Hill RCT. Here you see there is none. Only changing broad non-disease specific criteria (threshold sets).
Don’t be fooled by the propaganda. The standard ARDS protocol did not work well during COViD pneumonia which caused the 2020 ARDS revolt.
Think about doing an RCT on this nonspecific, broadly defined, lump of different diseases.
They even describe “classic ARDS” whatever that is. The entire discussion is one massive hedge clinging to a 1967 idea.
This is the origin of the Petty/Bone RCT which destroyed the statistics of critical care science.
To explain further, if you decide to test a treatment for the 1967 synthetic syndrome of Petty called “ARDS”, using an RCT with either the 2012 “Berlin Definition” or the 2023 “Consensus Global Definition” , you are doing a Petty/Bone RCT.
Look at these non specific measurements below. They capture a broad set of very different diseases. So these measurements are just a triage for disease capture for RCT, there is no disease to diagnose whatsoever. The goal is to do the RCT.
With this example, it should be clear what a “Petty/Bone RCT” is and why this modification of the Fisher/Hill RCT have not worked for ~50 years.
Here, paired with these broad consensus criteria (guessed threshold sets), you see why the figure below explains the fatal flaw of the Petty/Bone RCT. Unless statisticians teach the PIs that this is not a valid Fisher /Hill RCT, they will keep doing these as they have since the 1970s.
Remember the 6% reproducibility rate for critical care RCT referenced earlier in this thread? That is easy to understand and in fact predictable, when you understand what a Petty/Bone RCT is.
Have courage. Let us end the Petty/Bone RCT this year!
In an example. Imagine we were awarded a 300K grant to use the latest 2023 ARDS criteria to test the effect of “Drug X” on the mortality of ARDS wherein the drug is given within the first 24 hours of case finding.
So lets plan the standard critical care “Petty Bone RCT” with our local expert statistician to test the effect of drug X on mortality of the synthetic syndrome of ARDS using the above threshold criteria.
(Note: the statistician thinks “ARDS” is a “disease” or “disease equivalent” and that it is diagnosed by specific criteria like other disease. So we are doing a Petty/Bone RCT but the statistician thinks she is constructing a Fisher /Hill RCT, )
With the non-disease specific triage criteria (above) we capture 90 ARDS cases. Of course ARDS is not a disease but rather a lumped set of diseases. 90 cases was our target number of subjects determined by the statistician.
(We lump all of them together under the name of ARDS to derive sufficient number of subjects.) We then do the Petty/Bone RCT. It’s flu season and COVID is around. The diseases captured are:
10 cases of Influenza A pneumonia
(infects airways and alveolar cells)
5 cases Aspiration pneumonia
(acid injury of airways and alveolar cells)
25 cases of Covid pneumonia
(infects Type II alveolar cells and endothelia cells)
5 Cases of severe burn associated lung injury
(non specific injury)
5 cases of pancreatitis associated lung injury
(?inflammatory mediators and capillary injury then alveolar filling)
5 cases a lung injury due to major trauma with hypovolemic shock and transfusion
(?inflammatory mediators and capillary injury then alveolar filling )
35 cases of lung injury due to systemic infection with shock
(? inflammatory mediators and capillary injury then alveolar filling)
Total 90 cases of “ARDS”.
Results of Petty/Bone RCT—
The average treatment effect of Drug X is not significant.
What did we learn? Nothing, It is a Petty Bone RCT.
Yet this lumping Petty/Bone RCT is and has been the standard in critical care for ~50 years. No one can stop them.
Positive single center critical care RCT which use mortality as an endpoint have a 6% reproducibility rate. Now you see why that’s not surprising. When the mix of different disease above changes with each new Petty/Bone RCT, then, of course, the average treatment effect changes.
There is an unknown, uncontrolled, and unmeasured flux of the percentage of each disease which comprise the mix of diseases of ARDS across multiple RCT.
That is the hallmark of the extant standard Petty/Bone RCT for testing the treatment of synthetic syndromes. The mistake made by the statistician was that she failed to determine the validity and reproducibility of the measurement of the the perceived “disease” (which actually was a lumped set of very different diseases) before approving the statistical math and method of the RCT. The mistake made by the PI was that she ignored the admonitions about the invalidity of the Petty/Bone RCT and thereby exposed the 90 human subjects to risk for no benefit to them or society and wasted the time of the team and the grant money of the public.