Now perhaps, after that last paper’s adjustment for SOFA as a covariate in a Pandemic trial, the reason for my prolonged, merciless provocations in this forum is becoming clear. (As a reminder, to my knowledge, there has never been a single reproducibly positive RCT of sepsis.)
As I have shown, SOFA is:
1. The standard trial entry criteria for sepsis
2. An independent variable in a trial
3. A trial outcome (primary or secondary)
4. A baseline covariate of a trial
5. Adjusted for, as a covariate, in a trial
That is a lot, given that SOFA was guessed in 1996. However, guessed origin aside, the question now (for those in this forum who have never been shy about advocating adjustment for covariates) is:
Is it appropriate to adjust for SOFA as a covariate?
Given that the possible adjustment variables are measured at the moment of randomization or earlier, the question becomes a relative one in my view. What is the best bang for the buck? For a given sample size (so that we keep overfitting in mind) is there another index or set of variables that can be pre-specified and that will explain more outcome variation than SOFA? We might pause a moment to look at competitors. As I was on the APACHE II and APACHE III development team, I’m not in an unbiased position, but I think that the APACHE III acute physiology score takes more physiologic variables into account and for each physiologic variable has more resolution by using more (only slightly arbitrary) intervals for that variable. Of course APACHE III has been used a lot in sepsis clinical trials as an adjustment variable and as an effect modifier, with nonzero success.
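The "bang for the buck" comparison can be sketched with simulated data. The example below is only an illustration under stated assumptions: the latent severity, the outcome model, and the two scores are all hypothetical and are not SOFA or APACHE. It bins one latent severity into few or many intervals and compares how much outcome variation each binned score explains, using the squared correlation as a crude stand-in for the likelihood-based measures a real pre-specified analysis would use.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def binned(value, n_bins, lo=-3.0, hi=3.0):
    """Collapse a continuous value into n_bins equal-width intervals."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)


def simulate(n=5000, seed=1):
    """Hypothetical cohort: latent severity and a binary death indicator."""
    rng = random.Random(seed)
    severity = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed outcome model: log-odds of death rise linearly with severity.
    death = [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-(-1.0 + 1.2 * s))) else 0
        for s in severity
    ]
    return severity, death


severity, death = simulate()

# Two hypothetical baseline scores built from the same physiology:
coarse_score = [binned(s, 2) for s in severity]   # two intervals
fine_score = [binned(s, 12) for s in severity]    # twelve intervals

r2_coarse = pearson_r(coarse_score, death) ** 2
r2_fine = pearson_r(fine_score, death) ** 2

print(f"variation explained, coarse score: {r2_coarse:.3f}")
print(f"variation explained, fine score:   {r2_fine:.3f}")
```

In this toy setup the finer-grained score recovers more of the outcome variation for the same single degree of freedom, which is the sense in which resolution matters when the sample size limits how many covariates can be pre-specified.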
I don't think there was anyone in the past who did not think that was the right thing to do in these critical care trials.
It seems likely now, but only in full retrospect, that these scores were effectively encapsulating hidden heterogeneity and variability into the studies.
Did you ever use “delta APACHE”? Here authors conclude that “delta SOFA” is superior to “fixed day SOFA”.
I have seen discussion here about the problems with using change from baseline. Does anyone have thoughts on a “delta score” as an endpoint?
Here, in 2017, the authors discuss why a derivative of SOFA as an RCT endpoint may be needed.
"There are several unresolved issues around the validity of the SOFA score as an endpoint. First, the responsiveness of the SOFA score to intervention-induced change in mortality risk has not been quantified. It is unclear how the SOFA score changes in response to a treatment that changes the mortality risk within a specific timeframe. Second, the consistency of the SOFA score to reflect changes in underlying mortality risk has not been quantified. Even if true mortality-modifying treatments effects are reflected in the SOFA score on average, the validity of the SOFA score as an endpoint is doubtful if this relationship is inconsistent. Third, it is unclear which derivative of the SOFA score is the most appropriate endpoint.
Therefore, the aim of the present study was to quantify the responsiveness and the consistency of different SOFA derivatives to reflect treatment effects on mortality. The results from this study may aid clinical decision makers in the interpretation of trials that use SOFA as an endpoint, and may help investigators choose the most appropriate SOFA derivative in the design of future RCTs."
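To make the two derivatives named above concrete, here is a minimal sketch of "fixed day" and "delta" versions of a daily score. The definitions are assumptions for illustration (published definitions vary), the patient data are invented, and patients who die or are discharged before the chosen day are not handled.

```python
def fixed_day_score(daily_scores, day):
    """Score observed on a given day (day 0 is baseline)."""
    return daily_scores[day]


def delta_score(daily_scores, day):
    """Change from baseline to the given day; negative means improvement."""
    return daily_scores[day] - daily_scores[0]


# Hypothetical daily scores for two patients, days 0 through 3.
patients = {
    "A": [8, 7, 5, 4],  # started sick, improved
    "B": [3, 3, 4, 4],  # started less sick, worsened slightly
}

for name, scores in patients.items():
    print(name, fixed_day_score(scores, 3), delta_score(scores, 3))
```

Both hypothetical patients have the same fixed-day score on day 3 but opposite deltas, which is why the choice of derivative changes what the endpoint measures.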
A superset of all this is daily acute physiology scores analyzed with a longitudinal model. I expect this to have maximum power. It remains to be seen whether it qualifies as a surrogate outcome in the Prentice sense.
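For readers less familiar with that bar, Prentice's conditions are often summarized along the following lines. This is a compact paraphrase rather than a full statement, and the notation is assumed for illustration: the true endpoint should be independent of treatment once the surrogate is known, and the surrogate should itself be informative about the true endpoint.

```latex
% T = true endpoint, S = candidate surrogate, Z = treatment arm
% (notation assumed for illustration)
f(T \mid S, Z) = f(T \mid S), \qquad f(T \mid S) \neq f(T)
```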
For anyone unaware of the new sister thread for this discussion, it is linked here.
The link below is to an approximately 19 min summary talk.
If you are a statistician, buckle up, because you will not believe this 35+ year story could be true. The story is truly incredible and shows the effect of decades of good statistical math using standardized but invalid (guessed) clinical measurements.
Please forward. We are trying to garner awareness of the massive waste of resources, opportunity for discovery, and talent…
If you read this thread you will not be surprised that the leaders have NOW suggested that the SOFA score needs to be updated (after 25 years). The “need for update” of SOFA is discussed here…
Changing SOFA would presumably change the criteria for RCT of the “Synthetic Syndrome” of sepsis, which is based on the old 1996 SOFA. A discussion of RCT of synthetic syndromes is provided here.
The End of the "Syndrome" in Critical Care
SIRS, the old threshold set for sepsis, was guessed in 1989 and was the core of the sepsis criteria for RCT (with updates every decade until 2015, when it was abandoned and replaced by SOFA). As expected, given the use of guessed thresholds as measurements to define the synthetic syndrome, no positive sepsis RCT has been reproducible for 30 years.
There is a common theme here, discussed in the sister thread: guessing and updating RCT measurements has replaced discovery. No need for discovery of measurements when you can guess and update them and do RCT.
So an update of SOFA was predictable, like the updates to the SIRS-based sepsis criteria that were guessed every decade. These updates are widely cited at the apex of another decade of RCT…
This is “science” by administration not discovery… Statistician beware…
Amazing development related to these fake measurements in critical care.
This new study shows the standard summation score of SOFA does not work, which of course all the studies have shown. But the editorial is amazing, arguing that the reason SOFA failed is that it is too old and needs to be updated.
Is there any surprise that only 6% (1 in 16) of single center critical care RCT in top journals with a solid endpoint of mortality are reproducible?
The science is trapped in the oversimplification of the simple 1990s idea of summation scores and the 1960s-80s idea of synthetic syndromes and the PettyBone RCT.
Unless you, the statisticians, help them out of this failed oversimplified dogma, there is no hope.
I worked on the APACHE II and APACHE III projects where thresholds for physiologic measurements were solved for from large datasets, to predict hospital mortality. What’s your feeling about those indexes?
The APACHE scores were widely used and useful for the determination of the general acuity of the patients admitted, for example, to a specific ICU. It is desirable for resource management to know the general state of severity and chronic disease of the patients admitted.
As you know, they have a component of adjustment for the pretest state, which is pivotal for this purpose. IMHO, ML/AI is well suited to replace such scoring.
The problem with SOFA and SIRS was caused by their use for individual patient decision making: specifically as triage scores for inclusion in an RCT or, in the case of SOFA, both for triage (criteria) and as an independent variable or an endpoint of an RCT.
So the move to make threshold sets and specific scores into triage tools for capturing a set of different diseases as a synthetic syndrome, and for use as outcome measurements in RCT, is what I call the PettyBone paradigm. These are the standard RCT in critical care, which study these captured synthetic syndromes, but they are not reproducible. This has led to a discussion of abandoning the RCT in critical care.
To my knowledge, APACHE was/is not a part of the PettyBone disease lumping paradigm.
So the take-home messages from what you said might be:
APACHE uses data-driven cutpoints that are more valid
What they are valid for is mortality prediction, which is useful for medical planning. But because the mix of underlying conditions can differ even at a fixed risk of mortality, APACHE is not necessarily of value for patient selection in trials, etc.