Clustering & Synthetic data generating processes (SDGPs)

Dr. Harrell presents an excellent discussion of the “fad” of mathematically generated phenotypes below, and it is an important review.

Mathematical embellishment of data without sufficient domain knowledge has always been a pitfall. Of course, phenotypes driven by domain knowledge are often quite important clinically. Many of these are separations of pathological pathways which were lumped together by similarity heuristics in the past. The eosinophilic phenotype of “COPD” is one example.

Yet the relentless creation and study of “synthetic data generating processes” (SDGPs), which emerged around the turn of the century, was one of the greatest errors of medical science. It’s a disposable, synthetic, research-generating enterprise which scientists think is real science.

A data generating process (DGP) in clinical medicine is best considered as a causal process of biological origin. It is then a biologic entity and worthy of RCTs. In contrast, an SDGP is an unlimited source of study. SDGPs can be created by any mathematical maneuver or by any consensus, and then studied and compared, only to be tweaked and studied again and again. It’s an endless cycle of disposable research. Gates, scores: there is no limit. All that’s required is the pedigree to create them, as you see with the new SOFA 2.

Of course, the study of SDGPs is a major industry with high volumes of discardable publication monthly. Yet when studied by RCT these sit on the third estimand, so it’s important to determine whether you are studying an SDGP, even one provided by a consensus committee, if there is any consideration of transport to treatment of the real DGPs of patients. The blind generalization of SDGP treatment effects to real DGP treatment effects is a biologically naive action.
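To make the transport problem concrete, here is a minimal arithmetic sketch. It assumes a score gate that admits patients from two distinct biological processes with different treatment effects. Every name and number in it (`process_A`, `process_B`, the risk differences, the cohort mixes) is hypothetical and chosen only for illustration; none of it is a clinical estimate.

```python
# Hypothetical illustration only: all names and numbers below are made up.

def pooled_risk_difference(mix, effects):
    """Average risk difference in a gate-defined cohort.

    mix     -- fraction of the cohort arising from each biological process
    effects -- treatment risk difference within each process
    """
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(mix[name] * effects[name] for name in mix)

# Two real causal processes that both satisfy the same score gate.
effects = {"process_A": -0.10, "process_B": 0.02}

# The same gate applied in two settings yields different mixtures.
trial_mix = {"process_A": 0.30, "process_B": 0.70}
bedside_mix = {"process_A": 0.10, "process_B": 0.90}

trial_effect = pooled_risk_difference(trial_mix, effects)
bedside_effect = pooled_risk_difference(bedside_mix, effects)

print(f"pooled effect in trial cohort:   {trial_effect:+.3f}")
print(f"pooled effect in bedside cohort: {bedside_effect:+.3f}")
```

Under these assumed numbers the pooled effect is beneficial in the trial cohort and harmful in the bedside cohort, even though the effect within each underlying process never changed. The pooled number belongs to the mixture the gate happened to capture, not to any one biological process.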

The most durable SDGP-generating gate was the SIRS criteria. These lasted as a standard for 24+ years, producing substantially nothing but discardable research, until they were finally abandoned without introspection and replaced by the second most durable SDGP-generating gate, the SOFA score, which has likewise generated substantially nothing but discardable research.

Not surprisingly, SOFA 2 is now available for all your SDGP generation needs. SOFA 1 was guessed in 1996. Who knows how long this new SOFA 2 will last, but the SDGPs it can generate for RCT, OS and statistical embellishment are infinite.

https://www.pinsonandtang.com/resources/sepsis-3-update-the-new-sofa-score-sofa-2/

Mathematics is man-made. Biology is not. A phenotype that is real is really a biological entity; we just think of it as a phenotype. A phenotype that is the product of mathematics might be useful for finding a biological causal process, but it must be recognized as an SDGP until proven otherwise.
