Paper: Machine learning for the prediction of sepsis

Just read this paper: “Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy”.

some key bits:

-ML understood to mean: “any ML classifying technique to predict the onset of the target condition, through some type of learning from presented data in a training dataset. Scikit Learn is one of the most used packages to code machine learning models in the popular programming language Python. Pragmatically, all supervised learning models found in this package were considered machine learning models …”

-“we focus on right-aligned models in this paper” [= continuous prediction, considered more clinically useful than left-aligned = predicting the onset of sepsis]

-arguably not “prediction”: “model … trained to predict whether sepsis is present in a new patient based on all other variables”, although “prior to sepsis onset … clinically overt signs of sepsis may be subtle or absent and false positive alerts in these studies may create alarm fatigue.”

-“we set out to perform a systematic review of published … machine learning models that predict sepsis including aggravated forms such as septic shock in any hospital setting”

-“these models show excellent performance retrospectively, but … few prospective studies have been carried out”

-“Substantial heterogeneity was observed between studies regarding the setting, index test, and outcome.”

-“The AUROC, a summary measure of sensitivity and specificity, has been customary to the field of diagnostic test accuracy. Since 24 out of 28 papers (86%) reported the AUROC, this was pragmatically selected as the main performance metric … AUROCs … were transformed and linearized to a continuous scale by taking the logit transformation”
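
For concreteness, a minimal sketch of that transformation (the AUROC values below are made up, not the paper's data): the study-level AUROCs are moved to the logit scale, pooled there, and a summary can be back-transformed. The paper fits a random-effects model on the logit scale; the unweighted mean here is only an illustration.

```python
import numpy as np

def logit(p):
    # log-odds transform: puts AUROCs on a continuous, unbounded scale
    return np.log(p / (1 - p))

def inv_logit(x):
    # back-transform a pooled logit-scale estimate to an AUROC
    return 1 / (1 + np.exp(-x))

aurocs = np.array([0.78, 0.85, 0.91])      # illustrative values only
pooled = inv_logit(logit(aurocs).mean())   # naive unweighted pooling on the logit scale
```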

-“To account for the high ratio of covariates to number of models, some of the features identified in the models were grouped … only covariates with 10% variance in their values were included”

-“All covariates were first tested in a univariate model for a significant contribution to the transformed AUROC using a likelihood ratio test against an empty model containing only the intercept and the variance components. All significant covariates (p < 0.05) were then considered for a multivariate model. Through backward Akaike information criterion (AIC) selection, a parsimonious model was selected.”
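
Roughly, the procedure they describe looks like the sketch below. This is a deliberate simplification: synthetic data, invented covariate names, and ordinary least squares on the logit-AUROC scale, whereas the paper fit random-effects models with variance components. It is also exactly the kind of data-driven selection criticized further down this thread.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

# Synthetic stand-in: one row per model, logit-AUROC outcome plus candidate covariates
rng = np.random.default_rng(0)
n = 111
df = pd.DataFrame(rng.normal(size=(n, 4)),
                  columns=["heart_rate", "temperature", "lab_values", "model_type"])
df["logit_auroc"] = 0.5 * df["temperature"] + 0.3 * df["lab_values"] + rng.normal(0, 0.5, n)

y = df["logit_auroc"]
candidates = ["heart_rate", "temperature", "lab_values", "model_type"]

# 1) Univariate screening: likelihood ratio test of each covariate vs. intercept-only model
null_llf = sm.OLS(y, np.ones(n)).fit().llf
significant = []
for c in candidates:
    fit = sm.OLS(y, sm.add_constant(df[[c]])).fit()
    p = chi2.sf(2 * (fit.llf - null_llf), df=1)
    if p < 0.05:
        significant.append(c)

# 2) Backward elimination on AIC, starting from all screened covariates
selected = list(significant)
best_aic = sm.OLS(y, sm.add_constant(df[selected])).fit().aic
improved = True
while improved and len(selected) > 1:
    improved = False
    for c in list(selected):
        trial = [v for v in selected if v != c]
        aic = sm.OLS(y, sm.add_constant(df[trial])).fit().aic
        if aic < best_aic:
            best_aic, selected, improved = aic, trial, True
            break

print("selected covariates:", selected)
```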

-“A total of 111 models were included in the meta-analysis after removal of an outlier … 39 covariates in the meta-analysis random effect model”

-"Univariate analysis of the 39 covariates shows heart rate, respiratory rate, temperature, lab and arterial blood gas values, and neural networks (relative to ensemble methods) positively contributed to the AUROC (range 0.344–0.835). Only temperature, lab values, and model type remained in the multivariate model. "

-“Many studies use ICD coding, which may be an unreliable instrument to identify septic patients”

-"Only one study clinically validated their model and showed that these models outperformed nurse triaging and SIRS criteria in the emergency room "

-“there is no compelling evidence that machine learning predictions lead to better patient outcomes in sepsis”, although they conclude “This systematic review and meta-analysis show that machine learning models can accurately predict sepsis onset with good discrimination in retrospective cohorts”

There seem to be a lot of problems there, including the use of AUROC and stepwise variable selection. The word “classification” is also being misused: classification refers to a forced-choice dichotomization of responses, whereas prediction, properly understood, means risk estimation.
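
To make the distinction concrete in scikit-learn's own terms (illustrative example on synthetic data): `predict` is classification, a forced choice at an implicit 0.5 cutoff, while `predict_proba` returns the estimated risk, which is what a prediction model should actually report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

labels = model.predict(X[:5])             # forced-choice 0/1 "classification" at a 0.5 cutoff
risks = model.predict_proba(X[:5])[:, 1]  # estimated probabilities, i.e. risk estimation / prediction
```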


I'm not sure how these habits persist. E.g., re stepwise, they used “backward Akaike information criterion (AIC) selection.” Compare that with this, from a paper in Stat Med you coauthored in 2000:
“Selection methods included backward stepwise … the AIC criterion … We found that stepwise selection with a low α (for example, 0.05) led to a relatively poor model performance, when evaluated on independent data. Substantially better performance was obtained with full models with a limited number of important predictors, where regression coefficients were reduced with any of the shrinkage methods.”
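
For contrast, a minimal sketch of the alternative that quote points at: keep the full model with a limited set of pre-specified predictors and shrink the coefficients rather than keeping or dropping them. Here shrinkage is done with L2-penalized logistic regression and the penalty chosen by cross-validation; the data and penalty grid are illustrative, not what the 2000 paper used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=1)

# Full model with all pre-specified predictors; coefficients are shrunk toward zero
# instead of being selected in or out, which tends to validate better on new data.
shrunk = LogisticRegressionCV(Cs=np.logspace(-3, 3, 13), penalty="l2", cv=5, max_iter=5000).fit(X, y)
print(shrunk.coef_)
```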

I can only guess that it’s a manifestation of an increasingly fractured community of experts. Maybe the “data scientists” are not reading our literature and we are not reading theirs.