Some articles on "machine learning" vs. statistical models

I found a couple of papers on machine learning that I thought were interesting enough to mention:

The first is a recently published paper by Bradley Efron. It is always educational to read what he writes on statistical methods and philosophy.

The scientific needs and computational limitations of the twentieth century fashioned classical statistical methodology. Both the needs and limitations have changed in the twenty-first, and so has the methodology. Large-scale prediction algorithms—neural nets, deep learning, boosting, support vector machines, random forests—have achieved star status in the popular press. They are recognizable as heirs to the regression tradition, but ones carried out at enormous scale and on titanic datasets. How do these algorithms compare with standard regression techniques such as ordinary least squares or logistic regression? Several key discrepancies will be examined, centering on the differences between prediction and estimation or prediction and attribution (significance testing). Most of the discussion is carried out through small numerical examples.

The following article is critical of “black box” ML models and of the premise that human-understandable decisions from a model must be traded off against predictive accuracy. The authors present a study of an ML competition in which they entered an interpretable model against a field of black-box algorithms.


Very interesting. Thank you.

“Interpretable” portions of the output have to be clinically useful. Many features identified in the interpretive output may be common death signals; others may represent recovery signals responsive to treatment (e.g., a rise in SpO2). These are not very helpful.

This linked paper discusses challenges unique to acute care AI that are not present in many other fields.

In that article I discuss the issue of clinician oversight of ML/AI, as well as the need for detection of protocol failure, i.e., cases where the original decision made by the AI was wrong.

Regrettably, present education emphasises threshold decision making (if X value, which may be a composite score like SOFA, then Y action). This oversimplification will be transcended by pattern recognition and quantification of relational time patterns. We argue that the analysis of time series matrices will be more like interpretation of radiographs, a place where AUC was applied in the 70s.
Now, of course, we generate AUC/time.
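To make the idea of AUC over time concrete, here is a minimal sketch: discrimination (AUC) is scored separately at each observation time, so you get a curve of AUC against time rather than a single number. The data here are entirely synthetic, and the drift in score quality is an assumption for illustration only, not from the commenter's work.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_times, n_patients = 5, 200
y = rng.integers(0, 2, size=n_patients)  # fixed outcome per patient

auc_by_time = []
for t in range(n_times):
    # simulated risk scores that track the outcome more closely at later times
    scores = y * (0.3 + 0.1 * t) + rng.normal(0.0, 0.5, n_patients)
    auc_by_time.append(roc_auc_score(y, scores))

# one AUC per time point; plotting auc_by_time gives the AUC/time curve
print([round(a, 3) for a in auc_by_time])
```

In a real acute-care setting the outcome labels and risk scores would of course come from patient time series, not a simulator; the point is only the shape of the computation.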

In our hands, regression performs very closely to RF and GBM, but GBM seems to squeeze out a little more area under the curve.
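A comparison of that kind can be sketched as follows: fit logistic regression, a random forest, and a gradient boosting machine on the same split and score each by AUC. The dataset and hyperparameters here are illustrative stand-ins, not the commenter's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for a clinical dataset
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}

aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUC from predicted probabilities of the positive class
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

On easy synthetic data like this, all three models typically land close together, which mirrors the observation above; whether GBM edges out the others depends on the data and tuning.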