I found a couple of papers on machine learning that I thought were interesting enough to mention:
The first one is a recently published paper by Bradley Efron. It is always educational to read what he writes on statistical methods and philosophy.
The scientific needs and computational limitations of the twentieth century fashioned classical statistical methodology. Both the needs and limitations have changed in the twenty-first, and so has the methodology. Large-scale prediction algorithms—neural nets, deep learning, boosting, support vector machines, random forests—have achieved star status in the popular press. They are recognizable as heirs to the regression tradition, but ones carried out at enormous scale and on titanic datasets. How do these algorithms compare with standard regression techniques such as ordinary least squares or logistic regression? Several key discrepancies will be examined, centering on the differences between prediction and estimation or prediction and attribution (significance testing). Most of the discussion is carried out through small numerical examples.
The following article is critical of “black box” ML models and of the premise that human-understandable decisions must be traded off against predictive accuracy. The authors present a study of an ML competition in which they submitted an interpretable model against a field of black-box algorithms.