Rebranding of "Statistics" as "Machine Learning" to Sound More Impactful & Negative Fallout



Fair enough, and I’ll take your word for it about PH models as my understanding is that only parametric survival models allow one to make predictions about expected time-to-event for an individual.

I guess one reason that, as a clinician-investigator, I’ve been drawn to ML methods is that, whether it can be done easily or not, the vast majority of observational studies using conventional statistics neglect to perform external validation. As a result, the literature is overrun with studies showing ‘significant’ associations for risk factors with disease, most of which are only later invalidated via RCTs (nutrition-science and vitamin supplement fields are among the worst offenders). My understanding of the math of many ML methods, including deep learning, is that these are generally just regression models with nonlinear basis expansion, and so it really isn’t the technique that is different from statistics, but rather the absolute devotion to external validation (i.e., prediction) over all other factors. I find this a refreshing change from the weekly research article cited by Cardiosource claiming that mixed nuts, but not almonds, are associated with incident coronary artery disease in Swedish men (‘…future studies are needed to examine this risk factor prospectively…’). That some deep learning methods provide seemingly ‘magical’ capabilities, for example in voice or image recognition, is all the more exciting, although I think skepticism is appropriate for whether this ‘magic’ can be applied to clinical data.

In fairness, there are approaches to exploring the individual effect of a single risk factor in an ML model using feature importance functions (I believe that there are several methods for this depending on the ML model; one approach examines change in accuracy with that feature left out), and even for neural networks investigators are working on ways to learn more about what a model is ‘learning’ (Local Interpretable Model-Agnostic Explanations (LIME) is one application). I’d be interested to hear what statisticians think about the validity of these methods; no doubt this knowledge would be necessary before most clinicians would be comfortable with application.
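To make the "change in accuracy with that feature left out" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the data are simulated, the model is plain least squares rather than any particular ML method, accuracy is measured as R² on a held-out half, and the function names are hypothetical.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    # Ordinary least squares with an intercept (stand-in for any model)
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def r2(y, pred):
    # Proportion of held-out variance explained by the predictions
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def drop_feature_importance(X_train, y_train, X_test, y_test):
    # Importance of feature j = loss in held-out R^2 when the model
    # is refit without column j
    full = r2(y_test, fit_predict(X_train, y_train, X_test))
    losses = []
    for j in range(X_train.shape[1]):
        keep = [k for k in range(X_train.shape[1]) if k != j]
        reduced = r2(y_test, fit_predict(X_train[:, keep], y_train, X_test[:, keep]))
        losses.append(full - reduced)
    return np.array(losses)

# Simulated data: feature 0 matters a lot, feature 1 a little, feature 2 not at all
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

importance = drop_feature_importance(X[:200], y[:200], X[200:], y[200:])
print(importance)
```

Note that this measures how much the prediction depends on a feature given the others, which is not the same thing as a causal effect of that feature.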

Finally, I wouldn’t overestimate the need to fully understand the mechanism of a drug, procedure, or predictive model in order for clinicians to apply it in practice. At least within my field of cardiology, the history is chock-full of medications that we only learned were effective from their impact on outcomes, and only later went back to the lab to try to understand why they work on a mechanistic level. The most effective treatment for the heart rhythm disorder atrial fibrillation is a procedure called pulmonary vein isolation, and frankly we only partially understand why it is so effective. If an ML model were highly accurate in prediction (and had been validated prospectively), I have no doubt that clinicians would be more than comfortable adopting it before fully understanding the relative contribution of the individual predictors.


If censoring isn’t too high, Cox can predict median and mean life length. But it can always predict incidence/survival probabilities up until follow-up runs out.
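To spell this out in standard notation (the symbols are assumptions for illustration: S_0 is the estimated baseline survival function, β the fitted coefficients, x an individual's covariates, and τ the end of follow-up):

```latex
S(t \mid x) = S_0(t)^{\exp(x^\top \beta)}, \qquad 0 \le t \le \tau

t_{0.5}(x) = \inf\{\, t \le \tau : S(t \mid x) \le 0.5 \,\}

\mu_\tau(x) = \int_0^{\tau} S(t \mid x)\, dt
```

The first line is available for any t up to τ. The median in the second line exists only if the individual's predicted curve falls to 0.5 before τ, and the third line is a mean restricted to τ; it approximates the full mean life length only when the curve gets close to zero within follow-up, which is what heavy censoring prevents.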

I don’t know about publication statistics, but ML and statistical models do not really differ in either the need for validation or the methods used for it.

I think you’d be surprised how wide the confidence intervals are for these importance metrics. The data are often not only not rich enough to derive reliable predictions, but are not rich enough to tell you how predictions were derived.
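One way to see this for yourself is to bootstrap an importance measure. The sketch below is only an illustration under stated assumptions: simulated data with 60 subjects and 5 equally important predictors, a least-squares model, importance defined as the drop in apparent R² when a predictor is removed, and a simple percentile bootstrap. The function names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    # Apparent (in-sample) R^2 of a least-squares fit with intercept
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def drop_importance(X, y):
    # Importance of column j = R^2 of full model minus R^2 without column j
    full = r_squared(X, y)
    return np.array([full - r_squared(np.delete(X, j, axis=1), y)
                     for j in range(X.shape[1])])

def bootstrap_interval(X, y, n_boot=500, level=0.95, seed=1):
    # Percentile bootstrap: resample subjects with replacement and
    # recompute every importance on each resample
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats[b] = drop_importance(X[idx], y[idx])
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha], axis=0)
    return lo, hi

# Simulated data: all 5 predictors have the same true effect
rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = X @ np.full(p, 0.5) + rng.normal(size=n)

point = drop_importance(X, y)
lo, hi = bootstrap_interval(X, y)
for j in range(p):
    print(f"feature {j}: importance {point[j]:.3f}, "
          f"95% interval [{lo[j]:.3f}, {hi[j]:.3f}]")
```

The thing to look at is how much the intervals overlap across predictors; when they overlap heavily, the data cannot support a ranking of which predictors matter most.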

I have doubts that ML would do better than a statistical model in that setting, unless you are doing image analysis.


Would you be able to elaborate on this?


The sample size needed for these is roughly the same. Poor predictive accuracy is related to inability to estimate the model’s parameters (including high standard errors for them). Overfitting is related to model parameter overinterpretation.