I was thinking about this post on statistical vs machine learning.
I became curious whether anyone had worked out the formal relationships between conventional statistical models (i.e., hierarchical and/or generalized linear models) and machine learning models: support vector machines, neural nets, etc.
I am aware of 2 papers that explore the relationship:
Neural Nets are essentially polynomial regression by Xi Cheng and Norman Matloff
These are good resources. I’d like to find a paper that addresses this through effective degrees of freedom. For example, a regression model with p linear main effects has an effective d.f. of p. A random forest allows for all possible 2-way, 3-way, … interactions, so it will have an effective d.f. that is probably at least p + p(p-1)/2. Effective d.f. relates directly to overfitting and to the sample size needed for a method to work.
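To make the counting concrete, here is a minimal sketch in Python (with made-up data, not taken from any of the papers above): for a linear smoother, the effective d.f. can be taken as the trace of the hat matrix, which for p linear main effects plus an intercept is p + 1, while a model that entertains all 2-way interactions already carries at least p + p(p-1)/2 candidate terms.

```python
# Minimal sketch: effective d.f. as the trace of the hat matrix (hypothetical data)
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                                 # assumed sample size and number of predictors
X = rng.normal(size=(n, p))

X1 = np.column_stack([np.ones(n), X])         # intercept + p linear main effects
H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)     # hat matrix X (X'X)^{-1} X'
print(np.trace(H))                            # -> p + 1 = 6 effective d.f.

# Allowing all possible 2-way interactions adds p(p-1)/2 candidate terms,
# so the effective d.f. of such a model is at least:
print(p + p * (p - 1) // 2)                   # -> 5 + 10 = 15
```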
I found a Master’s dissertation from 2022 that directly studied this issue in terms of sample size estimation. It seems limited in that it emphasizes classification over estimation: I did not even see a mention of the Brier score, and the emphasis appears to be on inadmissible scoring rules (see the short sketch after the reference below). Despite that, the references listed look like a good start for exploring what has been discussed among ML practitioners.
Prol Castelo, G. (2022). Minimum sample size estimation in Machine Learning.
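Since the Brier score and scoring rules come up above, here is a minimal sketch (hypothetical numbers, not from the dissertation) of the distinction: the Brier score is a proper scoring rule computed on the predicted probabilities themselves, whereas classification accuracy discards the probabilities at an arbitrary cutoff, so it cannot distinguish well-calibrated predictions from overconfident ones.

```python
# Minimal sketch: Brier score (proper) vs. classification accuracy (threshold-based)
import numpy as np

y = np.array([1, 0, 1, 1, 0])                  # hypothetical binary outcomes
p_hat = np.array([0.9, 0.2, 0.6, 0.55, 0.4])   # hypothetical predicted probabilities

brier = np.mean((p_hat - y) ** 2)              # rewards well-calibrated probabilities
acc = np.mean((p_hat > 0.5) == y)              # depends only on an arbitrary 0.5 cutoff
print(brier, acc)                              # -> 0.1145 1.0

# A second, more poorly calibrated prediction set gets the same accuracy
# but a worse Brier score: accuracy cannot tell the two apart.
p_hat2 = np.array([0.99, 0.01, 0.51, 0.51, 0.49])
print(np.mean((p_hat2 - y) ** 2), np.mean((p_hat2 > 0.5) == y))   # -> 0.1441 1.0
```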
Worth reading in the context of the above reference is this post:
This is fantastic. I’ll add it to my course notes. It is a bit telling that they avoid the kinds of problems we see in biomedical research, where the signal:noise ratio is much lower than in their examples, and for which ML was not designed (at least not with typical sample sizes). Also see Norm Matloff’s preprint on the equivalence of neural networks and polynomial regression.