Relationship between statistical models and machine learning algorithms

I was thinking about this post on statistical vs machine learning.

I became curious whether anyone had worked out the formal relationships between conventional statistical models (i.e., hierarchical and/or generalized linear models) and machine learning models such as support vector machines, neural nets, etc.

I am aware of two sources that explore the relationship:

  1. Neural Nets are essentially polynomial regression by Xi Cheng and Norman Matloff
  2. The following slides discuss the close relationship between logistic regression and support vector machines (SVM): Support Vector Machines vs Logistic Regression, University of Toronto CSC2515

Does anyone have other papers exploring this?


A classic framing of boosting (e.g., AdaBoost) as a statistical model is Friedman, Hastie, and Tibshirani (2000).

A more recent framing of deep learning as a “regression model with transformed predictors” is given by Hoeting (2021) in the following:


I’ve found Kevin Murphy’s book to be a great resource for this: “Probabilistic machine learning”: a book series by Kevin Murphy | pml-book

Some other topics:

LLMs are Markov models: https://statmodeling.stat.columbia.edu/wp-content/uploads/2023/07/carpenter-llm-2023.pdf

Diffusion models (used in image generation etc)


These are good resources. I’d like to find a paper that addresses this through effective degrees of freedom. For example, a regression model with p linear main effects has effective d.f. of p. A random forest allows for all possible 2-way, 3-way, … interactions, so it will have d.f. that is probably at least p + p(p-1)/2. Effective d.f. relates directly to overfitting and to the sample size needed for a method to work.
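To get a feel for how quickly that count grows, here is a rough sketch in Python. It only counts coefficients in an unpenalized linear model with one term per main effect and one per interaction, which is an assumption made for illustration; it is not an estimate of the effective d.f. of a random forest. The function name `interaction_terms` is made up for this example.

```python
from math import comb


def interaction_terms(p: int, max_order: int) -> int:
    """Count main effects plus all interaction terms up to max_order.

    Assumes p continuous predictors, each entering with a single
    coefficient, and no penalization (illustrative assumption only).
    """
    # Order 1 gives the p main effects, order 2 adds p(p-1)/2 pairs, etc.
    return sum(comb(p, k) for k in range(1, max_order + 1))


if __name__ == "__main__":
    for p in (5, 10, 20):
        main_only = interaction_terms(p, 1)      # p
        up_to_two_way = interaction_terms(p, 2)  # p + p(p-1)/2
        up_to_three_way = interaction_terms(p, 3)
        print(p, main_only, up_to_two_way, up_to_three_way)
```

With p = 20 the count through 2-way interactions is already 210, and through 3-way interactions it is 1350, which gives some sense of why the sample size needed could be far larger than for a main-effects model.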


I found a Master’s dissertation from 2022 that directly studied this issue in terms of sample size estimation. It seems limited because it emphasizes classification over estimation. I did not see any mention of the Brier score, and the emphasis appears to be on inadmissible scoring rules. Despite that, the references listed look like a good starting point for exploring what has been discussed among ML practitioners.

Prol Castelo, G. (2022). Minimum sample size estimation in Machine Learning.

Worth reading in the context of the above reference is this post:
