I was thinking about this post on statistical vs. machine learning methods.
I became curious whether anyone had worked out the formal relationships between conventional statistical models (i.e., hierarchical and/or generalized linear models) and machine learning models: support vector machines, neural nets, etc.
I am aware of two papers that explore the relationship:
"Neural Nets Are Essentially Polynomial Regression," by Xi Cheng and Norman Matloff
These are good resources. I’d like to find a paper that addresses this through effective degrees of freedom. For example, a regression model with p linear main effects has effective d.f. of p. A random forest allows for all possible 2-way, 3-way, … interactions, so its effective d.f. is probably at least p + p(p-1)/2. Effective d.f. relates directly to overfitting and to the sample size needed for a method to work.
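As a quick illustration of the d.f. arithmetic (the code and variable names below are mine, not from any of the cited papers): for a linear smoother, effective d.f. is the trace of the hat matrix, which for ordinary least squares equals the number of fitted coefficients, while the interaction count gives the lower bound mentioned above.

```python
# A minimal numerical check of the effective-d.f. claims above,
# assuming a plain least-squares fit (illustrative, not from the papers).
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
X1 = np.column_stack([np.ones(n), X])  # design matrix with intercept

# Effective d.f. of a linear smoother = trace of the hat matrix
# H = X (X'X)^{-1} X'; for OLS this is the number of columns of X.
H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
print(np.trace(H))  # ~6.0 = p + 1 (the intercept adds one)

# Lower bound suggested for a method entertaining all 2-way interactions:
# p main effects plus p(p-1)/2 pairwise terms.
print(p + p * (p - 1) // 2)  # 5 + 10 = 15 candidate terms
```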
I found a Master’s dissertation from 2022 that directly studied this issue in terms of sample size estimation. It seems limited because it emphasizes classification over estimation; I did not even see a mention of the Brier score, and the emphasis appears to be on inadmissible scoring rules. Despite that, the references listed look like a good start for exploring what has been discussed among ML practitioners.
Prol Castelo, G. (2022). Minimum sample size estimation in Machine Learning (Master’s dissertation).
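For concreteness, here is a minimal sketch (my own toy numbers, not taken from the dissertation) of the Brier score alongside the kind of cutoff-based classification accuracy it should replace:

```python
# Brier score vs. cutoff-based accuracy on made-up data (illustrative only).
import numpy as np

y = np.array([0, 1, 1, 0, 1])                 # observed binary outcomes
p_hat = np.array([0.1, 0.8, 0.6, 0.3, 0.9])   # predicted probabilities

# Brier score: mean squared error between predicted probabilities and
# outcomes; a proper scoring rule, lower is better, rewards calibration.
brier = np.mean((p_hat - y) ** 2)
print(brier)  # 0.062

# Classification accuracy at an arbitrary 0.5 cutoff: an improper
# (inadmissible) scoring rule that discards the probability information.
acc = np.mean((p_hat > 0.5) == y)
print(acc)  # 1.0
```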
Worth reading in the context of the above reference is this post: