I was thinking about this post on statistical vs machine learning.
I became curious whether anyone had worked out the formal relationships between conventional statistical models (i.e., hierarchical and/or generalized linear models) and machine learning models: support vector machines, neural nets, etc.
I am aware of 2 papers that explore the relationship:
Neural Nets are essentially polynomial regression by Xi Cheng and Norman Matloff
These are good resources. I’d like to find a paper that addresses this through effective degrees of freedom. For example, a regression model with p linear main effects has an effective d.f. of p. A random forest allows for all possible 2-way, 3-way, … interactions, so it will have an effective d.f. that is probably at least p + p(p-1)/2. Effective d.f. relates directly to overfitting and to the sample size needed for a method to work.
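To make the counting concrete, here is a minimal sketch in Python (with made-up data, not taken from any of the papers above): for a linear smoother, the effective d.f. can be taken as the trace of the hat matrix, which for p linear main effects plus an intercept is p + 1, while a model that entertains all 2-way interactions already carries at least p + p(p-1)/2 candidate terms.

```python
# Minimal sketch: effective d.f. as the trace of the hat matrix (hypothetical data)
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                                 # assumed sample size and number of predictors
X = rng.normal(size=(n, p))

X1 = np.column_stack([np.ones(n), X])         # intercept + p linear main effects
H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)     # hat matrix X (X'X)^{-1} X'
print(np.trace(H))                            # -> p + 1 = 6 effective d.f.

# Allowing all possible 2-way interactions adds p(p-1)/2 candidate terms,
# so the effective d.f. of such a model is at least:
print(p + p * (p - 1) // 2)                   # -> 5 + 10 = 15
```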
I found a Master’s dissertation from 2022 that directly studied this issue in terms of sample size estimation. It seems limited in that it emphasizes classification over estimation: I did not even see a mention of the Brier score, and the emphasis appears to be on inadmissible scoring rules (see the short sketch after the reference below). Despite that, the references listed look like a good start for exploring what has been discussed among ML practitioners.
Prol Castelo, G. (2022). Minimum sample size estimation in Machine Learning.
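Since the Brier score and scoring rules come up above, here is a minimal sketch (hypothetical numbers, not from the dissertation) of the distinction: the Brier score is a proper scoring rule computed on the predicted probabilities themselves, whereas classification accuracy discards the probabilities at an arbitrary cutoff, so it cannot distinguish well-calibrated predictions from overconfident ones.

```python
# Minimal sketch: Brier score (proper) vs. classification accuracy (threshold-based)
import numpy as np

y = np.array([1, 0, 1, 1, 0])                  # hypothetical binary outcomes
p_hat = np.array([0.9, 0.2, 0.6, 0.55, 0.4])   # hypothetical predicted probabilities

brier = np.mean((p_hat - y) ** 2)              # rewards well-calibrated probabilities
acc = np.mean((p_hat > 0.5) == y)              # depends only on an arbitrary 0.5 cutoff
print(brier, acc)                              # -> 0.1145 1.0

# A second, more poorly calibrated prediction set gets the same accuracy
# but a worse Brier score: accuracy cannot tell the two apart.
p_hat2 = np.array([0.99, 0.01, 0.51, 0.51, 0.49])
print(np.mean((p_hat2 - y) ** 2), np.mean((p_hat2 > 0.5) == y))   # -> 0.1441 1.0
```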
Worth reading in the context of the above reference is this post:
This is fantastic. I’ll add it to my course notes. It is a bit telling that they avoid the kinds of problems we see in biomedical research, where the signal:noise ratio is much lower than in their examples, and for which ML was not designed (at least not with typical sample sizes). Also see Norm Matloff’s preprint on the equivalence of neural networks and polynomial regression.