Information criteria vs. cross-validation

Following Gelman, Hwang, and Vehtari’s 2014 paper “Understanding predictive information criteria for Bayesian models”, I understand that cross-validation and information criteria (the Bayesian information criterion and the Akaike information criterion) can each be used on their own for model comparison. With a large enough sample size, one would usually use cross-validation with some measure of predictive accuracy to select a given model over others. With smaller sample sizes, AIC and BIC computed on the training data (without cross-validation) might be preferred. My confusion is whether AIC and BIC can be used along with cross-validation: for example, can AIC and BIC be computed on the left-out fold in 10-fold cross-validation? The idea is to combine an out-of-sample measure of model fit with a penalty for complexity, as AIC and BIC do in-sample. I have asked the same question on the Cross Validated forum as well.
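For reference, with k parameters, sample size n, and maximized likelihood L̂, the standard definitions of the two criteria are:

```latex
\mathrm{AIC} = 2k - 2\log\hat{L},
\qquad
\mathrm{BIC} = k\,\log(n) - 2\log\hat{L}
```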


This is an excellent question. You’ll get more responses on stats.stackexchange.com. I don’t use BIC because it is too conservative. As you stated, if the sample size is not enormous, AIC may be more reliable than cross-validation, and it is certainly faster. But AIC relies heavily on honesty: you have to be honest in specifying the number of degrees of freedom used in its computation. For example, if any supervised learning-based feature screening is involved, the degrees of freedom equal the number of candidate features examined, not just the number retained, especially if no penalization is used.
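A minimal sketch of this point, with purely illustrative numbers (the helper `aic` and the log-likelihood value are hypothetical):

```python
def aic(log_likelihood, df):
    """Akaike information criterion: AIC = 2*df - 2*logL.

    `df` must count every degree of freedom actually spent,
    including all candidate features examined in any supervised
    (outcome-aware) screening step, not just those retained.
    """
    return 2 * df - 2 * log_likelihood

# Toy numbers: a model that kept 5 of 50 screened candidate features.
logL = -120.0                 # hypothetical maximized log-likelihood
print(aic(logL, df=5))        # naive d.f.  -> 250.0 (too optimistic)
print(aic(logL, df=50))       # honest d.f. -> 340.0
```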


This may be tangential, but it can be shown that minimizing the AIC is asymptotically equivalent to leave-one-out cross-validation (LOOCV) (see Stone 1977). Similarly, the BIC is asymptotically equivalent to leave-v-out cross-validation with v = n[1 − 1/(log(n) − 1)] (see Shao 1997). Rob Hyndman summarized these points in his blog post.
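A quick illustration of Shao’s formula (the helper name `shao_v` is mine). Note that v approaches n as n grows, so the training fraction shrinks, which is in line with BIC’s conservatism:

```python
import math

def shao_v(n):
    """Leave-v-out size asymptotically equivalent to BIC (Shao 1997):
    v = n * (1 - 1/(log(n) - 1))."""
    return round(n * (1 - 1 / (math.log(n) - 1)))

for n in (50, 500, 5000):
    print(n, shao_v(n))
# 50 -> 33, 500 -> 404, 5000 -> 4335
```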


To summarise the excellent answers above, and to respond to myself for posterity:

No, computing AIC and BIC on the held-out samples does not make sense. In cross-validation one does not need to penalize for complexity, because evaluating on unseen data already penalizes overfitting, so a simple measure such as the held-out log-likelihood is more suitable. One should therefore use either AIC/BIC or cross-validation, one or the other. Vehtari, Gelman, and Gabry have a 2017 paper on this: “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC”. See also this forum.
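A minimal sketch of what one would compute instead: the mean held-out log-likelihood over 10 folds, with no degrees-of-freedom penalty. This assumes scikit-learn and a synthetic dataset; all names here are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold

# Synthetic stand-in data; replace with your own.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
heldout_ll = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    p = model.predict_proba(X[test_idx])
    # Mean held-out log-likelihood per observation; no complexity
    # penalty is added because the unseen fold already supplies it.
    heldout_ll.append(-log_loss(y[test_idx], p))

print("mean held-out log-likelihood:", np.mean(heldout_ll))
```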
