Following Gelman et al.’s 2014 paper “Understanding predictive information criteria for Bayesian models”, I understand that cross-validation and information criteria (the Bayesian information criterion, BIC, and Akaike’s information criterion, AIC) can be used separately. Usually, with a large enough sample, one would use cross-validation with some measure of predictive accuracy to select one model over the others. With smaller samples, AIC and BIC computed on the training data (without cross-validation) might be preferred. My confusion is whether AIC and BIC can be used together with cross-validation: for example, can AIC and BIC be computed on the left-out fold in a 10-fold cross-validation? The idea is to use out-of-sample information criteria to penalise for complexity (AIC) as well as model fit (BIC). I have asked the same question on the Cross Validated forum as well.
This is an excellent question. You’ll get more responses on stats.stackexchange.com. I don’t use BIC because it is too conservative. As you stated, if the sample size is not enormous, AIC may be more reliable than cross-validation and is certainly faster. But AIC relies heavily on honesty: you have to be honest in specifying the number of degrees of freedom used in its computation. For example, if any supervised learning-based feature screening is involved, the d.f. must be the number of candidate features that were screened, not the number that were retained, especially if you are not using penalization.
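To make the point about honest degrees of freedom concrete, here is a minimal sketch using the standard definition AIC = 2k - 2 log L. The log-likelihood and the feature counts are made-up illustrative numbers, not results from any real analysis.

```python
def aic(loglik: float, k: int) -> float:
    """Akaike's information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * loglik

# Hypothetical numbers: 20 candidate features were screened against the
# outcome, 3 were retained, and the final model has log-likelihood -120.
loglik = -120.0
naive_aic = aic(loglik, k=3)    # counts only the retained features
honest_aic = aic(loglik, k=20)  # counts every candidate that was screened

print(naive_aic, honest_aic)
```

The naive version makes the screened model look better than it should, because the screening step has already used the outcome to pick features.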
This may be tangential, but it can be shown that minimizing the AIC is asymptotically equivalent to leave-one-out cross-validation (LOOCV) (see Stone 1977). Similarly, the BIC is asymptotically equivalent to leave-$v$-out cross-validation when $v = n[1 - 1/(\log(n) - 1)]$ (see Shao 1997). Rob Hyndman summarized these points in his blog post.
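As a quick illustration of the formula above, the following sketch evaluates $v$ for a few sample sizes. The function name is my own, and the natural logarithm is assumed.

```python
import math

def leave_v_out_size(n: int) -> float:
    """v = n * [1 - 1 / (log(n) - 1)], using the natural logarithm."""
    return n * (1.0 - 1.0 / (math.log(n) - 1.0))

for n in (100, 1000, 10000):
    v = leave_v_out_size(n)
    print(f"n = {n:>5}: v = {v:8.1f} ({v / n:.0%} of the data held out)")
```

The held-out fraction is large and grows with n, which is one way to see why BIC behaves more conservatively than AIC (which corresponds to holding out a single observation).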
To summarise the excellent answers and answer my own question for posterity:
No, computing AIC and BIC on the held-out samples does not make sense. One does not need to penalize for complexity in cross-validation, where a measure such as the plain held-out log-likelihood might be more suitable. One should therefore use either an information criterion (AIC or BIC) or cross-validation, not both at once. Vehtari and Gelman have a 2016 paper on this: “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC”. See also this forum.
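For anyone who wants to see the two routes side by side, here is a rough sketch on simulated data. Everything in it is an assumption made for illustration: the quadratic data-generating model, the Gaussian likelihood, polynomial candidates, and the helper names. AIC is computed once on the full training data with its complexity penalty, while the 10-fold cross-validation score is just the summed held-out log-likelihood with no penalty.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns coefficients and the ML noise variance."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return coefs, np.mean(resid ** 2)

def gaussian_loglik(x, y, coefs, sigma2):
    """Gaussian log-likelihood of (x, y) under a fitted polynomial."""
    resid = y - np.polyval(coefs, x)
    return -0.5 * len(y) * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid ** 2) / sigma2

def aic(x, y, degree):
    """In-sample AIC; k counts degree + 1 coefficients plus the variance."""
    coefs, sigma2 = fit_poly(x, y, degree)
    k = degree + 2
    return 2 * k - 2 * gaussian_loglik(x, y, coefs, sigma2)

def cv_loglik(x, y, degree, n_folds=10, seed=0):
    """Summed held-out log-likelihood over the folds (no complexity penalty)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    total = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coefs, sigma2 = fit_poly(x[train], y[train], degree)
        total += gaussian_loglik(x[test_idx], y[test_idx], coefs, sigma2)
    return total

# Simulated data: the true relationship is quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x - 0.8 * x ** 2 + rng.normal(scale=0.5, size=200)

for degree in range(1, 6):
    print(f"degree {degree}: AIC = {aic(x, y, degree):8.2f}, "
          f"CV log-lik = {cv_loglik(x, y, degree):8.2f}")
```

Lower AIC is better and higher held-out log-likelihood is better. The overfitting penalty in cross-validation comes from scoring data the model has not seen, so adding an AIC or BIC penalty on top of the held-out fold would penalize complexity twice.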