Strategy for developing a neural network using a small dataset

I have a dataset of n = 500 patients who participated in a clinical trial (p = 7 predictors), and I would like to develop a survival neural network for competing risks and compare it with the cause-specific Cox model and the Fine-Gray model on these data. Since the sample size is small, I’m wondering which of the following approaches would be more appropriate:

a) Develop the neural network (hyperparameter tuning) with 5-fold cross-validation on the entire dataset, then validate the 3 models with bootstrapping on the original dataset, or
b) Develop the neural network with bootstrapping on the entire dataset, then validate the 3 models with bootstrapping on the original dataset using a different seed?

I like to avoid multi-step procedures when possible. In particular, I’m not a fan of using both cross-validation and bootstrapping for the same problem. Choose one or the other. If using cross-validation, consider 100 repeats of 10-fold cross-validation.

The trick with resampling methods is that you have to embed all supervised learning steps inside the resampling loop. This includes feature selection and the search for tuning parameter values. These processes need to be repeated afresh in each iteration of the loop.
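To make the idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the procedure for the survival models discussed here: the learner is a toy (pick the single predictor most correlated with the outcome, then fit a least-squares line), the performance measure is mean squared error, and the resampling scheme is the optimism bootstrap, which is one common choice. The names `select_and_fit`, `mse`, and `optimism_corrected_mse` are hypothetical. The point is only that the selection step runs again inside every bootstrap iteration.

```python
import random
import statistics

def select_and_fit(X, y):
    """All supervised steps in one function (toy, hypothetical learner):
    select the predictor most correlated with y, then fit a line on it."""
    my = statistics.fmean(y)
    syy = sum((t - my) ** 2 for t in y)
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = statistics.fmean(col)
        sxx = sum((v - mx) ** 2 for v in col)
        sxy = sum((v - mx) * (t - my) for v, t in zip(col, y))
        r2 = 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)
        if best is None or r2 > best[0]:
            best = (r2, j, mx, sxy / sxx if sxx else 0.0)
    _, j, mx, slope = best
    return lambda row: my + slope * (row[j] - mx)

def mse(model, X, y):
    return statistics.fmean((model(row) - t) ** 2 for row, t in zip(X, y))

def optimism_corrected_mse(X, y, B=100, seed=1):
    """Optimism bootstrap: the whole pipeline is re-run on each resample."""
    rng = random.Random(seed)
    n = len(y)
    apparent = mse(select_and_fit(X, y), X, y)
    optimism = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        model_b = select_and_fit(Xb, yb)  # selection repeated afresh
        # error on the original data minus error on the bootstrap sample
        optimism.append(mse(model_b, X, y) - mse(model_b, Xb, yb))
    return apparent, apparent + statistics.fmean(optimism)

# Toy data (assumption): 60 subjects, 7 pure-noise predictors.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(7)] for _ in range(60)]
y = [rng.gauss(0, 1) for _ in range(60)]
apparent, corrected = optimism_corrected_mse(X, y)
print(f"apparent MSE {apparent:.3f}, optimism-corrected MSE {corrected:.3f}")
```

If the selection step were done once on the full data and only the final line fit were bootstrapped, the optimism estimate would not reflect the selection, and the corrected error would typically look better than it should. A grid search over tuning parameters would sit in the same place as the selection step here.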


Thanks a lot for your reply Prof Harrell. Say that I decide to bootstrap the original data B = 100 times, would I then use the bootstrapped samples (N = 500 each time) to develop the neural network over a grid of parameters and then validate its performance in the out-of-bag patients? Wouldn’t it be an issue that the final tuned model is different for each repeat of the procedure?

Regarding the 100 repeats of the 10-fold cross-validation, do you need to use a split sample approach (to create train and validation data 100 times) and then apply the 10-fold cross-validation on the train data and validate the final tuned model on the validation data? And again, wouldn’t it be an issue that the tuned model is different for each repeat of the procedure?

I’m not a fan of the way many machine learning experts think of 3 different samples. I want the empirical tuning parameter selection to be part of the outer loop so that it is fully replicated each time. And with 100 repeats of 10-fold CV you in effect fit models on 1000 different, overlapping training samples.
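As a rough sketch of the bookkeeping, the following plain-Python generator produces the splits for repeated k-fold cross-validation and confirms the count of 1000 training samples for n = 500. The function names and the toy outcome are hypothetical, and `tune_fit_score` is only a placeholder: in a real analysis it would run the entire tuning search and model fit on the training indices alone, then score the result on the held-out fold.

```python
import random

def repeated_kfold(n, k=10, repeats=100, seed=1):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train, folds[f]

splits = list(repeated_kfold(500, k=10, repeats=100))
print(len(splits))  # 1000

# Toy outcome (assumption) so the loop below has something to score.
rng = random.Random(0)
y = [rng.gauss(0, 1) for _ in range(500)]

def tune_fit_score(train, test):
    # Placeholder for the full pipeline: tuning and fitting would use
    # `train` only. Here the "model" is just the training mean.
    mu = sum(y[i] for i in train) / len(train)
    return sum((y[i] - mu) ** 2 for i in test) / len(test)

scores = [tune_fit_score(train, test) for _, _, train, test in splits]
print(sum(scores) / len(scores))  # average held-out error over all fits
```

Each of the 1000 fits may select different tuning parameter values. What gets averaged is the held-out performance of the whole model-building procedure, not of one particular tuned model.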