I have about 30 variables that I think might be useful for modelling my outcome, which is binary.
Before I select my model I want to preregister. What is the standard way to do this?
For example, when selecting between model A and model B, should the cross-validation (CV) etc. be done on a limited subset of the 30 variables?
My feeling is that fitting model A, doing variable selection, and then comparing it to model B after its own variable selection isn't a good comparison. Nor is it a good comparison if I pick model B, do variable selection, and then reuse those variables for model A.
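To make the setup concrete, here is a rough sketch of one kind of comparison I have in mind, assuming Python with scikit-learn. Everything specific in it is a placeholder I picked for illustration: the data are synthetic, "model A" is a logistic regression, "model B" is a random forest, and the variable selection is a univariate filter keeping k=10 variables. The only point is that the selection step sits inside each pipeline, so it is redone on the training part of every fold, and both candidates are scored on the same folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for my data: 30 candidate variables, binary outcome.
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=5, random_state=0
)

# One fixed set of folds, shared by both candidates.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each candidate is "variable selection + model" as a single pipeline,
# so selection is refit on the training portion of every fold.
# The model types and k=10 are illustrative assumptions only.
candidates = {
    "model_A": make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=10),
        LogisticRegression(max_iter=1000),
    ),
    "model_B": make_pipeline(
        SelectKBest(f_classif, k=10),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
}

# Cross-validated AUC per fold for each candidate.
scores = {
    name: cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
    for name, pipe in candidates.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean AUC = {fold_scores.mean():.3f} (sd {fold_scores.std():.3f})")
```

In this sketch each model gets its own selected variables within each fold, rather than inheriting the variables chosen for the other model. Whether that is the right thing to preregister is exactly what I am unsure about.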
Am I overthinking this? I see lots of material on how to choose variables once the model has been chosen. What I see much less of (can someone point me to a reference?) is how to select the best model when the choice of variables is already known.