From what I understand, adding more explanatory variables will always improve (or at least never worsen) the in-sample fit of the model, but there is a trade-off between goodness of fit and model complexity. And I believe the fact that AIC is a criterion for balancing fit against complexity makes it appropriate for model selection?
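To make that trade-off concrete, here is a minimal sketch in Python. It uses a Gaussian linear model rather than your logistic fit, and the simulated data and variable names (`x1`, `noise_pred`) are made up for illustration: adding a predictor can never increase the residual sum of squares in-sample, yet the 2k penalty in AIC can still make the larger model look worse.

```python
import numpy as np

def gaussian_aic(rss, n, k):
    # AIC for a linear model with Gaussian errors, up to an additive
    # constant that is identical across models fitted to the same data:
    # AIC = n * log(RSS / n) + 2k, where k counts estimated coefficients.
    return n * np.log(rss / n) + 2 * k

def fit_rss(X, y):
    # Ordinary least squares fit; return the residual sum of squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
noise_pred = rng.normal(size=n)      # a predictor unrelated to y
y = 1.5 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_full = np.column_stack([np.ones(n), x1, noise_pred])

rss_small = fit_rss(X_small, y)
rss_full = fit_rss(X_full, y)        # guaranteed <= rss_small

aic_small = gaussian_aic(rss_small, n, k=2)
aic_full = gaussian_aic(rss_full, n, k=3)
```

Here `rss_full` cannot exceed `rss_small`, which is the sense in which extra variables "always improve the fit"; whether `aic_full` beats `aic_small` depends on whether the fit gain outweighs the penalty of 2 per added coefficient.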
Finding the right balance is indeed challenging, but I would like to build a more parsimonious model to prevent overfitting. Although I could include all six predictors, I am not convinced that doing so would be advantageous. How do you know that the full model is valid and robust?
Some people claim that modeling is an art, not a science. Still, could you elaborate on why you would suggest including all of them, even though the AIC increases, the p-value from the le Cessie–van Houwelingen–Copas–Hosmer unweighted sum of squares test only barely exceeds 0.05, and the new variable turns out not to be significant in the model? I do realize that the variable explains something about the response, since its coefficient is not literally zero, but couldn't you keep on adding variables ad infinitum?
In fact, more predictor variables than these were measured, but for this specific analysis I decided to treat six of them as candidate predictors, since only these can be used in production in a sensible manner.
Besides making the results easier to interpret and communicate, wouldn't using a subset also help prevent overfitting?