I have a binary response, which I would like to model in terms of (some subset of) six candidate predictor variables, so the natural choice was to fit a binary logistic regression model. I have two related questions.

Question 1 (model selection):

In order to identify the appropriate predictors to include in the regression model, I decided to perform an exhaustive search for the best subsets of the candidate predictor variables.

There is a total of 2^6 = 64 potential regression models, so for each of these models, I tested the global goodness-of-fit using the le Cessie–van Houwelingen–Copas–Hosmer unweighted sum of squares test.

Literature (including Harrell's "Regression Modeling Strategies") generally suggests that the le Cessie–van Houwelingen–Copas–Hosmer unweighted sum of squares test should be preferred to the orthodox Hosmer–Lemeshow test, so I decided to use that. Five of these potential models had a p-value > 0.95.

I decided to present these five best-fitting models, according to this criterion, in a table (sorted by p-value in descending order).

The first one (with three explanatory variables) has a p-value of 0.993. The second one (with two explanatory variables) has a p-value of 0.989. However, the Akaike information criterion (AIC) suggests that the second model should be preferred to the first one, since the AIC is 811.31 for the first and 749.17 for the second.

As far as I know, the standard approach would be simply to use AIC for model selection. Does the le Cessieâ€“van Houwelingenâ€“Copasâ€“Hosmer unweighted sum of squares test have any particular advantages over AIC, and does my approach look reasonable?

Question 2 (assessment of model fit):

For binary logistic regression models, would it generally be sufficient to report p-values of the le Cessie–van Houwelingen–Copas–Hosmer unweighted sum of squares test for the assessment of model fit? As mentioned in Harrell's book, there are no distributional assumptions whatsoever for binary logistic regression models. Should I perhaps also include bootstrap overfitting-corrected lowess nonparametric calibration curves?