I have a validation data set of 29242 patients, with known labels/health outcomes and predictions that were generated by some model. 28626 patients had a negative outcome and 616 had a positive outcome. The overall AUC is 0.7134.

The reviewers asked me to divide the validation data into two subgroups, defined by a pre-existing medical condition, and then to apply the prediction model to each subgroup separately.

Out of the 29242 patients, 4832 had this condition and 24410 did not.

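The outcome by subgroup split is (rows: outcome, columns: pre-existing condition; 0 = negative/absent, 1 = positive/present):

```
             condition=0  condition=1
outcome=0          24080         4546
outcome=1            330          286
```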

When I applied the same prediction model to each subgroup separately, the AUCs were 0.612 and 0.655. That is, each subgroup's AUC is smaller than the overall AUC. How is that possible?
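One way to frame the question is in terms of pairs. The AUC is the probability that a randomly chosen positive patient is scored above a randomly chosen negative patient, so the overall AUC is a weighted average over four kinds of (positive, negative) pairs: both in the subgroup without the condition, both in the subgroup with it, and the two kinds that straddle the subgroups. The subgroup AUCs only see the first two kinds. A minimal sketch in Python, using only the counts from the table above (the variable names are mine, for illustration), shows how much weight each kind of pair carries:

```python
# Counts from the outcome-by-subgroup table (keys: 0 = no condition, 1 = condition).
neg = {0: 24080, 1: 4546}  # patients with a negative outcome
pos = {0: 330, 1: 286}     # patients with a positive outcome

# Total number of (positive, negative) pairs behind the overall AUC.
total_pairs = sum(pos.values()) * sum(neg.values())

# Share of all pairs, keyed by (subgroup of the positive, subgroup of the negative).
pair_share = {
    (gp, gn): pos[gp] * neg[gn] / total_pairs
    for gp in (0, 1)
    for gn in (0, 1)
}

within = pair_share[(0, 0)] + pair_share[(1, 1)]  # pairs the subgroup AUCs can see
across = pair_share[(1, 0)] + pair_share[(0, 1)]  # pairs only the overall AUC sees

for key, share in pair_share.items():
    print(key, round(share, 3))
print("within subgroups:", round(within, 3), "across subgroups:", round(across, 3))
```

Roughly 48% of all pairs straddle the two subgroups, and most of those pair a positive patient with the condition against a negative patient without it. If the model scores patients with the condition higher, those pairs are mostly ranked correctly and can pull the overall AUC above both subgroup AUCs.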

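One explanation I can think of is that the pre-existing medical condition is an important predictor in the original model (it has the second-highest SHAP value), so within each subgroup the model can no longer use it to separate patients. Another explanation may relate to the more balanced outcome within the subgroup with the medical condition (about 5.9% positive, versus about 1.4% in the subgroup without it).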
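The first explanation can be checked with a small simulation. The sketch below is not my actual model or data: the coefficients, the sample size, and the single within-subgroup predictor `x` are all made-up assumptions, chosen only so that the condition is a strong predictor and the outcome is rare. It uses the standard library only and computes the AUC from the Mann-Whitney rank-sum statistic.

```python
import math
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50_000              # hypothetical sample size

group, score, label = [], [], []
for _ in range(n):
    g = 1 if rng.random() < 0.165 else 0  # about 16.5% have the condition
    x = rng.gauss(0.0, 1.0)               # a weaker, within-subgroup predictor
    logit = -4.5 + 1.5 * g + 0.5 * x      # assumed coefficients, not fitted ones
    p = 1.0 / (1.0 + math.exp(-logit))
    group.append(g)
    score.append(logit)                   # the "model" here is the true risk
    label.append(1 if rng.random() < p else 0)

auc_all = auc(score, label)
auc_by_group = {
    g: auc(
        [s for s, gg in zip(score, group) if gg == g],
        [y for y, gg in zip(label, group) if gg == g],
    )
    for g in (0, 1)
}

print("overall AUC:", round(auc_all, 3))
print("AUC without condition:", round(auc_by_group[0], 3))
print("AUC with condition:", round(auc_by_group[1], 3))
```

Under these assumptions the overall AUC should come out higher than both subgroup AUCs, even though the scores are the true risks. That would suggest the pattern in my data does not by itself mean the model is miscalibrated or broken within the subgroups.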

What do you think?