Model B seems better across the board: less under-prediction, a better AUC, etc. However, it’s not immediately clear whether B’s better performance translates into a worthwhile improvement in how many individuals with the outcome undergo the biopsy, and whether this outweighs any discomfort/risk caused by the additional test that model B requires.
In a bit more detail: in the first topic about these models we considered the 5-20% range as the likely area for the cut-off that determines whether to proceed to a biopsy. In this range model B again performs better, but the differences are not as big as those at higher predicted risks.
I think the differences at higher predicted risks are relatively unimportant for choosing between model A and B, since on average the high-risk individuals will receive a high predicted risk under both models. For these individuals model B adds only the potential harm of undergoing the additional test, with no additional benefit (as long as we think of this as a theoretical exercise, of course; in reality there might be other benefits, such as those @MSchwartz highlighted).
In the area around the cut-offs, B performs slightly better and should, on average, more accurately predict whether individuals are above or below any potential threshold. But from the data above alone, I don’t think we can say how big this improvement is or whether it outweighs exposing the entire population to an additional test.
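One way to put numbers on that trade-off would be a net benefit calculation of the kind used in decision curve analysis. This is only a rough sketch, not something taken from the data in this thread: the outcomes and predicted risks below are made-up toy values, and `net_benefit`, `TEST_HARM` and its value of 0.01 are hypothetical names and assumptions of mine. Net benefit at threshold `t` is the proportion of true positives minus the proportion of false positives weighted by `t / (1 - t)`. The harm of the extra test is subtracted for model B only, because everyone has to undergo it.

```python
def net_benefit(y_true, risk, threshold, test_harm=0.0):
    """Net benefit per person of 'biopsy if predicted risk >= threshold'.

    test_harm is the assumed harm of any extra test everyone must undergo,
    expressed in the same units as a true positive (hypothetical parameter).
    """
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, risk) if p >= threshold]
    tp = sum(1 for y, _ in flagged if y == 1)   # biopsied, has the outcome
    fp = len(flagged) - tp                      # biopsied, no outcome
    odds = threshold / (1.0 - threshold)        # weight of an unnecessary biopsy
    return tp / n - odds * fp / n - test_harm


# Toy data, invented purely for illustration (NOT the data discussed above).
outcome = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
risk_a = [0.30, 0.12, 0.08, 0.15, 0.11, 0.06, 0.04, 0.03, 0.02, 0.02]
risk_b = [0.35, 0.18, 0.12, 0.09, 0.07, 0.05, 0.04, 0.03, 0.02, 0.02]

# Assumed harm of model B's additional test: 0.01 would mean we accept
# testing 100 people to correctly find one extra individual with the outcome.
TEST_HARM = 0.01

for t in (0.05, 0.10, 0.20):
    nb_a = net_benefit(outcome, risk_a, t)
    nb_b = net_benefit(outcome, risk_b, t, test_harm=TEST_HARM)
    print(f"threshold {t:.2f}: NB(A) = {nb_a:.3f}, NB(B) = {nb_b:.3f}")
```

The point of the sketch is that the answer depends on a value for the harm of the additional test, which has to come from outside the performance data: B is preferable at a given threshold only if its gain in net benefit exceeds that harm.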