Can traditional statistical metrics be used to evaluate whether a prediction model should be used in practice?

Going back to the example: just like with the first one, I don't think we can draw a definite conclusion about the model's usefulness from these metrics alone.

If we consider a risk threshold between 5–10%, as @S_Chakraborty suggests, then the model is miscalibrated precisely in this range: risks are overestimated for individuals whose actual risk lies near the potential thresholds. I'd say this makes the model less useful for decision support, because it is exactly in this region that you are uncertain about the balance between missing a node and treatment complications.
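To make this concrete, one way to quantify whether miscalibration in the decision-relevant range actually harms decisions is to compute the net benefit (the decision-curve measure of Vickers & Elkin) at thresholds in the 5–10% band and compare against the treat-all strategy. Below is a minimal sketch with simulated data; the prevalence, the overestimation factor, and the data-generating process are all hypothetical assumptions for illustration, not the questioner's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cohort: true risks averaging ~8%, binary outcomes drawn
# from them, and predictions that overestimate risk (assumed factor 1.6)
# to mimic miscalibration around the 5-10% threshold range.
n = 5000
true_risk = rng.beta(2, 23, n)
y = rng.binomial(1, true_risk)
pred = np.clip(true_risk * 1.6, 0.0, 1.0)

def net_benefit(y, pred, threshold):
    """Net benefit at a decision threshold: TP rate minus FP rate
    weighted by the odds of the threshold (Vickers & Elkin)."""
    treat = pred >= threshold
    tp = np.sum(treat & (y == 1)) / len(y)
    fp = np.sum(treat & (y == 0)) / len(y)
    return tp - fp * threshold / (1 - threshold)

for t in (0.05, 0.075, 0.10):
    nb_model = net_benefit(y, pred, t)
    nb_all = net_benefit(y, np.ones_like(pred), t)  # treat everyone
    print(f"threshold {t:.3f}: model {nb_model:.4f}, treat-all {nb_all:.4f}")
```

If the model's net benefit barely exceeds (or falls below) treat-all across the whole 5–10% band, the miscalibration there is doing real harm; if it still clearly wins, the overestimation may be tolerable in practice.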