Sensitivity, specificity, and ROC curves are not needed for good medical decision making

I absolutely agree that diagnostic test research should be focussed on estimating the individual patient prevalence of disease, conditioned on patient characteristics and tests. Where I still see a challenge is in adequately translating this information for clinicians.
In my own training, I have noticed that many of my colleagues and I have mostly been learning which test to order, i.e. differentiating a good test from a bad test, or a screening test from a diagnostic test. Although this is a simplification, I suspect it is one objective we should keep in mind for these studies: validate discrimination and calibration first, then, as @f2harrell recommended, use a likelihood ratio test or AIC to compare the full regression models.
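To make the model comparison concrete, here is a minimal sketch of how the likelihood ratio statistic and AIC could be computed from two fitted models' log-likelihoods. The log-likelihood values, the parameter counts, and the function names are all hypothetical, chosen only for illustration; in practice they would come from whatever regression software fitted the models. The p-value shortcut only applies when the full model adds exactly one parameter.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_lik

def lr_test_1df(log_lik_reduced, log_lik_full):
    """Likelihood ratio chi-square statistic and p-value for ONE added parameter."""
    stat = max(2 * (log_lik_full - log_lik_reduced), 0.0)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical maximized log-likelihoods (illustrative numbers, not real data).
ll_base, k_base = -250.0, 3  # model with patient characteristics only
ll_full, k_full = -244.0, 4  # same model plus the new test result

lr_stat, p_value = lr_test_1df(ll_base, ll_full)
aic_base = aic(ll_base, k_base)
aic_full = aic(ll_full, k_full)

print(f"LR chi-square = {lr_stat:.1f}, p = {p_value:.4f}")
print(f"AIC without test = {aic_base:.1f}, with test = {aic_full:.1f}")
```

Both quantities are judged on the whole model rather than at a single cut-point, which is the reason for preferring them here over sens/spec.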
If calibration is presented as I showed above, with probability on the y-axis and the measure/score on the x-axis, then this also allows clinicians to read off an estimate of the probability of the outcome for different patients. Again, I suspect they will (as I do) remember discrete thresholds and how they would manage a patient differently at each one, but at least these decisions are based on probabilities generated from a full model rather than a single point like sens/spec, and it is then the clinician who takes into account expected utility in their setting.
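As a rough illustration of reading probabilities off such a curve, the sketch below assumes a simple logistic model for a single score. The intercept, slope, score range, and the harm/benefit values are hypothetical placeholders, and the threshold formula is the standard harm / (harm + benefit) decision threshold rather than anything specific to this thread.

```python
import math

# Hypothetical fitted logistic model: logit(p) = INTERCEPT + SLOPE * score
INTERCEPT = -4.0
SLOPE = 0.5

def predicted_probability(score):
    """Model-based probability of the outcome at a given score."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * score)))

def treatment_threshold(harm_if_unneeded, benefit_if_needed):
    """Probability above which acting has higher expected utility than not acting."""
    return harm_if_unneeded / (harm_if_unneeded + benefit_if_needed)

# Hypothetical utilities for one setting: benefit judged 4 times the harm.
threshold = treatment_threshold(harm_if_unneeded=1.0, benefit_if_needed=4.0)

# Lowest integer score (0 to 12 assumed) at which the probability reaches the threshold.
first_score = next(s for s in range(0, 13) if predicted_probability(s) >= threshold)

for s in range(0, 13, 2):
    print(f"score {s:2d}: P(outcome) = {predicted_probability(s):.3f}")
print(f"threshold = {threshold:.2f}, first score at or above it = {first_score}")
```

A different setting with different harms and benefits would move the threshold, and therefore the score at which management changes, without changing the model itself.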
I recognize that there is a trade-off between entering too many characteristics into the model (i.e. every clinical measure or physical exam finding), which makes it too specific and dependent on every measure being available to be accurate, and entering fewer measures, which makes it more generalizable at the cost of accuracy. My approach to navigating this trade-off so far has been to use the information likely to be available to the clinician at the time of applying the test (which is minimal for me since I focus on EMS/prehospital). Then, if we focus on the slope of the line (i.e. the odds ratio), as discussed above, instead of just the absolute estimates, we are left with a good assessment of how much weight the measure should carry in our decision making.
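To show what "weight" means in terms of the slope, here is a small sketch that converts a logistic regression slope into an odds ratio and applies it to a patient's prior probability. The slope, the size of the change in the measure, and the baseline probability are hypothetical values picked for illustration.

```python
import math

def odds_ratio(slope, delta=1.0):
    """Odds ratio for a change of `delta` units in the measure."""
    return math.exp(slope * delta)

def update_probability(prior_probability, ratio):
    """Multiply the prior odds by the odds ratio and convert back to a probability."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * ratio
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical slope (log odds per unit) and a 2-unit increase in the measure.
or_two_units = odds_ratio(slope=0.5, delta=2.0)

# Hypothetical baseline probability for a patient before the measure changes.
updated = update_probability(0.20, or_two_units)

print(f"odds ratio for a 2-unit increase = {or_two_units:.2f}")
print(f"probability moves from 0.20 to {updated:.2f}")
```

The same odds ratio shifts the absolute probability by different amounts depending on the starting point, which is why the slope and the absolute estimates answer different questions.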
