I believe that sensitivity, specificity, and ROC curves are not useful for medical decision making, e.g., which action to take after getting the results from a diagnostic test. My arguments rest primarily on problems with backwards-time backwards-information-flow probabilities and problems caused by dichotomizing tests and diagnoses. Drew Levy has written eloquently about problems with transposed conditionals. It is important to respect the flow of information by conditioning on what you already know in order to predict what you don’t. Predictions/forecasts of outcomes as well as knowing the consequences of decisions are key to making good decisions.
Assume that we have a reliable (well calibrated) estimate of disease risk, and assume that the precision (e.g., width of a compatibility (aka confidence) interval, or a Bayesian credible interval) is so good that uncertainty in the point estimate may be safely ignored. [In most situations, the training sample size is small enough that precision can’t be ignored, and a full Bayesian calculation of expected utility that uses the entire posterior distribution of risk for an individual patient should be used.]
Given the utilities for the patient at hand, the decision that optimizes expected utility can be derived as a risk cutoff in this infinite-training-sample-size situation. Nowhere in the risk estimate is sensitivity (sens), specificity (spec), or an ROC curve necessary, and sens and spec do not figure into the formulation of utilities. It follows that the optimum decision does not need sens, spec, or ROC curves in any way. Another way to say this is that incorporating sens and spec into the decision rule is like making 3 left turns instead of a right turn. Once you know the risk of disease and the patient-specific utilities, you’re really done.
Here is the derivation. Suppose that there are two possible actions, A and B, where B means “not A”. For example, an action may be to get a prostate biopsy after a PSA test in a man. Let Y=0 denote “not diseased” and Y=1 denote “diseased”. Each combination of action and true diagnostic status will have associated with it a cost or loss. Define these losses by the following table.
| Action | Y | Loss |
|--------|---|------|
| A      | 1 | a    |
| A      | 0 | b    |
| B      | 1 | c    |
| B      | 0 | d    |
For example, the loss from taking action A if disease is present is a. What is the expected loss, given that we don’t know the true disease status? If the patient’s risk of disease is r, the expected loss from taking action A is ra + (1-r)b. Likewise, the expected loss from taking action B is rc + (1-r)d. One way to optimize the decision is to choose the action from (A,B) that gives the lower expected loss. Thus we take action A if ra + (1-r)b < rc + (1-r)d and otherwise take action B. Solving for r, and assuming a-c+d-b > 0 (as holds when A is the worse action under disease, a > c, and B is the worse action without disease, d > b), we take action A if r < (d-b)/(a-c+d-b). If action B is the more aggressive one such that the loss is zero if the patient is ultimately diagnosed with the disease, one might take c and b to be zero. In that special case the risk threshold is d/(a+d).
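The decision rule just derived can be sketched in a few lines of Python. This is only an illustration: the function names and the numeric losses are hypothetical, chosen to show the special case b = c = 0, and the threshold formula assumes a-c+d-b > 0.

```python
def expected_loss(r, loss_if_diseased, loss_if_not_diseased):
    """Expected loss of an action when the risk of disease is r."""
    return r * loss_if_diseased + (1 - r) * loss_if_not_diseased


def risk_threshold(a, b, c, d):
    """Risk cutoff (d-b)/(a-c+d-b); action A is preferred when r is below it."""
    denom = (a - c) + (d - b)
    if denom <= 0:
        # The inequality only solves to "r < threshold" when the denominator is positive.
        raise ValueError("threshold form requires a - c + d - b > 0")
    return (d - b) / denom


def choose_action(r, a, b, c, d):
    """Pick the action with the lower expected loss (ties go to B)."""
    loss_a = expected_loss(r, a, b)  # action A: loss a if Y=1, b if Y=0
    loss_b = expected_loss(r, c, d)  # action B: loss c if Y=1, d if Y=0
    return "A" if loss_a < loss_b else "B"


# Hypothetical losses for illustration only (special case b = c = 0).
a, b, c, d = 10.0, 0.0, 0.0, 1.0
threshold = risk_threshold(a, b, c, d)  # equals d / (a + d) = 1/11

for r in (0.05, 0.20):
    print(f"risk={r:.2f}  threshold={threshold:.3f}  action={choose_action(r, a, b, c, d)}")
```

Note that the only inputs are the risk r and the four losses; no sensitivity, specificity, or ROC quantity appears anywhere.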
Note: This formulation is fairly general. The risk estimate r doesn’t care whether it is dominated by risk factors or by the results of the current medical test; nor does it care whether the test has a binary vs. a continuous output, multiple outputs, or whether the test results interact with age or sex. Contrast that with sens/spec which assume a binary diagnosis, binary test output, no interaction between test results and other patient variables, and constancy of sens and spec over patient types (the latter is provably untrue in general when the diagnosis was created by dichotomania).
Here’s an analogy concerning the value of forward-looking decisions and probabilities: an optimum decision in a poker game is based on the possible winnings/losings and on the probability of winning the hand. It doesn’t help the player to envision the probability of getting a hand this good were she to go on and win or lose (spec/sens). Decision making is forward in time and information flow, and needs to use forward probabilities (unless you are an archaeologist or a medical examiner).
This discussion is related to the way that medical students are wrongly taught probabilistic diagnosis. I’ve seen MDs and statisticians alike, when given all the numbers needed to compute P(disease | X), still compute sens and spec from a cohort study and then use Bayes’ rule to get P(disease | X). They don’t realize that everything cancels out (3 left turns), leaving the originally-trivially-derivable direct estimate of disease risk (right turn). I think that another point confused in the literature is the difference between group decision making and individual patient decision making (I’m only interested in the latter).
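The cancellation can be checked numerically in the simplest setting, a binary test result in a single cohort. The sketch below uses made-up counts (they are not from any study) and exact fractions so that the two routes can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical cohort counts (illustrative only, not from any study)
tp, fn = 90, 10    # diseased patients: test positive / test negative
fp, tn = 40, 160   # non-diseased patients: test positive / test negative

# Right turn: estimate P(disease | test positive) directly from the cohort
direct = Fraction(tp, tp + fp)

# Three left turns: compute prevalence, sens, and spec, then apply Bayes' rule
prev = Fraction(tp + fn, tp + fn + fp + tn)
sens = Fraction(tp, tp + fn)
spec = Fraction(tn, tn + fp)
via_bayes = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

print(direct, via_bayes)  # both are 9/13
```

The detour through sens and spec reproduces the direct estimate exactly, because the marginal totals introduced along the way cancel.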
Now for a real kicker: Anyone using sens and spec in formulating decision rules is simply wrong if they consider sens and spec to be constant over patient types. Using sens and spec accurately only adds complexity: because sens and spec vary over patients, you have to incorporate probability models for sens and spec that are functions of patient characteristics. Details about this are in the BBR diagnosis chapter. Simple explanation: any disease that is not binary will be easier to detect when it is more severe. Any patient characteristic associated with severity of disease will be associated with test sensitivity.
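The dependence of sensitivity on severity can be illustrated with a toy model. Everything in it is an assumption made for this sketch rather than something taken from the BBR chapter: the test measurement is taken to be the true severity plus standard normal noise, the test is called positive when the measurement exceeds a cutoff, and the cutoff and severity values are arbitrary.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def sensitivity_at_severity(severity, cutoff=1.0, noise_sd=1.0):
    """P(test positive | severity) when measurement = severity + Normal(0, noise_sd) noise.

    The cutoff and noise_sd defaults are arbitrary values for illustration.
    """
    return 1 - normal_cdf((cutoff - severity) / noise_sd)


# Sensitivity is not a single constant: it rises with disease severity.
for severity in (0.5, 1.0, 2.0, 3.0):
    print(f"severity={severity:.1f}  sensitivity={sensitivity_at_severity(severity):.3f}")
```

Under this toy model, any patient characteristic that shifts the distribution of severity also shifts the sensitivity averaged over those patients, which is the point made above.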
I’d be interested in any demonstration that a backward-information-flow probability (including an ROC curve whose points are constructed from backward-time probabilities) is necessary for making optimum medical decisions for individual patients.
Some useful papers on medical decision making are here.