Optimal decision theory for diagnostic testing: Minimizing indeterminate classes with applications to saliva-based SARS-CoV-2 antibody assays

The paper looks useful but seems to be written for someone who is allergic to continuous probabilities. Sensitivity and specificity need play no role in computing P(disease | test results and pre-test patient characteristics).

@f2harrell: I think I understand (and agree with) your criticisms of how sensitivity and specificity are used in the diagnostic context by doctors “on the front lines.”

Briefly summarizing: doctors cannot use sens/spec because these quantities condition on the unknown disease status.

> While sensitivity and specificity sound like the right thing and a good thing, there is an essential misdirection for prediction. The problem for prediction with focusing on sensitivity and specificity is that you are conditioning on the wrong thing: the true underlying condition; you are conditioning on the thing you actually want information about.

**My question:** Developers of diagnostic tests continue to use sens/spec, since they can do pure experiments where the target condition is known (i.e., they have total control over false positives and negatives).

My intuition is that the use of sensitivity and specificity remains valid for clinical laboratory quality control and diagnostic test development, but I’m not 100% positive this is right.

I’m thinking of PCR as an example. How could the information be integrated into a diagnostic procedure that takes into account the important variable of cycle threshold without resorting to sens/spec at the clinician level?

It seems we would need:

- P(symptom severity | PCR result),
- CT = informative cycle threshold range (max - min)
- Other prognostic covariates.

The observed PCR result needs to be discounted by $\frac{1}{2^n}$, where $n \in \{0, \dots, CT - 1\}$. Since PCR doubles the quantity every cycle, the nominal value for a result decreases as cycle count increases.

AFAIK, typical CT values are in the 20 to 40 range. Mapping that range into $[0, \dots, 19]$, I derive possible discount/shrinkage coefficients in the range $[\frac{1}{2^0}, \dots, \frac{1}{2^{19}}]$.
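As a rough sketch of the arithmetic above, the discount coefficients can be tabulated directly. The function name `discount` and the constant `CT_MIN` are made up for illustration, and the 20-cycle lower bound is the assumption stated in the post, not a property of any particular assay. Note that the mapping onto 0 through 19 corresponds to cycle thresholds 20 through 39; a threshold of 40 would give n = 20.

```python
# Hypothetical illustration of the 1 / 2**n discount described in the post.
CT_MIN = 20  # assumed lower end of the informative cycle-threshold range


def discount(ct: int, ct_min: int = CT_MIN) -> float:
    """Return 1 / 2**n, where n is the number of cycles past ct_min."""
    n = ct - ct_min
    if n < 0:
        raise ValueError("cycle threshold is below the assumed informative range")
    return 1.0 / 2 ** n


# Coefficients for n = 0, ..., 19 (cycle thresholds 20 through 39).
coefficients = {ct: discount(ct) for ct in range(CT_MIN, CT_MIN + 20)}

for ct in (20, 25, 30, 39):
    print(f"CT={ct}: discount={coefficients[ct]:.3e}")
```

The coefficients span about six orders of magnitude, which shows how much information is lost when a PCR result is reported only as positive or negative.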

Sensitivity and specificity, because they make the error of the transposed conditional, should never have been used in medical diagnosis.

It is interesting to read Newman’s editorial on Yerushalmy’s study of diagnostic testing.

https://doi.org/10.2307/4586295

I particularly like his final paragraph

There still seems to be a gap in communication between test users (i.e., doctors) and test developers (i.e., biologists, chemists, etc.).

From Drew Levy’s article above:

> Sensitivity and specificity are properties of the measurement process. Sensitivity- and specificity-based measures are not meaningful probabilities for prediction per se, unless we are specifically interested in our informed guesses when we already know the outcomes—the retrospective view; for example, the probability of the antecedent test result given present knowledge of disease or outcome status.

I can see why lab scientists and techs find sens/spec useful. But I can also see how disease status and patient characteristics might lead to a wide range of +LR or -LR values that users need to consider and that (AFAIK) are not taken into account in the diagnostic process.

Sens and spec were appropriated from radio/radar research to be used in virology. It was a proof-of-concept application where you want to show that a culture can detect the presence of a virus. Medical diagnosis, forward in time, differs greatly from that setting.

Are lab scientists and test developers making an error by relying on sens/spec to evaluate their procedures? That is where I’m confused. I don’t think so, but then the published testing evaluation statistics are very limited for a clinical context.

**Added after Frank’s reply just below**

FWIW, I can understand the sub-optimal use of sens/spec in the development of diagnostic tests, considering this FDA guidance document from 2007. I haven’t found anything more recent after a brief search this morning.

**Related to diagnostic testing broadly**

**Related to PCR tests specifically**

While I’m pretty sure the diagnostic info could be incorporated in a more informative way, I am not yet convinced standard practice conveys 0 information to physicians. If anyone has an example, I’d appreciate it.

**Related**

Yes they are making an error, for many reasons. Here are some:

- sens and spec are patient-specific but they use a meaningless average sens and spec
- they discourage the use of pre-test information
- having good sens and spec in no way implies the provision of useful clinical information for forward decision making

Let's assume that low MCV and high RDW (together) in anaemia have 100% sensitivity for Fe deficiency, and a patient returns a test result with normal MCV. The doctor concludes that Fe deficiency is ruled out and looks for other causes of anaemia. Is this a valid conclusion in practice? The reason I ask is that this sort of decision making occurs all the time in clinical practice, where tests are looked at in terms of high sensitivity (to exclude disease) or high specificity (to confirm disease).

I would compute the probability of disease given every possible test result and look at the spectrum of post-test probabilities. That’s what matters. You may or may not want to use sens and spec to get those post-test probs.
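A minimal sketch of that idea for a test reported in ordered categories follows. The counts are invented for illustration and assume a cohort sampled forward in time, so that simple proportions estimate P(disease | result); in practice one would more likely fit a regression model that also includes pre-test covariates.

```python
# Hypothetical cohort: (diseased, not diseased) counts at each test result level.
counts = {
    "very low": (40, 5),
    "low": (30, 25),
    "borderline": (20, 70),
    "normal": (10, 300),
}


def post_test_probabilities(table):
    """Estimate P(disease | result) for every result level as a proportion."""
    return {
        result: diseased / (diseased + not_diseased)
        for result, (diseased, not_diseased) in table.items()
    }


probs = post_test_probabilities(counts)
for result, p in probs.items():
    print(f"{result:>10}: P(disease | result) = {p:.3f}")
```

No sensitivity or specificity is computed anywhere here. The output is the full spectrum of post-test probabilities, which is the quantity the decision depends on.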

This editorial discusses some of the problems with sensitivity and specificity. (It’s a commentary on another article proposing that *interval likelihood ratios* are more clinically useful). The author writes:

Their fundamental flaw is their failure to tell us what we wish to know. In fact, they seem configured to tell us very much the opposite by answering the wrong question: Given that a patient does or does not have a particular disease, respectively, what is the probability that a test result is positive (sensitivity) or negative (specificity)? This constitutes a confusing reversal of customary clinical logic, because knowledge of the patient’s disease status would presumably preclude a diagnostic test aimed at detection of an illness the patient is already known to have.

This is well stated and gives us great cause for worry about how the subject is taught in medical school. See also Properties of Diagnostic Data Distributions on JSTOR. Dawid rightly calls the sens-spec approach a “sampling” approach and the use of probabilities of disease a “diagnostic” approach. Spot on.

Gallagher gives a glowing recommendation for the likelihood ratio (and I agree with him on all the points raised in favor of the LR) but he may not have realized that:

LR+ = Sen/(1-Spe) and

LR- = (1-Sen)/Spe
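To make the dependence explicit, here is a small sketch showing that, for a binary test, the likelihood ratios are functions of Sen and Spe alone, and how they move a pre-test probability through the odds form of Bayes' rule. The numeric values (Sen = 0.90, Spe = 0.80, pre-test probability 0.20) are arbitrary illustrations, not taken from any study.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test_prob, lr):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(sens=0.90, spec=0.80)
p_pos = post_test_probability(0.20, lr_pos)  # probability after a positive result
p_neg = post_test_probability(0.20, lr_neg)  # probability after a negative result

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"P(disease | +) = {p_pos:.3f}, P(disease | -) = {p_neg:.3f}")
```

Any criticism of a single averaged Sen and Spe therefore carries over to LRs built from them.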

I am beginning to agree with Frank on this after much thought and looking back on 30 years of their use in practice. Sen and Spe are probably not useful on their own unless we combine them into more composite measures.

Dawid says “*It is argued that the prevailing paradigm of diagnostic statistics, which concentrates on incidence of symptoms for given disease, is largely inappropriate and should be replaced by an emphasis on diagnostic distributions. The generalized logistic model is seen to fit naturally into the new framework*.”

I assume he means updating the probability under test negative to that under test positive using the DOR that is derived from the composite of both Sen and Spe given that:

LnDOR = logit(Sen) + logit(Spe)

That is correct in the narrow infrequent case of all-or-nothing test outputs.
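For that binary case, the identity LnDOR = logit(Sen) + logit(Spe) and the odds update from test negative to test positive can be checked numerically. The sketch below uses arbitrary illustrative values for Sen, Spe, and prevalence, and gets the two post-test probabilities from Bayes' rule.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def odds(p):
    return p / (1 - p)


sens, spec, prev = 0.90, 0.80, 0.20  # illustrative values only

# Diagnostic odds ratio, directly and via the logit identity.
dor = (sens / (1 - sens)) * (spec / (1 - spec))
ln_dor = logit(sens) + logit(spec)

# Post-test probabilities for a positive and a negative result (Bayes' rule).
p_pos = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
p_neg = (1 - sens) * prev / ((1 - sens) * prev + spec * (1 - prev))

# The DOR carries the odds of disease from test negative to test positive.
print(math.isclose(math.log(dor), ln_dor))
print(math.isclose(odds(p_pos), dor * odds(p_neg)))
```

The check only works because the test has exactly two outputs. With a continuous or multi-level result there is no single Sen, Spe, or DOR to plug in.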

If we take a continuous test output (say serum ferritin to diagnose Fe deficiency anaemia), we could get a DOR for, say, a 5 µg/L increment from logistic regression and then apply it, but we would still need a threshold at which to center ferritin so that the baseline odds are meaningful. Is that correct?

Just compute P(disease | exact value of all variables). No ratios needed.
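As a sketch of why no threshold is needed, consider a logistic model with ferritin as a continuous predictor. The intercept and slope below are hypothetical numbers chosen for illustration, not estimates from data. Centering ferritin at any value only shifts the intercept, so the predicted probability at each exact ferritin value is unchanged.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: logit P(Fe deficiency) = INTERCEPT + SLOPE * ferritin
INTERCEPT = 2.0
SLOPE = -0.08  # per 1 ug/L of serum ferritin


def p_disease(ferritin):
    """P(disease | ferritin) from the uncentered model."""
    return expit(INTERCEPT + SLOPE * ferritin)


def p_disease_centered(ferritin, center):
    """Same model with ferritin centered; only the intercept changes."""
    intercept_c = INTERCEPT + SLOPE * center
    return expit(intercept_c + SLOPE * (ferritin - center))


# Odds ratio for a 5 ug/L increment; it does not depend on the centering value.
or_per_5 = math.exp(5 * SLOPE)

for x in (10, 30, 100):
    print(f"ferritin={x}: {p_disease(x):.3f} vs {p_disease_centered(x, 30):.3f}")
```

The odds ratio per increment describes the slope, but the clinically relevant output is the probability at the patient's own ferritin value, which is the same under any choice of center.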