Research into the performance of tests for diagnosis and treatment selection

This is my proposed introduction to the final chapter of the forthcoming 4th edition of the Oxford Handbook of Clinical Diagnosis:

Very little of the transparent diagnostic reasoning process described in the Oxford Handbook of Clinical Diagnosis is currently supported by research evidence. To address this problem, the final chapter proposes new concepts. It is directed at medical professionals, their students and medical scientists (especially those who still recall what they were taught about mathematics and statistics), and also at professional statisticians, whose training ensures that they have this knowledge.

The chapter begins by addressing the connection between observed proportions and probabilities, how a probability based on a proportion of 1 out of 2 (i.e. 0.5) is different to 1000 out of 2000 (also 0.5) and how such probabilities might be expressed differently. There is then a discussion about the possible proportions that might be discovered eventually if an infinite number of observations are made in an identical way and also if the study were to be repeated only with the same number of observations.
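One way to make the difference between 1 out of 2 and 1000 out of 2000 concrete is to put an interval around each observed proportion. The sketch below uses the Wilson score interval, which is my choice for illustration and not necessarily the method used in the chapter; the function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion.

    Illustrative only: the chapter may express this uncertainty differently.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Both observed proportions are 0.5, but the intervals differ greatly in width.
small = wilson_interval(1, 2)
large = wilson_interval(1000, 2000)
print("1/2:       %.3f to %.3f" % small)
print("1000/2000: %.3f to %.3f" % large)
```

The interval for 1 out of 2 spans most of the range from 0 to 1 (roughly 0.09 to 0.91), whereas the interval for 1000 out of 2000 is only about 0.02 either side of 0.5.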

Bayes' rule is shown to be a special case of an ‘inverse P circuit rule’ and closely related to the syllogism. This is followed by various representations and interpretations of Bayes' rule: those that rely heavily on conditional independence (e.g. as used by Bayesians and by AI and machine learning advocates) and those that do not. Differential diagnostic reasoning by probabilistic elimination assumes conditional dependence (not independence) and applies a theorem based on the extended version of Bayes' rule. Instead of overall likelihood ratios based on false positive rates divided by sensitivities, reasoning by probabilistic elimination uses ratios of sensitivities and false negative rates. Specificities and false positive rates are useful in population studies but not in clinical reasoning at the bedside or in the clinic. Tests that are very useful for reasoning by probabilistic elimination may appear useless if assessed using overall likelihood ratios based on specificities and false positive rates.
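As a rough illustration of the extended form of Bayes' rule applied to a list of differential diagnoses, the sketch below assumes the listed diagnoses are mutually exclusive and exhaustive for the presenting complaint. The diagnosis labels, prior probabilities and sensitivities are hypothetical numbers, and the function `posterior_over_differential` is my own name; this is not the chapter's theorem for probabilistic elimination, only the standard rule it is said to build on.

```python
def posterior_over_differential(priors, likelihoods):
    """Extended Bayes' rule over a list of candidate diagnoses.

    priors[d]      = p(diagnosis d | presenting complaint)
    likelihoods[d] = p(finding | diagnosis d), i.e. the sensitivity of the
                     finding for diagnosis d.
    Assumes the diagnoses are mutually exclusive and exhaustive.
    """
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())
    return {d: value / total for d, value in joint.items()}

# Hypothetical numbers chosen only for illustration.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}
likelihoods = {"A": 0.8, "B": 0.1, "C": 0.4}

posterior = posterior_over_differential(priors, likelihoods)
for diagnosis, p in posterior.items():
    print(diagnosis, round(p, 3))
```

Note that only the sensitivities of the finding for each diagnosis enter the calculation. Diagnosis B becomes less probable because its sensitivity is low relative to that of A, without any specificity or false positive rate being used.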

Instead of dividing results into high, normal or low, numerical results can be interpreted directly (e.g. the differential diagnosis of abdominal pain in a person aged 15 years instead of in ‘children under 15 years’ and ‘adults of 15 years and over’). Instead of assessing the average efficacy of treatment in subjects with a test result within some fixed range (e.g. an albumin excretion rate (AER) of 20mcg/min or above), the efficacy can be estimated at each possible value of the test result (which reflects the way that experienced clinicians would apply RCT results by assessing the degree of severity of an illness in an individual patient). This can be modelled by fitting calibrated logistic regression functions, or some other modelling function, separately to the placebo and treatment data. This shows that it is practically as well as mathematically inappropriate to assume that risk ratios or odds ratios are constant for all baseline risks, especially those estimated on populations not assessed during an RCT. Applying logistic regression also allows 95% confidence intervals to be placed on the estimated probabilities. These can be interpreted as a probability (or confidence) of 0.95 of replicating an estimated probability within the 95% confidence limits after an infinite number of observations are made (by analogy with 95% predictive intervals).
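To show what estimating efficacy at each test value might look like, the sketch below evaluates two logistic curves, one per trial arm, at several AER values. The coefficients are entirely hypothetical (they are not fitted to any trial), the use of log AER as the predictor is an assumption, and the names `logistic_risk` and `compare_arms` are mine.

```python
import math

# Hypothetical (intercept, slope on log AER) for each arm; not fitted to real data.
PLACEBO = (-6.0, 1.2)
TREATMENT = (-5.5, 0.9)

def logistic_risk(coefficients, aer):
    """Estimated probability of the outcome at a given AER (mcg/min)."""
    intercept, slope = coefficients
    linear_predictor = intercept + slope * math.log(aer)
    return 1 / (1 + math.exp(-linear_predictor))

def compare_arms(aer):
    """Risk in each arm, plus risk ratio and absolute risk reduction, at one AER."""
    p_placebo = logistic_risk(PLACEBO, aer)
    p_treatment = logistic_risk(TREATMENT, aer)
    return {
        "aer": aer,
        "placebo": p_placebo,
        "treatment": p_treatment,
        "risk_ratio": p_treatment / p_placebo,
        "abs_risk_reduction": p_placebo - p_treatment,
    }

rows = [compare_arms(aer) for aer in (20, 40, 67, 100, 200)]
for row in rows:
    print({key: round(value, 3) for key, value in row.items()})
```

With curves fitted separately to each arm, the risk ratio and the absolute risk reduction both change with the baseline AER, so neither needs to be assumed constant.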

Interpreting individual test results allows thresholds to be established for diagnoses and for offering treatments that avoid over-diagnosis and over-treatment. The latter are often due to basing thresholds arbitrarily on two standard deviations above or below the mean of a test's results in some population. On the other hand, interpreting individual test results (e.g. an AER of 67mcg/min as opposed to an AER of 20mcg/min or over) allows more accurate probabilities to be used during shared decision making with a patient, perhaps by applying Decision Analysis. When doing this, it is also important to take into account the variation in the results of tests performed on individual patients by estimating the standard deviation of those test results.
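A minimal sketch of estimating within-patient variation from repeated measurements follows. The five AER values are invented for illustration and the function name `summarise_repeats` is hypothetical; how the chapter carries this variation into the probability estimates is not shown here.

```python
import math
import statistics

def summarise_repeats(values):
    """Mean, sample standard deviation and standard error of repeated results."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))      # standard error of the mean
    return {"mean": mean, "sd": sd, "sem": sem}

# Hypothetical repeated AER measurements (mcg/min) in one patient.
repeated_aer = [58, 71, 67, 80, 62]
summary = summarise_repeats(repeated_aer)
print({key: round(value, 2) for key, value in summary.items()})
```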

My aim is to try to improve understanding and collaboration on research into diagnostic tests between the statistical, AI and medical communities. How do you think that they will respond to this?


This is definitely a big improvement over the way it has been taught until now. I think the students would be even better served by teaching them probabilistic diagnosis through logistic regression. This handles pre-test probabilities, non-binary diagnoses and continuous test outputs, and is consistent with the typical type of study used to build diagnostic models: cohort studies. [Sens and spec are only consistent with case-control studies.] This chapter covers this: Biostatistics for Biomedical Research - 19 Diagnosis. Bayes’ rule is used in all medical school diagnostic curricula but this is the wrong time to introduce Bayes’ rule if you already have a cohort study.