Strategies to overcome referral / workup bias in Diagnosis Prediction Models

I wonder what some common strategies are to overcome referral / workup bias:

Some patients might not get the full diagnostic workup because of an existing pre-test protocol, and the outcome will be labeled as Non-Event (0) even though the underlying outcome exists (1).

This is a problem when analyzing data the indirect way, using sensitivity and specificity. If, on the other hand, you are in forward-time predictive mode, where you estimate probabilities of disease on the basis of current data, I don’t think you need to do anything special with, for example, logistic regression.

The current data do not indicate the underlying outcome for some cases, and the predicted probabilities will be biased regardless of which performance metrics we compute.

If “David” has a true outcome of 1 and a reported outcome of 0 because he didn’t undergo the riskier and more accurate diagnostic test, we must adjust for it somehow.
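A tiny simulation makes the mechanism concrete (the risk score, effect sizes, and thresholds below are all invented for illustration): patients who never receive the definitive test are recorded as 0 no matter what, so the recorded prevalence understates the true one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: a single pre-test risk score drives both
# the disease and the decision to refer for the definitive test.
risk = rng.normal(size=n)
true_outcome = rng.binomial(1, 1 / (1 + np.exp(-(risk - 2))))

# Higher-risk patients are more likely to get the full workup.
verified = rng.binomial(1, 1 / (1 + np.exp(-(risk - 1)))).astype(bool)

# Unverified patients are recorded as Non-Event (0) regardless of truth
# (this is "David's" situation from the post above).
recorded = np.where(verified, true_outcome, 0)

print("true prevalence:    ", true_outcome.mean())
print("recorded prevalence:", recorded.mean())
print("missed events:      ", int(((true_outcome == 1) & ~verified).sum()))
```

The gap between the two prevalence estimates is exactly the set of "Davids": true 1s recorded as 0s.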

To within the resolution of the available data, you use pre-test patient characteristics to model risk, and the predictors in the model should include indicators of what makes a patient tend not to get the ultimate test. So, based on the best available evidence, you’ve accounted for what needs to be accounted for. Unless I’m missing something.
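A minimal sketch of that idea, assuming hypothetical predictors (`age`, `symptom_score`) plus a made-up access indicator `rural_resident` standing in for whatever makes a patient tend not to receive the definitive test:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical pre-test characteristics (all invented for illustration).
age = rng.normal(60, 10, n)
symptom_score = rng.normal(0, 1, n)
rural_resident = rng.binomial(1, 0.3, n)  # proxy for reduced test access

lin = 0.03 * (age - 60) + 0.8 * symptom_score - 0.4 * rural_resident
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# Including the access-related indicator as a predictor lets the model
# absorb (part of) the referral mechanism instead of leaving it hidden.
X = np.column_stack([age, symptom_score, rural_resident])
model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]
print("mean predicted risk:", probs.mean())
```

This is only a sketch under the strong assumption that the access-related factors are observed and captured by the included predictors.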

If I understand correctly, Ben van Calster talked about the same issue in his (very cool) lecture on enemies of reliable prediction models:

(The link directs to the relevant part of the lecture)


Ben’s presentation is excellent. I was hoping to hear something about resistance to workup bias that results from having enough representation of patients of a certain type in the completely worked-up data. For example, if females seldom get the final diagnostic test but you have 100 females in your dataset who did, the model might be OK.

It is indeed an excellent presentation! I showed it to my team two days ago.
For me the most disturbing points are number 8 for diagnosis (this thread) and number 9 for prognosis, which leads me to counterfactual predictions.


I sent Ben an email asking about the subject, and he sent me some links to work done by Joris A. H. de Groot and a related tutorial in R:

Verification problems in diagnostic accuracy studies: consequences and solutions
Adjusting for differential-verification bias in diagnostic-accuracy studies: a Bayesian approach
Correcting for partial verification bias in diagnostic accuracy studies: A tutorial using R

I wonder if you have any thoughts about this work.

Some additional thoughts:

While I do agree that enormous effort has been dedicated to estimating the wrong performance metrics (sensitivity, specificity), I do not agree that no corrections are needed to fix verification bias: not necessarily for model development, but for model validation.

I use Lift / PPV conditional on PPCR (a flexible resource constraint). If I estimate Lift / PPV naively at PPCR = 0.05 (only 5% of the patients can be validated), I will get very different results, since the top 5% at risk in the verified set are very different from the top 5% in the general population.
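To illustrate the point with simulated data (the workup rule and risk distribution are arbitrary inventions, not my real pipeline): PPV and Lift at PPCR = 0.05 computed naively on the verified subset can diverge noticeably from their population values, because verification over-samples the high-risk end.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
risk = rng.uniform(size=n)          # model's predicted risk
y = rng.binomial(1, risk)           # true outcome (well-calibrated toy model)
# Workup favors high-risk patients, so the verified set is risk-enriched.
verified = rng.binomial(1, 0.2 + 0.6 * risk).astype(bool)

def ppv_at_ppcr(scores, labels, ppcr=0.05):
    # PPV among the top `ppcr` fraction, ranked by predicted risk.
    k = max(1, int(len(scores) * ppcr))
    top = np.argsort(scores)[-k:]
    return labels[top].mean()

prevalence = y.mean()
ppv_pop = ppv_at_ppcr(risk, y)                        # target quantity
ppv_naive = ppv_at_ppcr(risk[verified], y[verified])  # verified subset only
print("population PPV@5%:", ppv_pop, "lift:", ppv_pop / prevalence)
print("naive PPV@5%:     ", ppv_naive, "lift:", ppv_naive / y[verified].mean())
```

In this toy setup the naive Lift is deflated mainly because the baseline prevalence in the verified subset is inflated relative to the general population.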

A propensity score looks like a reasonable solution to me (once again, so much effort spent on sensitivity and specificity):
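For concreteness, here is a rough sketch of what I have in mind: inverse-probability-of-verification weighting, where the propensity of receiving the definitive test is estimated from pre-test covariates and verified patients are up-weighted to stand in for the full population. Everything here (the single covariate, the coefficients) is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 50_000
x = rng.normal(size=n)                       # pre-test covariate
y = rng.binomial(1, 1 / (1 + np.exp(-(x - 1))))   # true outcome
# Workup depends on the same covariate, creating verification bias.
verified = rng.binomial(1, 1 / (1 + np.exp(-x))).astype(bool)

# Step 1: model the propensity of receiving the definitive test.
ps_model = LogisticRegression().fit(x.reshape(-1, 1), verified)
ps = ps_model.predict_proba(x.reshape(-1, 1))[:, 1]

# Step 2: weight each verified patient by 1 / P(verified | x), so the
# verified subset is reweighted back toward the full population.
w = 1 / ps[verified]
prev_true = y.mean()
prev_naive = y[verified].mean()
prev_ipw = np.average(y[verified], weights=w)
print(f"true prevalence  {prev_true:.3f}")
print(f"naive (verified) {prev_naive:.3f}")
print(f"IPW-corrected    {prev_ipw:.3f}")
```

The correction works here only because verification depends on an observed covariate and the propensity model is well specified; both are strong assumptions.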

I don’t see a role for propensity scores here (or almost anywhere else). If corrections for verification bias are needed, it is better, and often possible, to include the factors related to those corrections in the statistical model for the diagnostic outcome.