I am currently trying to expand my statistical knowledge in the field of “biomarker evaluation”.
In my setting, “biomarkers” are essentially values measured in blood samples, such as troponin, BNP (brain natriuretic peptide), or creatine kinase.
I already understand fairly well why comparing AUCs (c-statistics) is problematic (low power), even though it is still done in 70% of the literature.
I have already looked at other, superior comparative measures here. This was really insightful and the most helpful source; I can’t thank you enough for that, @f2harrell!
However, I am struggling to find good literature to develop this understanding further.
In particular, I find it difficult to explain to my colleagues why comparing AUCs is problematic, beyond the purely technical “low power” argument. I found the papers by Lobo et al. and Halligan et al. helpful, but the second one in particular leans towards the NRI (net reclassification improvement) or net benefit, which, if I understand correctly, again suffer from dichotomization or backward probabilities (probabilities conditioned on the outcome rather than on the predictors). (If someone can recommend literature discussing this in more detail, I would be very interested.)
To be honest, although I feel I understand the problems with AUC, ROC curves, sensitivity, specificity and so on reasonably well, it is very difficult to convince colleagues, and especially senior researchers, that these problems really matter. To put it bluntly, the opinion of a young PhD student (even if well evidenced) does not carry as much weight as, for example, remarks at an EMA presentation describing the AUC as “an ideal measure of performance because it is cut-off independent”.
Therefore, “guidelines” for the “validation” of biomarkers would be very helpful to me.
A good example of where the struggle begins is this line of reasoning: “If the c-statistic increases, one has evidence of added predictive information; but if the c-statistic does not increase, one does not have evidence of no added value (the c-statistic is insensitive).”
I have the impression that I cannot explain this argument tangibly enough to my clinical colleagues…
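A small simulation may make the insensitivity point more tangible. The following is a minimal sketch in plain Python (standard library only), and every number in it is a hypothetical assumption, not real biomarker data: a strong baseline predictor `x1` (think of an established clinical score) and a new marker `x2` with a genuine but moderate effect on the log-odds, both independent and normally distributed. It fits the logistic model with and without `x2` by maximum likelihood, then compares the likelihood ratio chi-square for `x2` with the change in the c-statistic (computed via the Mann-Whitney rank formula).

```python
import math
import random

def simulate(n, seed=1):
    """Hypothetical data: x1 = strong baseline predictor, x2 = new marker."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        lp = -1.0 + 2.0 * a + 0.5 * b  # assumed true log-odds (hypothetical)
        p = 1.0 / (1.0 + math.exp(-lp))
        x1.append(a)
        x2.append(b)
        y.append(1 if rng.random() < p else 0)
    return x1, x2, y

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def linear_predictor(X, beta):
    return [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, eta, yi in zip(X, linear_predictor(X, beta), y):
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def log_likelihood(X, y, beta):
    ll = 0.0
    for eta, yi in zip(linear_predictor(X, beta), y):
        p = 1.0 / (1.0 + math.exp(-eta))
        ll += math.log(p) if yi else math.log(1.0 - p)
    return ll

def c_statistic(score, y):
    """AUC via the Mann-Whitney rank formula (ties get average ranks)."""
    order = sorted(range(len(score)), key=lambda i: score[i])
    ranks = [0.0] * len(score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and score[order[j + 1]] == score[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    r1 = sum(r for r, yi in zip(ranks, y) if yi)
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n0)

x1, x2, y = simulate(2000)
X_base = [[1.0, a] for a in x1]
X_full = [[1.0, a, b] for a, b in zip(x1, x2)]
b_base = fit_logistic(X_base, y)
b_full = fit_logistic(X_full, y)

# Likelihood ratio chi-square for adding x2 (1 df); p = erfc(sqrt(chi2 / 2))
lr_chi2 = 2 * (log_likelihood(X_full, y, b_full) - log_likelihood(X_base, y, b_base))
p_value = math.erfc(math.sqrt(lr_chi2 / 2))
auc_base = c_statistic(linear_predictor(X_base, b_base), y)
auc_full = c_statistic(linear_predictor(X_full, b_full), y)

print(f"LR chi-square for x2: {lr_chi2:.1f} (p = {p_value:.2g})")
print(f"c-statistic: {auc_base:.3f} -> {auc_full:.3f} (change {auc_full - auc_base:+.3f})")
```

Under these assumptions the likelihood ratio test should flag `x2` very clearly, while the c-statistic changes only slightly: it depends on ranks alone, and an already strong baseline leaves little room for reordering patients. Changing the seed, the sample size, or the assumed coefficients is an easy way to check how stable that contrast is; it is an illustration under idealized conditions, not a general proof.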
Can anyone recommend further good literature on this “subject”?
Thanks in advance to everyone trying to help me!