Diagnostic accuracy for a nested case-control study

I want to assess the predictive ability of biomarkers in a nested case-control study. The primary analysis will use conditional logistic regression. I have some questions:

  1. Can I determine the AUC, sensitivity and specificity for the biomarkers? I’ve seen it done before using standard logistic regression adjusted for the matching variables to generate the probabilities.

  2. I would like to examine prediction of an outcome that wasn’t used for the matching, using the full population. Should this be done using conditional or standard logistic regression? My understanding is that subgroups should use standard logistic regression.

The AUC, sensitivity, and specificity are based on the distribution of the test results in those with and those without the target condition. If the target condition is rare (as in population screening), sampling strategies that select all those with the target condition and a subset of those without are more efficient.
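To make that concrete, here is a minimal sketch of how the three measures follow from nothing more than the two marker distributions. The function names, the marker values, and the threshold of 2.0 are all made up for illustration, and the sketch assumes that higher marker values indicate the target condition.

```python
import numpy as np

def empirical_auc(cases, controls):
    """Probability that a random case has a higher marker value than a
    random control, counting ties as one half (Mann-Whitney form)."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Compare every case with every control.
    diff = cases[:, None] - controls[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def sens_spec(cases, controls, threshold):
    """Sensitivity and specificity when 'marker >= threshold' is a positive test."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    sensitivity = float(np.mean(cases >= threshold))
    specificity = float(np.mean(controls < threshold))
    return sensitivity, specificity

# Hypothetical marker values, not real data.
cases = [2.1, 3.4, 1.8, 4.0]
controls = [1.0, 2.5, 1.2, 0.7, 1.9, 1.5]

auc = empirical_auc(cases, controls)
sens, spec = sens_spec(cases, controls, threshold=2.0)
print(f"AUC={auc:.3f}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```

None of these quantities depends on the ratio of cases to controls, which is why sampling all cases and only a subset of controls does not by itself distort them, provided the sampled controls are representative of all controls.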

You refer to predictive biomarkers, so I assume this deals with a future event or a response to therapy. There too, a nested case-control strategy can be more efficient than analyzing every cohort participant (see Margaret Pepe’s PRoBE design, for example).

What worries me a bit is the reference to matching and to logistic regression.

As soon as you start to include matching variables, the distribution of test results in your controls will no longer reflect the distribution of all controls (those without the target event) in the cohort, and any ROC analysis has to account for that.

A nice explanation of matching in accuracy research can be found in Holly Janes’ 2008 Biometrics article: https://doi.org/10.1111/j.1541-0420.2007.00823.x.
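As a rough illustration of what "accounting for matching" can look like, here is a sketch of a covariate-adjusted AUC for a single categorical matching variable. Each case is compared only with controls from its own matching stratum, and the resulting within-stratum percentiles are averaged over cases. This is my own simplified sketch, not necessarily the exact estimator in the article; the function name `adjusted_auc`, the strata, and the marker values are hypothetical.

```python
import numpy as np

def adjusted_auc(marker, is_case, stratum):
    """Covariate-adjusted AUC for a categorical matching variable.

    For each case, compute the fraction of controls in the SAME stratum
    with a lower marker value (ties count one half), then average over cases.
    """
    marker = np.asarray(marker, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    stratum = np.asarray(stratum)
    percentiles = []
    for s in np.unique(stratum):
        in_s = stratum == s
        case_vals = marker[in_s & is_case]
        ctrl_vals = marker[in_s & ~is_case]
        if len(case_vals) == 0 or len(ctrl_vals) == 0:
            continue  # stratum contributes no case-control comparison
        for v in case_vals:
            below = np.mean(ctrl_vals < v)
            ties = np.mean(ctrl_vals == v)
            percentiles.append(below + 0.5 * ties)
    return float(np.mean(percentiles))

# Hypothetical matched data: the marker runs higher in the "old" stratum
# regardless of case status.
marker  = [1.0, 2.0, 3.0, 2.5, 3.5, 4.0, 5.0, 6.0, 5.5, 6.5]
is_case = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
stratum = ["young"] * 5 + ["old"] * 5

aauc = adjusted_auc(marker, is_case, stratum)

# Pooled AUC that ignores the matching, for contrast.
m, c = np.asarray(marker), np.asarray(is_case, dtype=bool)
diff = m[c][:, None] - m[~c][None, :]
pooled = float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
print(f"adjusted AUC={aauc:.3f}, pooled AUC={pooled:.3f}")
```

In this toy example the pooled and adjusted values differ, because the pooled comparison mixes cases and controls across strata where the marker distribution is shifted. With continuous matching variables you would need a model for the marker distribution in controls instead of simple strata.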

From your second question, it seems you are more interested in generating a multivariable prediction model, less so in the performance of (single) biomarkers. Is that correct?

Hi Patrick,
Thanks very much for the reply. I’m a researcher with a medical background, so I wanted to double-check the method in the link you provided. Janes recommends the use of an adjusted ROC. So for a continuous predictor, would I use a logistic regression model adjusted for the matching covariates, with the adjusted ROC based on this model’s probabilities?
Do you think that sensitivities derived from an “adjusted ROC” could be relied upon to reflect the values that would be seen in the full cohort? I’m wondering if I should consider the adjusted ROC a “low value” or exploratory finding.

I’m interested in the performance of single novel biomarkers (15 in total). Does each biomarker improve prediction above existing risk factors, and by how much? I think conditional regression is the best method to do this.