The AUC, sensitivity, and specificity are based on the distribution of the test results in those with and those without the target condition. If the target condition is rare (as in population screening), sampling strategies that select all those with the target condition and a random subset of those without are more efficient than testing everyone.
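To make that concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration (a 1% prevalence, a normally distributed marker shifted by one standard deviation in cases, two sampled controls per case); none of it comes from your data. It compares the AUC in the full cohort with the AUC from all cases plus a random subset of controls.

```python
import numpy as np

def auc_mann_whitney(cases, controls):
    """AUC as P(case > control) + 0.5 * P(tie), computed against sorted controls."""
    cases = np.asarray(cases, dtype=float)
    sorted_controls = np.sort(np.asarray(controls, dtype=float))
    below = np.searchsorted(sorted_controls, cases, side="left")
    below_or_equal = np.searchsorted(sorted_controls, cases, side="right")
    return (below + 0.5 * (below_or_equal - below)).mean() / sorted_controls.size

rng = np.random.default_rng(0)

# Hypothetical cohort: rare target condition, marker shifted upward in cases.
n = 200_000
prevalence = 0.01
diseased = rng.random(n) < prevalence
marker = rng.normal(loc=np.where(diseased, 1.0, 0.0), scale=1.0)

cases = marker[diseased]
all_controls = marker[~diseased]

# Keep every case, but only a random subset of controls (two per case).
sampled_controls = rng.choice(all_controls, size=2 * cases.size, replace=False)

auc_full = auc_mann_whitney(cases, all_controls)
auc_sampled = auc_mann_whitney(cases, sampled_controls)

print(f"AUC, full cohort:          {auc_full:.3f}")
print(f"AUC, case-control sample:  {auc_sampled:.3f}")
```

Because the controls are sampled at random, both numbers estimate the same quantity (about 0.76 under these assumptions); the sampled design just needs far fewer marker measurements.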
You refer to predictive biomarkers, so I assume this deals with a future event, or a response to therapy, at some point in the future. There too, a nested case-control strategy can be more efficient than analyzing every cohort participant (see Margaret Pepe’s PRoBE design, for example).
What worries me a bit is the reference to matching and to logistic regression.
As soon as you start to include matching variables, the distribution of test results in your controls will no longer reflect the distribution of all controls (those without the target event) in the cohort, and any ROC analysis has to account for that.
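A small simulation sketch shows the size this effect can have. All numbers are made up for illustration: a binary matching variable (call it "older") that raises both the event risk and the marker level, and 1:1 frequency matching of controls to cases on that variable.

```python
import numpy as np

def auc_mann_whitney(cases, controls):
    """AUC as P(case > control) + 0.5 * P(tie), computed against sorted controls."""
    cases = np.asarray(cases, dtype=float)
    sorted_controls = np.sort(np.asarray(controls, dtype=float))
    below = np.searchsorted(sorted_controls, cases, side="left")
    below_or_equal = np.searchsorted(sorted_controls, cases, side="right")
    return (below + 0.5 * (below_or_equal - below)).mean() / sorted_controls.size

rng = np.random.default_rng(1)

# Hypothetical cohort: "older" raises both the event risk and the marker level.
n = 200_000
older = (rng.random(n) < 0.3).astype(int)
risk = np.where(older == 1, 0.05, 0.005)
diseased = rng.random(n) < risk
marker = 1.0 * older + 0.8 * diseased + rng.normal(size=n)

cases = marker[diseased]
cohort_controls = marker[~diseased]

# 1:1 frequency matching: within each stratum of the matching variable,
# draw as many controls as there are cases in that stratum.
matched = []
for z in (0, 1):
    n_cases_z = int(np.sum(diseased & (older == z)))
    pool = marker[~diseased & (older == z)]
    matched.append(rng.choice(pool, size=n_cases_z, replace=False))
matched_controls = np.concatenate(matched)

auc_cohort = auc_mann_whitney(cases, cohort_controls)
auc_matched_naive = auc_mann_whitney(cases, matched_controls)

print(f"AUC, all cohort controls:        {auc_cohort:.3f}")
print(f"AUC, matched controls (naive):   {auc_matched_naive:.3f}")
```

Under these assumptions the naive AUC in the matched sample comes out clearly lower than the cohort AUC, because the matched controls are older, and so have higher marker values, than controls in general. The naive estimate from a matched set is therefore not an estimate of the marker's performance in the cohort unless the analysis accounts for the matching.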
A nice explanation of matching in accuracy research can be found in Holly Janes’ 2008 Biometrics article: https://doi.org/10.1111/j.1541-0420.2007.00823.x.
From your second question, it seems you are more interested in generating a multivariable prediction model, less so in the performance of (single) biomarkers. Is that correct?