I posted a question to Cross Validated that might be a better fit for datamethods. I’m planning a study that will evaluate a new method of identifying cases in need of treatment. Let’s say we’ll evaluate 100 people with the new method, and the sample consists of 35 true cases and 65 non-cases as judged by an existing gold standard.[1]

In practice, I might want to argue that the new method is only worth adopting if a metric like sensitivity (or specificity or accuracy) is greater than some threshold, e.g., 0.7.

If I ran the study and had 15 false positives and 10 false negatives, the point estimate of sensitivity would be 0.71 (25 of the 35 true cases detected) with a 95% CI of 0.56-0.86. So maybe I could say, “Under repeated sampling, intervals constructed this way would contain the true value 95% of the time”, but I’m wondering if there is a method to make a different statement.
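For anyone checking the arithmetic, here is a minimal sketch of how those numbers can be reproduced. It assumes the interval is a simple Wald (normal approximation) interval, which is my guess because it matches the figures quoted; the function name is just illustrative.

```python
import math

def sensitivity_wald_ci(true_pos, false_neg, z=1.96):
    """Point estimate and Wald (normal-approximation) CI for sensitivity."""
    n_cases = true_pos + false_neg          # all gold-standard cases
    est = true_pos / n_cases                # sensitivity = TP / (TP + FN)
    se = math.sqrt(est * (1 - est) / n_cases)
    return est, est - z * se, est + z * se

# 35 true cases, 10 missed by the new method, so 25 detected
est, lower, upper = sensitivity_wald_ci(true_pos=25, false_neg=10)
print(f"sensitivity = {est:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
# prints: sensitivity = 0.71, 95% CI 0.56-0.86
```

With only 35 cases, a Wilson or exact interval might be a safer choice than Wald, but that is a separate question from the one I am asking here.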

*Given a point estimate of 0.71, I want to say that there is an X% probability that the sensitivity of the new method is at least 0.70 (the actual threshold will be set higher, but 0.70 seemed to work for this example). Is this possible?*
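To make concrete the kind of statement I am after, here is a rough sketch of one possible route: a Bayesian beta-binomial model. Everything in it is an assumption on my part, not an established answer: a uniform Beta(1, 1) prior on sensitivity, the 35 gold-standard cases treated as a simple random sample of cases (which ignores the sampling design described in the footnote), and a hypothetical helper name. With integer Beta parameters, the posterior tail probability can be computed exactly from a binomial sum, so no extra libraries are needed.

```python
import math

def prob_sensitivity_at_least(threshold, true_pos, false_neg,
                              prior_a=1, prior_b=1):
    """Posterior P(sensitivity >= threshold) under a Beta(prior_a, prior_b) prior.

    Assumes integer prior parameters, so the posterior Beta(a, b) tail can be
    computed with the identity
        P(Beta(a, b) >= t) = P(Binomial(a + b - 1, t) <= a - 1).
    """
    a = prior_a + true_pos      # posterior "successes" parameter
    b = prior_b + false_neg     # posterior "failures" parameter
    n = a + b - 1
    return sum(
        math.comb(n, j) * threshold**j * (1 - threshold)**(n - j)
        for j in range(a)       # j = 0, ..., a - 1
    )

# 25 detected and 10 missed among the 35 true cases, threshold 0.70
p = prob_sensitivity_at_least(0.70, true_pos=25, false_neg=10)
print(f"P(sensitivity >= 0.70 | data) = {p:.2f}")
```

Under these assumptions the probability comes out a little above one half, which would be the "X%" in the statement above. A different prior would give a different X, so the choice of prior would need to be justified.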

[1] The first part of the study looks a bit like a case-control design on the surface. The new method will be used to identify 50 cases and 50 non-cases. We will then check the “true” status of all 100 by assessing each with an existing gold standard. So we’ll end up with a confusion matrix of correct and incorrect labels. The sample for assessing sensitivity and specificity will not be a random sample, but we have other data on the prevalence of cases (10%, as defined by the gold standard).