Adjusting Positive Prediction for Baseline Prevalence

A while back, I wrote this cute little R function to take the ROC results for a prediction model someone else wrote and adjust them for the baseline prevalence you’d see in the population. Basically, their cross-validation sample has ~25% with the outcome, but the real-world data has something like 0.1% with that outcome (cue eye roll).

I am interested to know whether anyone sees any methodological issues with the function, and I am also putting it out there in case others find it helpful.

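prev_adj <- function(prev, sens, spec){

  # positive likelihood ratio and pre-test odds
  plr <- sens / (1-spec)
  pre_odds <- prev / (1-prev)

  # post-test odds, converted to a percentage (0-100)
  post_odds <- plr*pre_odds
  post_prob <- round((post_odds / (post_odds+1))*100, 2)

  post_prob
}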
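To make the effect of prevalence concrete, here is a small usage sketch. The sensitivity and specificity values (0.80 and 0.90) are made up for illustration and are not the actual model's results; the two prevalences are the ~25% and ~0.1% figures mentioned above. The function is repeated so the snippet runs on its own.

```r
# Same calculation as the function above, repeated so this runs standalone.
prev_adj <- function(prev, sens, spec) {
  plr <- sens / (1 - spec)          # positive likelihood ratio
  pre_odds <- prev / (1 - prev)     # pre-test odds
  post_odds <- plr * pre_odds       # post-test odds
  round((post_odds / (post_odds + 1)) * 100, 2)  # percentage, 0-100
}

# Hypothetical sens/spec, chosen only for illustration.
cv_result   <- prev_adj(prev = 0.25,  sens = 0.80, spec = 0.90)
real_result <- prev_adj(prev = 0.001, sens = 0.80, spec = 0.90)

cv_result    # 72.73
real_result  # 0.79
```

With the same sensitivity and specificity, a positive prediction goes from roughly a 73% chance of being a true case at 25% prevalence to under 1% at 0.1% prevalence.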


Sensitivity and specificity, since they condition on disease status, are thought to be independent of prevalence. So this isn’t clear.

Minor note: what you have computed is not a probability. Probabilities are between 0 and 1.


Thanks, Frank. I wasn’t clear. Yes, we were adjusting their “accuracy” metric for a pre-test probability, of which ‘prevalence’ was the surrogate.

The sensitivity and specificity are only used to calculate the positive likelihood ratio (‘plr’), since sens/spec is what we had access to from their model. While those individual metrics may be “independent” of prevalence, that was kind of the problem in our case. We didn’t want a metric that was “independent” of prevalence, but rather one that accounted for the fact that the modelers were trying to predict a low-prevalence event.
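One way to sanity-check the odds-times-likelihood-ratio approach is to compare it against the positive predictive value written directly from Bayes' theorem. The sketch below does that on a small grid; the helper names (`ppv_bayes`, `ppv_odds`) and the grid values are hypothetical, picked only for the check.

```r
# PPV straight from Bayes' theorem:
# P(D+ | T+) = sens*prev / (sens*prev + (1 - spec)*(1 - prev))
ppv_bayes <- function(prev, sens, spec) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}

# Same quantity via pre-test odds times the positive likelihood ratio.
ppv_odds <- function(prev, sens, spec) {
  post_odds <- (sens / (1 - spec)) * (prev / (1 - prev))
  post_odds / (1 + post_odds)
}

# Illustrative grid of inputs (not taken from any real model).
grid <- expand.grid(prev = c(0.001, 0.01, 0.25),
                    sens = c(0.60, 0.80, 0.95),
                    spec = c(0.70, 0.90, 0.99))

bayes_vals <- with(grid, ppv_bayes(prev, sens, spec))
odds_vals  <- with(grid, ppv_odds(prev, sens, spec))

# TRUE if the two routes agree up to floating-point tolerance.
same <- isTRUE(all.equal(bayes_vals, odds_vals))
same
```

The two forms are algebraically identical, so agreement here only confirms the arithmetic, not whether a single sens/spec pair carries over to the real-world population.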

With this additional information in hand, does the function seem correct?

Also, I get that probabilities are 0-1… but when explaining it to a super-lay audience, it’s easier to make it a 0-100 scale. So the point is taken and perhaps that multiplication could be removed from the code if the function is formalized.
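If the function were formalized along those lines, one possible shape is sketched below. The name `prev_adj2`, the `as_percent` argument, and the input checks are assumptions added for illustration, not part of the original function.

```r
# Sketch: return a probability on the 0-1 scale by default, with an
# opt-in percentage for lay audiences. `as_percent` is a hypothetical
# argument introduced here.
prev_adj2 <- function(prev, sens, spec, as_percent = FALSE) {
  # Basic input checks; spec must be < 1 so the likelihood ratio is finite.
  stopifnot(prev > 0, prev < 1,
            sens >= 0, sens <= 1,
            spec >= 0, spec < 1)

  plr <- sens / (1 - spec)        # positive likelihood ratio
  pre_odds <- prev / (1 - prev)   # pre-test odds
  post_odds <- plr * pre_odds     # post-test odds
  post_prob <- post_odds / (1 + post_odds)

  if (as_percent) round(100 * post_prob, 2) else post_prob
}

# Hypothetical inputs, for illustration only.
p   <- prev_adj2(0.001, 0.80, 0.90)
pct <- prev_adj2(0.001, 0.80, 0.90, as_percent = TRUE)
```

Keeping the unrounded 0-1 value as the default leaves rounding and scaling as a presentation choice at the point of reporting.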
