Minimum PPV/NPV values to reach positive net benefit and net benefit avoided

This may be obvious to some, but the following insight about net benefit struck me. I’d be happy to hear any thoughts or remarks, and whether you think it is correct or helpful.

When comparing a prediction model to a treat-none strategy, we’re looking for a model that provides $NB>0$, which is achieved if $TP>FP\times w$. Since $\text{PPV}=\frac{TP}{TP+FP}$, we’re actually aiming for $\text{PPV}>\frac{FP\times w}{FP\times w + FP}$, or $\text{PPV}>p_t$.
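Spelling the steps out may help. This is a sketch that assumes the usual definition of net benefit per patient, with $n$ the total number of patients (a symbol not used in the post), $w = p_t/(1-p_t)$, and at least one false positive so the fractions are defined:

```latex
\begin{aligned}
NB &= \frac{TP}{n} - w\,\frac{FP}{n} > 0
   \iff TP > FP \times w \\
\text{PPV} &= \frac{TP}{TP+FP} > \frac{FP \times w}{FP \times w + FP}
   = \frac{w}{w+1}
   = \frac{p_t/(1-p_t)}{1/(1-p_t)}
   = p_t
\end{aligned}
```

The second line uses the fact that $x/(x+FP)$ increases with $x$, so replacing $TP$ by the smaller quantity $FP \times w$ can only lower the ratio.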

When we compare a model’s performance to a treat-all strategy, we can perform the same calculations and find that we’re looking for a model that provides $\text{NPV}>1-p_t$.
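Both comparisons can be checked by brute force. The sketch below is only an illustration: the function names and the grid of small confusion matrices are assumptions, and exact fractions are used so that ties at the boundary are not blurred by rounding. Note that when the treat-all comparison is worked through, the bound on NPV comes out as $1-p_t$ (equivalently, $1-\text{NPV}<p_t$).

```python
from fractions import Fraction
from itertools import product


def net_benefit(tp, fp, n, p_t):
    """Net benefit per patient: TP/n - w * FP/n, with w = p_t / (1 - p_t)."""
    w = p_t / (1 - p_t)
    return Fraction(tp, n) - w * Fraction(fp, n)


def check_equivalences(p_t, max_count=5):
    """Check both heuristics on every confusion matrix with cells 0..max_count.

    Returns the number of confusion matrices that were checked.
    """
    checked = 0
    for tp, fp, fn, tn in product(range(max_count + 1), repeat=4):
        # PPV and NPV are undefined without predicted positives / negatives.
        if tp + fp == 0 or fn + tn == 0:
            continue
        n = tp + fp + fn + tn
        nb_model = net_benefit(tp, fp, n, p_t)
        # Treat all: every event is a TP, every non-event is an FP.
        nb_treat_all = net_benefit(tp + fn, fp + tn, n, p_t)
        ppv = Fraction(tp, tp + fp)
        npv = Fraction(tn, tn + fn)
        # Model beats treat none (NB > 0) exactly when PPV > p_t.
        assert (nb_model > 0) == (ppv > p_t)
        # Model beats treat all exactly when NPV > 1 - p_t.
        assert (nb_model > nb_treat_all) == (npv > 1 - p_t)
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_equivalences(Fraction(1, 5)))
```

If either equivalence failed for some confusion matrix, the corresponding `assert` would raise.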


Really cool intuition!

Could you clarify your notation? w is presumably the odds at the threshold probability and pt the threshold probability. I’m not sure I understand how this reframing is more intuitive than net benefit? Could you elaborate?

The notations are as you mentioned.
I find this useful when trying to frame the required performance metric when planning a project. It may be because NB is less intuitive (to me) and less common.

Net benefit is also very unintuitive for me, which is why your post interested me. I’m just not entirely grasping what the equations mean. Are you saying that, when compared to a strategy of treating no patients, the net benefit of a predictive model is higher than 0 if and only if the positive predictive value of the model is higher than that of a model that categorizes all patients as negative, at the given threshold probability? Or does this mean that when the PPV is, say, 30%, and most clinicians agree that you should treat at a threshold of 20% (pt), then the net benefit must be positive? The latter is a direct interpretation of the equation PPV > pt, but would be very surprising to me.

I don’t think the notation captures the problem. PPV and NPV are being used for groups of patients. They are ill-defined for groups and should only be patient-covariate-specific.

The same holds true for Lift?

Are you against the story that performance metrics such as PPV and NPV tell in the context of groups (which might encourage information loss, unlike using absolute risks)?

Because in terms of decision making one can use the suggested heuristic by @urigott and it will lead to the same decision as Net Benefit (conventional or interventions avoided).

Lift curves are much more sensible and don’t lose information. They treat risk as continuous and allow you to target the highest risk individuals in order to treat the smallest number and get the largest number of outcomes subject to a constraint on the number of observations.
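As a rough illustration of the idea, the sketch below ranks patients by predicted risk and reports, for each fraction of the population targeted, the event rate in that fraction divided by the overall event rate. This is one common definition of lift and is assumed here rather than taken from the thread; the function name and the five patients are hypothetical.

```python
def lift_curve(p_hat, y):
    """Return (fraction_targeted, lift) pairs, targeting highest risks first.

    lift = event rate among the targeted patients / overall event rate.
    """
    n = len(y)
    prevalence = sum(y) / n
    # Indices ordered from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: p_hat[i], reverse=True)
    points = []
    events = 0
    for k, i in enumerate(order, start=1):
        events += y[i]
        points.append((k / n, (events / k) / prevalence))
    return points


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes.
    p_hat = [0.9, 0.7, 0.4, 0.2, 0.1]
    y = [1, 1, 0, 1, 0]
    for fraction, lift in lift_curve(p_hat, y):
        print(f"{fraction:.1f} {lift:.3f}")
```

By construction the lift is 1 when the whole population is targeted, so the curve shows how much targeting the highest-risk patients gains over picking patients at random.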

NPV and PPV almost never apply. Things are never that binary.

Generally agree with Frank. In particular, if you ask “what is the PPV at a given cut-point?”, that is just the average of the predicted probabilities. If your risk is 10%, the PPV is 10% and the NPV is 90%; the PPV for a cutpoint of 10% will be higher than 10% because it includes patients with higher predicted probabilities; similar considerations apply for NPV.
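To make that concrete, here is a small sketch. It assumes a perfectly calibrated model, so the expected event rate in any group equals the mean predicted risk in that group, and it calls a patient positive when the predicted risk is at or above the cutpoint. The function name and the four risks are made up for illustration.

```python
def expected_ppv_npv(predicted_risks, cutpoint):
    """Expected PPV and NPV at a cutpoint, assuming perfect calibration.

    PPV is the mean predicted risk among patients at or above the cutpoint;
    NPV is one minus the mean predicted risk among patients below it.
    Returns None for a group that is empty.
    """
    positives = [p for p in predicted_risks if p >= cutpoint]
    negatives = [p for p in predicted_risks if p < cutpoint]
    ppv = sum(positives) / len(positives) if positives else None
    npv = 1 - sum(negatives) / len(negatives) if negatives else None
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical predicted risks for four patients.
    risks = [0.05, 0.10, 0.20, 0.40]
    ppv, npv = expected_ppv_npv(risks, cutpoint=0.10)
    print(ppv, npv)
```

With these risks the PPV at the 10% cutpoint is the mean of 0.10, 0.20 and 0.40, which is well above 10%, because the positive group pools everyone at or above the cutpoint.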

That said, Urigott’s insight is correct for a true binary diagnostic test (as opposed to a model). PPV must be higher than the threshold probability in order for a positive result to be helpful; 1 - NPV must be lower than the threshold probability in order for a negative result to be helpful.


In a way they do, because lift curves do not account for absolute risks, and the same is true for all common curves related to discrimination: ROC, Gains and Precision-Recall.

One might suggest that we don’t care about absolute risks when dealing with a resource constraint and not with expected treatment harm. In that case I would still like to have a time-to-event version of the lift curve. Maybe your version of the c-index might be useful?

x-axis: PPCR (stands for the given resource constraint: Predicted Positives / Total Population)
y-axis: (Time-to-Event c-index for a given PPCR) / (Time-to-Event c-index for a Random Guess)

We can think about other performance metrics as long as the decision is identical and the result of the decision yields the same utility (that’s why I’m OK with using a Gains curve, where the sensitivity is shown on the y-axis, or my suggestion of drawing PPV on the y-axis: they are consistent with the Lift metric in terms of choosing which model to use).

How come?

for $p_t = 0.15$

$\hat{p}(\text{Model A}) = [0.1, 0.1, 0.2, 0.3, 0.5]$
$y(\text{Model A}) = [0, 0, 1, 0, 1]$
$\hat{y} = [0, 0, 1, 1, 1]$

$\text{PPV} = \frac{2}{3}$
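The same numbers can be reproduced with a few lines of code. This is a minimal sketch: the function name is made up, and it assumes a patient is classified as positive when the predicted risk is strictly above $p_t$ and that net benefit is $TP/n - w \times FP/n$ with $w = p_t/(1-p_t)$.

```python
def ppv_and_net_benefit(p_hat, y, p_t):
    """Classify at threshold p_t, then return (y_hat, PPV, net benefit)."""
    # Positive when the predicted risk exceeds the threshold probability.
    y_hat = [int(p > p_t) for p in p_hat]
    tp = sum(1 for pred, obs in zip(y_hat, y) if pred == 1 and obs == 1)
    fp = sum(1 for pred, obs in zip(y_hat, y) if pred == 1 and obs == 0)
    n = len(y)
    w = p_t / (1 - p_t)  # odds at the threshold probability
    ppv = tp / (tp + fp) if (tp + fp) > 0 else None
    net_benefit = tp / n - w * fp / n
    return y_hat, ppv, net_benefit


if __name__ == "__main__":
    p_hat = [0.1, 0.1, 0.2, 0.3, 0.5]
    y = [0, 0, 1, 0, 1]
    y_hat, ppv, nb = ppv_and_net_benefit(p_hat, y, p_t=0.15)
    print(y_hat, ppv, nb)
```

For these five patients the PPV is 2/3, which exceeds $p_t = 0.15$, and the net benefit is about 0.36, so it is positive as the heuristic predicts.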

If I understand correctly, you are referring to enforced binary decisions assuming no treatment harm. These might look very different from one day to the next when dealing with a resource constraint, and that’s why flexibility is key.

Why isn’t it true for a prediction model?

PPV and NPV only work when covariates (risk factors) don’t exist, i.e., when every patient on a given treatment has the same risk. Never happens.