Decision analysis calculations

(First, if the topic and tags are not appropriate, please let me know and I will modify them.)

I am reading an article that uses decision analysis, and I am trying to understand the numerical calculations in one of its tables. I have spent a long time on it and still can't figure it out. If someone can take the time to help me, I would be very grateful.

This is the paper I am reading.
https://www.bmj.com/content/344/bmj.e4181.long

The authors used decision analysis. In Table 5, the authors listed some numbers.

In the last paragraph of the Results section, the authors state that

…people aged between 35 and 74 years. At the traditional threshold of 20% used to designate an individual at high risk of developing cardiovascular disease, the net benefit of QRISK2-2011 for men is that the model identified five more cases per 1000 without increasing the number treated unnecessarily when compared with the NICE Framingham equation.

I am trying to reproduce this net benefit of 5 per 1000, but I don't understand how it was calculated. Does anyone know how to calculate it? Thank you very much.


Hi Jiaqi

I have read the paper and have the same concern as you. The text doesn't correspond to the numbers in the table. So either both of us are missing something, or the text, the table, or both contain an error (?)

I understand why you have flagged this, as the implication would seem to be…important…More eyes on this would be good.


I cannot identify any data in Table 5 that would support a conclusion that in men aged 35 to 74 years something is better by 5 for the QRISK2-2011 model compared with the NICE Framingham model.

I believe that the last paragraph of the Results section refers to analyses that estimated net benefit. Data on net benefit does not appear to be presented in Table 5.

Figure 3 is stated to display:

“the net benefit curves for QRISK2-2011, QRISK2-2008, and the NICE Framingham equation for people aged between 35 and 74 years.”

Earlier in the paper, the authors briefly describe how net benefit is estimated:

“Briefly, the net benefit of a model is the difference between the proportion of true positives and the proportion of false positives weighted by the odds of the selected threshold for high risk designation. At any given threshold, the model with the higher net benefit is the preferred model.”

The results of the net benefit analysis shown in Figure 3 for men aged 35 to 74 years are described as follows:

“At the traditional threshold of 20% used to designate an individual at high risk of developing cardiovascular disease, the net benefit of QRISK2-2011 for men is that the model identified five more cases per 1000 without increasing the number treated unnecessarily when compared with the NICE Framingham equation.”

The y-axis of Figure 3 is labeled net benefit, and the x-axis is labeled threshold. The curves show net benefit for “treat all” and for QRISK2-2008, QRISK2-2011, and NICE Framingham at thresholds from 0% to 30%.

Examining Figure 3, I cannot understand the basis for the statement that the QRISK2-2011 model for men identified 5 more cases per 1000 without increasing the number treated unnecessarily.

More eyeballs on Figure 3 might help clarify how the number 5 was derived. Maybe the data on net benefit were calculated but not presented in either Table 5 or Figure 3?

The authors show comparisons of the QRISK2-2011 model with Framingham NICE that suggest the QRISK2-2011 model is preferred over Framingham NICE for several reasons other than better net benefit. The data about net benefit may not add much to the argument in favor of using QRISK2-2011 over Framingham NICE for the UK population and UK data.

Figure 3 sure confuses me!


I might be getting the mathematics horribly wrong, but I don't understand how the numbers in Table 5 map onto the net benefit in Figure 3.

Net benefit (relative to a treat-none strategy) is:

$$\text{Net benefit} = \frac{TP}{N} - \frac{FP}{N} \times \text{weight}$$

The weight depends on the probability threshold used to designate someone as high risk and is equal to $\frac{\text{threshold}}{1 - \text{threshold}}$. This weight (a ratio) represents the disutility of a false positive relative to the utility of a true positive.
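A minimal Python sketch of the formula above, assuming the threshold is given as a proportion (0.2 rather than 20%) and that TP and FP are counts among N people. The function names `threshold_weight` and `net_benefit` are hypothetical, chosen for illustration; they are not from the paper or from any package.

```python
def threshold_weight(threshold: float) -> float:
    """Odds of the threshold: disutility of a false positive relative to utility of a true positive."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    return threshold / (1.0 - threshold)


def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit relative to 'treat none': TP/N - FP/N * weight."""
    return tp / n - fp / n * threshold_weight(threshold)


# At a 20% threshold, each false positive counts as a quarter of a true positive.
print(threshold_weight(0.2))
```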

So taking the numbers from Table 5 for a threshold of 20% in Men aged 35 to 74:

  • QRISK2-2011 would have: 110 declared as positive (high-risk), consisting of 18 true positives (CV events) and 92 false positives (no CV events)

  • NICE-Framingham would have: 206 declared as positive (high-risk), consisting of 27 true positives (CV events) and 179 false positives (no CV events)

  • The weighting factor to be used at a threshold of 20% is $\frac{0.2}{1-0.2} = 0.25$. Essentially, this is saying that the disutility of a false positive (designating someone as high risk even though they do not suffer an event) is a quarter of the utility of a true positive (designating someone as high risk who does go on to suffer an event).

Plugging the numbers into the above equation, we get -0.005 and -0.01775, respectively. Multiplying by 1,000 to get net benefit per 1,000 persons, we get -5 and -17.75 for QRISK2-2011 and NICE-Framingham, respectively.
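The same arithmetic as a short, self-contained Python check, assuming the per-1,000 counts for men aged 35 to 74 at a 20% threshold are as read from Table 5 in the bullets above. The helper name `net_benefit_per_1000` is hypothetical.

```python
def net_benefit_per_1000(tp: int, fp: int, threshold: float, n: int = 1000) -> float:
    """Net benefit relative to 'treat none', rescaled to a per-1,000 basis."""
    weight = threshold / (1 - threshold)
    return (tp / n - fp / n * weight) * 1000


# (true positives, false positives) per 1,000 men aged 35-74, as quoted from Table 5
counts_at_20pct = {
    "QRISK2-2011": (18, 92),
    "NICE Framingham": (27, 179),
}

results = {
    model: net_benefit_per_1000(tp, fp, threshold=0.20)
    for model, (tp, fp) in counts_at_20pct.items()
}

for model, nb in results.items():
    print(f"{model}: {nb:.2f} per 1,000")
```

This should print approximately -5 and -17.75 per 1,000, i.e. both models fall below the zero net benefit of treating no one.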

This sort of goes against what Figure 3 shows (positive net benefit compared to a treat none strategy at a threshold of 20%).

My guess is I probably messed something up with the above calculations (or with respect to how Figure 3 relates to Table 5). Glad to be corrected if so.

Another possibility is that the graphing software may have “smoothed” the decision curves, which would make the graph not correspond exactly to the table.


Maybe there's simply an error in the paper (?) Although the first sentence of the last paragraph of the Results section refers to “Figure 3,” Figure 3 does not seem to have any bearing on the following sentence, which seems instead to refer to results we'd expect to see presented in Table 5.

My interpretation of Table 5:

Looking at men aged 35-74:

Of 1000 men in this age group, 54 would be expected to have a future cardiac event.

If we had used the QRISK2-2011 risk calculator to “risk-stratify” these 1000 men, 110 would have been classified as “high” risk (i.e., risk >=20%). But these 110 “high risk” men would only be expected to account for 18 of the 54 future cardiac events in the overall group of 1000 men. The remaining events (54-18=36) would occur among men NOT classified as “high” risk using this calculator.

In contrast, if we had used the NICE Framingham risk calculator to “risk-stratify” these 1000 men, 206 would have been classified as “high” risk (i.e., risk >=20%). And these “high risk” men would be expected to account for 27 of the 54 future cardiac events that would occur in the overall group of 1000 men.

So doesn’t this result favour the NICE risk stratification tool for this particular group, rather than the QRISK calculator (?)


The one issue with comparing the tools this way is that it does not account for the false positives. As a result, a tool can be made to look arbitrarily good by lowering the threshold for treatment.

For instance, using this strategy, classifying everyone as high-risk (i.e., the “treat all” strategy) would be perfect because it would successfully classify 54 out of 54 events as high risk. The problem is that this would come at the cost of incorrectly classifying 946 patients as high risk even though they suffer no event.

That’s where the 2nd part of the equation comes in (subtracting the weighted false positives).
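To make the contrast concrete, here is a small Python sketch ranking four strategies first by events captured and then by net benefit at a 20% threshold. It assumes the per-1,000 counts quoted in this thread (18/92 for QRISK2-2011, 27/179 for NICE Framingham, 54/946 for treat all); the helper name is hypothetical.

```python
THRESHOLD = 0.20
WEIGHT = THRESHOLD / (1 - THRESHOLD)  # 0.25: a false positive costs a quarter of a true positive's benefit
N = 1000

# (true positives, false positives) per 1,000 men aged 35-74, using the numbers quoted in this thread
strategies = {
    "Treat none": (0, 0),
    "QRISK2-2011": (18, 92),
    "NICE Framingham": (27, 179),
    "Treat all": (54, 946),
}


def net_benefit_per_1000(tp: int, fp: int) -> float:
    return (tp / N - fp / N * WEIGHT) * 1000


# Ranking 1: events captured only (ignores false positives entirely)
by_events = sorted(strategies, key=lambda s: strategies[s][0], reverse=True)

# Ranking 2: net benefit (each false positive is penalised by WEIGHT)
by_net_benefit = sorted(
    strategies, key=lambda s: net_benefit_per_1000(*strategies[s]), reverse=True
)

print("By events captured:", by_events)
print("By net benefit:    ", by_net_benefit)
```

Counting captured events puts “treat all” first; once false positives are weighted, the order reverses and “treat all” drops to last (about -182.5 per 1,000).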


Yes, no argument from me on that point. Considering tradeoffs when deciding whom to treat is very important. But the key point that needs to be explained is the disconnect between the numbers presented in Table 5 and the accompanying text. Specifically, the authors seem to be suggesting that the raw numbers in Table 5 “favour” the QRISK calculator for this age group, where “favour” is used in the crudest possible sense (i.e., which “high risk” definition captures a higher proportion of future cardiac events?). However, this isn't what the table actually shows…


I think the authors are correct when they say that QRISK offers net benefit compared with NICE in this age group. Where I think there might be a mistake is in how these results are mapped onto the figure and in the magnitude of “5 per 1,000” reported in the text.

The reason the authors say that the QRISK is better than NICE-Framingham at a threshold of 20% in this group is basically as follows:

  • QRISK correctly identifies 18 cardiac events out of 54 as high risk (true positives) and 92 false positives
  • NICE identifies 27 cardiac events out of 54 as high risk (so 9 more true positives) and 179 false positives (so 87 more false positives).

How do we decide whether getting 9 more true positives is worth the 87 additional false positives? At a threshold of 20%, we’re essentially saying that the gain from 1 true positive is equal in value to the disutility of 4 false positives (20:80 = 1:4).

So we need to discount false positives by a factor of 4: 87/4 = 21.75

Then we simply subtract:

9 (additional true positives detected by NICE) - 21.75 (additional false positives, discounted by a factor 4) = -12.75

So NICE-Framingham is worse than QRISK by 12.75 points (net true positives per 1,000). The authors' claim that QRISK outperforms NICE is justified not by the proportion of future cardiac events captured, but by the fact that NICE incurs many more false positives for a meager gain in true positives.
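A quick Python check, assuming the same per-1,000 counts at a 20% threshold, that this incremental comparison gives the same answer as subtracting the two models' net benefits computed separately:

```python
WEIGHT = 0.20 / (1 - 0.20)  # 1 true positive is worth 4 false positives at a 20% threshold

# Incremental comparison: what NICE Framingham gains and loses relative to QRISK2-2011
delta_tp = 27 - 18    # extra true positives with NICE Framingham
delta_fp = 179 - 92   # extra false positives with NICE Framingham
incremental = delta_tp - delta_fp * WEIGHT

# Separate net benefits per 1,000 (relative to treat none)
nb_qrisk = 18 - 92 * WEIGHT
nb_nice = 27 - 179 * WEIGHT

print(incremental, nb_nice - nb_qrisk)
```

Both values come out to -12.75, which is also the gap between the -5 and -17.75 per 1,000 computed earlier in the thread, so the two ways of reasoning agree.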


On the reasoning in your most recent post, I agree that QRISK offers “net benefit” compared with NICE in men in this age group, AND I agree there might be a mistake in how these results are mapped onto Figure 3 and the “5 per 1,000” in the text. Table 5 does not present a clear estimate of net benefit at each of the threshold values.

Note also that the total number of estimated events in 1,000 men aged 30 to 85 in Table 5 is 50, whereas the total number of estimated events in 1,000 men aged 35 to 74 is 54. Can this be correct? Wouldn’t age alone make the total number of events in 1,000 men aged 30 to 85 larger than the total number of events in 1,000 men aged 35 to 74?


Sorry, I just saw your replies. Thank you very much for getting back to me. @EpiMD5 @Ahmed_Sayed @ESMD

The article doesn’t clearly say that Table 5 corresponds to the data in Figure 3. However, in the Results section, the description of Table 5 appears under the subheading ‘Decision curve analysis,’ and the table itself begins with the ‘Treat all*’ values. So I believe Table 5 likely presents the data shown in Figure 3.

My calculation results are the same as @Ahmed_Sayed's. I had been seriously wondering whether I had made a mistake somewhere. When I look at Figure 3, the difference in net benefit does look like roughly 5 per 1,000. However, the values in Table 5 are indeed quite odd, and I don't think a smoothing function would produce differences this large. If the data in Table 5 are correct, then the conclusion would be completely opposite.

It appears that we are in agreement that there may be a problem with the data in Table 5. This reassures me that I likely did not make an error in my own calculations.

This does seem counterintuitive. But what's even stranger is that, at a 0% threshold, the net benefit of the ‘treat all’ strategy should equal the event rate (prevalence), since the weight is 0/1 = 0. The red dashed line for men in Figure 3 intersects the y-axis at around 0.08, suggesting an event rate of 80 per 1,000. However, Table 5 reports about 50. Maybe the figure and table are not actually related? But in theory, they should be…
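A minimal Python sketch of the “treat all” curve, assuming only the event rate is known: everyone with an event is a true positive and everyone else is a false positive. It compares the 54 per 1,000 quoted from Table 5 with the roughly 0.08 intercept read off Figure 3 (an eyeball reading, not a published number). The function name is hypothetical.

```python
def treat_all_net_benefit(prevalence: float, threshold: float) -> float:
    """Net benefit of treating everyone: TP/N = prevalence, FP/N = 1 - prevalence."""
    weight = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * weight


for label, prevalence in [
    ("Table 5: 54 per 1,000", 54 / 1000),
    ("Figure 3 intercept read as ~0.08", 0.08),
]:
    curve = {t: round(treat_all_net_benefit(prevalence, t), 4) for t in (0.0, 0.05, 0.10, 0.20)}
    print(label, curve)
```

At a 0% threshold the curve starts exactly at the event rate, and it crosses zero where the threshold equals the event rate, so the x-intercept of the dashed line is a second way to check which event rate Figure 3 used.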

It is a bit unexpected (one would expect adding men aged 75 to 85 to raise the number of events per 1,000 more than adding men aged 30 to 34 would lower it). It may just be noise, though.

I agree, except that the conclusion wouldn't be completely opposite (that is, the net benefit calculations would still favor QRISK over NICE-Framingham). What would differ, however, is that going by the numbers in the table, a “treat none” strategy would appear superior to using either risk calculator (a net benefit of 0, as opposed to a negative net benefit with both calculators).


It is on my bucket list to learn more about decision curve analysis, so I hope to reply to this thread at some point. I did want to note: decision curve analysis is different from decision analysis, although the two are related.

I was looking quickly at the initial decision curve analysis paper, and it looks like it was developed for uncertain disease status (e.g., there were patients with uncertain prostate cancer disease status) rather than for future risk, but maybe it has been generalized to future risk, which seems to be the topic of the linked paper. From a decision analysis standpoint, when I think of uncertain disease status, I think of disease status as the state S. Whereas the future risk pertains to the post-treatment state S', and it is assessed with P(S'|treatment,S).


Yes, I expressed it incorrectly. The data in the manuscript showed that QRISK is better than NICE-Framingham, and I have no objection to this.

I am still learning. Decision curve analysis seems to be used to evaluate prediction models. Many people like to use c statistics, reclassification tables, etc. But these indicators have some flaws, which is why I am very interested in decision curve analysis.

The evaluation after treatment, P(S'|treatment, S), is very interesting. My own understanding is that decision curve analysis provides a basis for deciding whether to intervene (the benefit of intervening on those flagged by screening), and a randomized controlled trial can then be conducted to evaluate the actual effect of the intervention (whether it effectively reduces mortality or other outcomes). Is that right? I have little experience, so I don't know whether my idea is correct, and I would welcome further comments.

I noticed that you mentioned research on relative sparsity in another post. I’m wondering if this issue is also related to your research. I briefly looked at it before, but my background in statistics is quite weak, so it’s a bit difficult for me. I’ll need to spend some time reading the paper. If you have more recommended materials, please let me know. Thank you!