Problems with NNT

Couldn’t say it better.

I wonder what everyone here thinks of attempts to come at this issue through an MCDA lens. I’m currently working on a paper with a colleague in which we combine the relative effects of a trial/meta-analysis (with uncertainty) with an assumed baseline risk to generate absolute probabilities. These are then weighted by importance to create a loss function, which is used to rank treatments.

We’re planning to package the whole thing into a little Shiny app or something similar. It’s essentially a simplified version of the approach proposed here: https://www.researchgate.net/publication/51923322_Multicriteria_benefit-risk_assessment_using_network_meta-analysis

Lots of issues/assumptions (e.g. that the joint distribution of outcomes is the product of the marginals), but I wonder if it might provide a bridge that makes it easier to a) work directly with probabilities, and b) frame outcomes in terms of how much they matter to individual patients/clinicians/guideline groups.
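Here is a minimal sketch in R of the kind of calculation we have in mind. Every outcome name, odds ratio, baseline risk, and importance weight below is hypothetical, and outcomes are treated as independent per the product-of-marginals assumption just mentioned:

```r
# Hypothetical MCDA-style ranking: absolute event probabilities weighted by importance.
# Every number below is illustrative, not from any real trial.
baseline <- c(mi = 0.10, bleed = 0.05)          # assumed baseline risks per outcome
or <- rbind(drugA = c(mi = 0.70, bleed = 1.40), # relative effects (odds ratios)
            drugB = c(mi = 0.85, bleed = 1.10))
weight <- c(mi = 0.8, bleed = 0.2)              # importance weights (patient-specific)

# Baseline risk + OR -> absolute risk on each treatment (via the logit scale)
abs_risk <- t(apply(or, 1, function(o) plogis(qlogis(baseline) + log(o))))

# Loss = importance-weighted sum of absolute probabilities of bad outcomes;
# lower is better. Outcomes are treated as independent (product of marginals).
loss <- abs_risk %*% weight
loss[order(loss), , drop = FALSE]
```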

The math for a single outcome is pretty easy and is detailed in some of my blog posts and in BBR. But I don’t think it’s up to any of us to define the loss function to optimize; that should be up to the patient in consultation with the physician, often on a case-by-case basis. At some point we need a new topic to discuss simultaneous consideration of multiple outcomes - a very important topic - thanks for mentioning it. We also need topics to discuss patient utility elicitation and analysis.

1 Like

Given that, as a patient, I care more about how long I stay in good health than about what age I live to, many of these endpoints are not that meaningful to me.

NNT also does not take into account people being sensible and stopping their tablets when they feel they are getting no benefit from them, unlike clinical trials, where everyone keeps taking the tablets because they’re paid to do so…

1 Like

Dear Friends, I have recently been reviewing trials of antidiabetic drugs that evaluated cardiovascular endpoints. Across the 19 trials, the NNTs are very divergent once you consider the length of treatment over which the benefit occurred. Is there any method that allows us to adjust the NNT for treatment duration, so that the trials are comparable? Is there any advantage in calculating NNT/NNH?

2 Likes

I would definitely consider an alternative estimate. Maybe consider Steve Nissen’s famous meta-analysis of CV events for Avandia: https://www.nejm.org/doi/full/10.1056/nejmoa072761

Edit: regarding adjustment for time on treatment, maybe this other post is relevant: Is it reasonable to adjust for a post-baseline covariate?

One of the key problems with NNT is that it does not transport to other patient populations and there is no way to adjust it. That’s because we are trying to think of it as a number when in fact it is a function of all the patient characteristics that modify absolute risk. Showing the entire absolute risk function, or absolute risk reduction function, as I did in one of the referenced blog articles, is to me the way to go.
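To make that concrete, here is a small sketch (assuming, purely for illustration, a constant odds ratio of 0.7) showing how the absolute risk reduction function, and hence the NNT, varies with baseline risk:

```r
# NNT as a function of baseline risk, under an assumed constant odds ratio of 0.7
or  <- 0.7                             # hypothetical relative treatment effect
p0  <- seq(0.01, 0.50, by = 0.01)      # a range of baseline (untreated) risks
p1  <- plogis(qlogis(p0) + log(or))    # treated risk at each baseline risk
arr <- p0 - p1                         # the absolute risk reduction function
plot(p0, arr, type = "l", xlab = "Baseline risk", ylab = "Absolute risk reduction")
range(round(1 / arr))                  # the "NNT" spans a wide range across patients
```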

2 Likes

There is another problem with the NNT. It is the reciprocal of the risk difference (RD), defined as the difference between two proportions. Both obviously should be reported with confidence limits. In the non-significant case, the CLs for the NNT are absurd. Suppose we wish to compare the success rates on two treatments, 47 out of 94 (50%) and 30 out of 75 (40%). The estimated difference here is +0.1000. The 95% Wald interval is -0.0500 to +0.2500. Naively inverting these figures results in an NNT of +10, with 95% interval -20 to +4.

Note here that the calculated interval actually excludes the point estimate, +10. Conversely, it includes impossible values between -1 and +1, including zero. The explanation is that the calculated limits, -20 and +4, are correct. But the interval that these limits encompass must not be regarded as a confidence interval for the NNT. In fact, it comprises NNT values that are NOT consistent with the observed data. The correct confidence region is not a finite interval, but consists of two separate intervals extending to infinity, namely from -infinity to -20 and from +4 to +infinity. In Buzz Lightyear’s immortal words - To infinity and beyond! While a confidence region of this nature is perfectly comprehensible to a mathematician, one would not expect clinicians to be comfortable with it as an expression of sampling uncertainty.
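For anyone who wants to check the arithmetic, a quick reproduction in base R, using exactly the numbers above:

```r
# 47/94 (50%) vs 30/75 (40%): Wald interval for the risk difference, then inverted
p1 <- 47 / 94; n1 <- 94
p2 <- 30 / 75; n2 <- 75
rd <- p1 - p2                                    # +0.1000
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci <- rd + c(-1, 1) * qnorm(0.975) * se          # approx -0.0500 to +0.2500
c(NNT = 1 / rd, limits = 1 / ci)                 # 10, with "limits" -20 and +4
# The RD interval crosses zero, so the inverted limits bound the NNT values
# EXCLUDED by the data; the true region is (-Inf, -20] together with [+4, +Inf)
```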

5 Likes

I’ve not seen that angle discussed before. Very interesting. Would be useful to repeat with better (and properly asymmetric) Wilson confidence intervals.

You know, such good ideas come out of these discussions that maybe people should collaborate around them and produce publications. How unique would that be for an online community!

2 Likes

I wholeheartedly agree that intervals for p1 - p2 obtained from Wilson intervals for p1 and p2, combined by squaring and adding (now known as MOVER - Method of Variance Estimates Recovery), are greatly preferable to Wald intervals - see Newcombe RG. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 1998;17:873-890. Nevertheless, that is a quite separate issue. This issue with the NNT arises practically regardless of which method we choose to calculate the interval for p1 - p2. The Wald 95% interval for 47/94 minus 30/75, which is also used in my book, is simply the perfect example to illustrate the point, as it happens to give round figures to 4 dp for p1 - p2 and for both the lower and upper limits.
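For the curious, here is a quick sketch of the MOVER construction just described, translated directly from the squaring-and-adding recipe (my own illustrative implementation, not reference code):

```r
# Wilson score interval for a single proportion
wilson <- function(x, n, z = qnorm(0.975)) {
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- (z / (1 + z^2 / n)) * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  c(lower = centre - half, upper = centre + half)
}

# Newcombe's hybrid interval for p1 - p2: combine the Wilson limits for the
# two proportions by squaring and adding (MOVER)
newcombe <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1; p2 <- x2 / n2
  w1 <- wilson(x1, n1); w2 <- wilson(x2, n2)
  d <- p1 - p2
  c(lower = d - sqrt((p1 - w1[["lower"]])^2 + (w2[["upper"]] - p2)^2),
    upper = d + sqrt((w1[["upper"]] - p1)^2 + (p2 - w2[["lower"]])^2))
}

newcombe(47, 94, 30, 75)  # still straddles zero, so inverting to an NNT fails the same way
```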

1 Like

The confidence interval anomaly arises because the NNT is the reciprocal of a quantity (the risk difference) whose value can cross zero. A similar thing happens with Fieller’s Theorem. I seem to recall Andy Grieve writing about this many years ago. I see little use for NNTs anyway, so I have never got very excited about this.

2 Likes

I’m going to offer a partial defense of NNT.

NNT often acts as an “infographic” to quickly visualize the comparative effectiveness of different interventions. Physicians can quickly see the magnitude of benefits and harms of different interventions for a single patient, or the magnitude of benefits and harms of a single intervention for different patients with different baseline risks. This is helpful to the average physician! For example, take this figure from a 2005 BusinessWeek article:

Despite all its limitations, the NNT still works well in certain scenarios. How often do the listed limitations apply?

1) Loses frame of reference: most event rates are low; interventions that merely move risk from, say, 0.93 to 0.92 are rare. I would argue this applies “not often”.
2) Group data to individual: this is an issue with all absolute measures. This limitation always applies, but is also unavoidable.
3) Group decisions: NNT is a decision support. There is always a fear that decision supports will inflict the tyranny of averages without consideration, but in practice patient care is more nuanced. I would argue this applies “not often”.
4) NNT has great uncertainty: this is a valid point; the NNT is rarely presented with a range. This is sort of a design choice to keep the “visualization” simple. I would say this limitation “often applies”, but the option to present a range is always available.
5) NNT is a poor choice for shared decision making: this is tricky. As a value question, it depends on context, patient education, alternative communication options, use of visuals, etc. I am uncertain of this as a limitation.
6) NNT omits timing: this is a valid point. The NNT is rarely presented with the length of study time. This is less of an issue for acute problems or survival numbers, and becomes a bigger problem for chronic conditions. I would say this limitation “depends”: for Emergency Medicine (acute problems) it’s “not often”; for Primary Care, it’s “very often”. The option to present NNT with a time range is always available, but that also makes comparing interventions more difficult.

Out of all of these limitations, #2, #4, #6 are probably the most commonly encountered.

If we want to remove NNT, we need an equivalent “infographic” replacement that serves the same function (a quick and easy way to interpret comparisons). Certainly, if an alternative can improve on the big limitations above, we may have an argument against using NNT. Until then, I suspect the NNT will continue to be used. For most users it is still a helpful clinical decision aid.

The graphic would be far more useful had the average risk for untreated individuals been placed side by side with the average risk for treated individuals. Next to that, put thermometers to indicate the absolute risk difference.

This is not an issue with absolute values. You can estimate the risk for an individual patient. This problem is very avoidable.

NNT is not made for decision support. It does not apply to individuals. Absolute risk estimates (separate for untreated and treated patients) are much more directly applicable to decision support.

This is related to the point above. Shared decision making needs to consider absolute risks (untreated, treated) or life expectancies (untreated, treated). The fact that some hypothetical group of “others” is out there is not relevant at the decision point.

I’d like us all to work together on an optimum visualization that provides the data that are fed directly into decisions. In a second phase, a graphic that shows how to apply the first step to patient utilities to recommend the optimum decision would be another good goal IMHO.

1 Like

As an informaticist, I think this would make an interesting project (i.e. a web-based interactive graphic). It could also be useful for visual abstracts, etc.

Presumably it would include ranged inputs for [baseline risk] and [relative risk], and output [absolute risk].

Does anyone have suggestions or mock-ups for how it could look?

Thanks.

1 Like

For a general audience I’d suggest sticking to only absolute risk. Note that a simple graph can be used to convert odds or hazard ratios and baseline risk to risk differences, as in this. Instead of that I propose that we find a way to present the two absolute risks, make the time frame apparent, and show the risk difference. To help in the communication it would be good to add some points of reference in separate rows of the display, e.g. the risk of a 50-year-old from the general population dying within 10 years, whether or not he drives 100,000 miles. Relative treatment effects could be shown as footnotes.
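A sketch of the underlying conversion (all numbers hypothetical; a real display must state the time frame alongside):

```r
# Baseline risk + odds ratio -> both absolute risks and the risk difference
# (numbers are made up; the time frame, e.g. "within 10 years", must be shown too)
or <- 0.75                            # assumed published odds ratio
p0 <- 0.12                            # assumed baseline (untreated) risk
p1 <- plogis(qlogis(p0) + log(or))    # treated risk on the same scale
c(untreated = p0, treated = p1, risk_difference = p0 - p1)
```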

For the decision analysis part, start with this paper.

David Spiegelhalter has written a lot about risk communication.

Update: On further reflection I think your idea of making this interactive is a good one, because RCT reports only include crude average treatment effects on an absolute scale. Fortunately they provide relative effects for binary outcomes (these really need to be covariate adjusted; crude odds ratios will underestimate the conditional odds ratios). So the interactive component could ask for a patient’s baseline risk, showing as an example the average risk for patients enrolled in the trial, and compute the absolute risk with and without treatment.

Confidence limits for the published relative efficacy could also be factored in, resulting in a confidence interval for the predicted absolute risk for a treated patient, assuming no uncertainty in the baseline risk estimate.
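A sketch of that last step, under the stated assumption of no uncertainty in the baseline risk estimate; the odds ratio and its limits below are hypothetical:

```r
# Propagate the published OR confidence limits to the treated absolute risk.
# The transform is monotone in the OR, so applying it to the limits gives
# limits for the treated risk (for a fixed, assumed baseline risk).
p0 <- 0.12                                            # assumed patient baseline risk
or <- c(estimate = 0.75, lower = 0.60, upper = 0.94)  # hypothetical published OR + 95% CI
round(plogis(qlogis(p0) + log(or)), 3)                # treated risk with its 95% limits
```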

1 Like

@f2harrell I have a conference presentation coming up where I used shinydashboard and highcharts to do a quick and dirty version of this idea. Would love to hear your thoughts (and wish I’d thought to ask you sooner!). Basically I used the OR outputs of a network meta-analysis to allow clinicians to enter local event rates + preferences as a decision aid. Plenty of limitations (some of those credible intervals are clearly not credible) but it’s an idea I hope to develop.

2 Likes

Wow Tim that is really cool.

I’d like to see a streamlined app that dealt with results of a single trial and with one clinical outcome.

3 Likes

Super cool @timdisher! I’d add to @f2harrell’s suggestion that it would be really nice to have a general infrastructure that could be reused across individual trials and outcomes.

2 Likes

Thanks @venkmurthy and @f2harrell. Creating just the absolute probability plots, with sliders for assumed baseline rates of a single outcome, would be amazingly simple on a trial-by-trial basis, but harder to make general, since I currently use MCMC draws to account for both baseline rate and model parameter uncertainty (while respecting the correlation between coefficients in the latter). For this to work correctly without making distributional assumptions about the posterior, people would need to upload their MCMC matrix output.
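Roughly, the computation looks like this (the draws below are simulated stand-ins for real uploaded MCMC output):

```r
# Stand-in for user-uploaded MCMC output: draws of log odds ratios (one column
# per treatment vs. reference) and draws of the assumed baseline event rate
set.seed(1)
draws_lor <- cbind(drugA = rnorm(1000, log(0.70), 0.10),
                   drugB = rnorm(1000, log(0.85), 0.12))
draws_p0 <- rbeta(1000, 30, 270)            # uncertainty in the baseline rate

# Each absolute-risk draw pairs one baseline draw with one coefficient draw,
# so both sources of uncertainty flow through to the final interval
abs_risk <- plogis(qlogis(draws_p0) + draws_lor)
apply(abs_risk, 2, quantile, probs = c(0.025, 0.5, 0.975))
```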

@f2harrell In economic models, when the model is frequentist, we often use its covariance matrix to draw from a pseudo-posterior to assess parameter uncertainty, but the purist in me hates this for obvious reasons. Can you think of a better solution?
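For reference, the trick I mean, on a toy simulated logistic model (everything below is illustrative):

```r
# Pseudo-posterior from a frequentist fit: draw coefficient vectors from a
# multivariate normal with the point estimates as mean and vcov() as covariance
set.seed(42)
x <- rnorm(200); z <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + 0.8 * x + 0.5 * z))
fit <- glm(y ~ x + z, family = binomial)

draws <- MASS::mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))
risk <- plogis(draws %*% c(1, 0.5, -0.2))   # risk for one hypothetical covariate pattern
quantile(risk, c(0.025, 0.5, 0.975))        # parameter-uncertainty interval
```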

2 Likes