When Should Clinicians Act on Non–Statistically Significant Results From Clinical Trials?

This editorial was recently published in JAMA: When Should Clinicians Act on Non–Statistically Significant Results From Clinical Trials?

The editorial is clearly well-intentioned and written by a team of well-respected clinical trialists in intensive care medicine. However, it makes a number of basic errors (in my opinion), including treating absence of evidence as evidence of absence and misrepresenting point estimates. They also raise some important points, for example, taking into account the financial implications of treatment options and viewing trial results in the context of what came before.

Overall, I suspect the editorial will encourage a number of studies to come out in the next year proclaiming: “X was numerically higher than Y” or “the point estimate for X was higher than Y” and so on. While p-values certainly have their problems, this appears to be a piece that encourages us to interpret trials as we see fit, with little regard for objectivity. Almost as if we should throw the baby out with the bathwater.

In my view, understanding uncertainty is going to be essential to the next 10 years of evidence-based medicine. It’s going to require a mammoth educational and cultural shift in medicine. Given the popularity and traction of this article, it feels like it will set us back and will be used as a reference for ongoing misrepresentation.

I would be really interested to hear whether my concerns here are valid and shared by the community, or perhaps I’m way off base. Either way, I thought I would nail my colours to the mast and invite course correction as necessary!


This is tough because front-line clinicians are caught between being answerable to an individual patient (with a unique utility function) and being answerable to third parties (payers, the government, etc.) who will emphasize population-level decision criteria.

Your question reminded me of this article by McShane and Gal [1], in which they assessed evidential reasoning in people before and after an intro stats class, as well as in those who teach statistics.

I’d say that in an individual context where there is little left to lose by trying speculative interventions, this would likely satisfy the individual patient while not running afoul of the regulators.

I’d think in a policy context, the better answer is to collect more data.

McShane, B; Gal, D (2015). Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence. Management Science Vol 62, No. 6 https://pubsonline.informs.org/doi/abs/10.1287/mnsc.2015.2212

As a practicing surgical oncologist, I had a couple of thoughts on the piece:

  1. I think the title is click-baity and oversimplifies the issues the authors actually discuss.
  2. I agree with the general ideas presented, including not just zeroing in on efficacy but also considering costs, the implementation aspects of therapy, and the question of what threshold of evidence should be used to adopt new treatments and technologies, or to de-implement them. My field in particular is rife with therapies approved on the basis of marginal effects and surrogate endpoints that don’t help our patients, with little regard for what the treatments will cost. That said, we’ve also had a few notable successes in the last decade in de-implementation on the surgical side thanks to high-quality RCTs, so it is possible…
  3. I 100% agree with @Doc_Ed about the need to massively overhaul how we educate incoming physicians on levels of evidence, data interpretation, and basic statistical concepts. Personally, I did not read this particular piece as carte blanche to interpret clinical trials as we see fit. I suspect that is happening to some degree already, with many people not weighing evidence objectively but instead using it to support their own internal biases. I do think that appropriate education is where we can make the most difference. I am not sure whether this article drives things one way or the other, but then again, I have never put that much stock in editorials anyway…

I would be interested to hear people’s thoughts on the Editorial and how others think we need to move forward with more appropriate education in our medical schools or even before…


I had the same thoughts after reading this article. My impression might stem from a lack of expertise, but it feels like several complicated concepts were a bit blurred here (?) including: 1) how to properly interpret and clinically apply a confidence interval that includes the null, particularly in situations where previous evidence is limited, and 2) the potential inferential pitfalls of dichotomizing a study’s results as “statistically significant” or not.

For example, the confidence interval from the COACT trial is wide and crosses the null. Although the point estimate narrowly favours delayed over immediate angiography, wouldn’t the best interpretation of this trial, in the absence of any prior evidence favouring delayed angiography, and in view of the wide confidence interval, be that “more evidence is needed” (e.g., from larger trials)? Instead, the authors suggest that the interpretation should be: “since the point estimate from this study looks favourable for delayed angiography and the confidence interval crosses the null, I’m probably OK to adopt the more cost-effective delayed approach since I’m probably not going to do any harm.” Maybe I’m wrong, but this interpretation seems like a classic example of the absence-of-evidence fallacy and could potentially be harmful if it were to discourage the conduct of a larger study in which more outcomes of interest could be observed (?)
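To make that concrete, here is a minimal sketch with made-up counts (illustrative only, not the actual COACT data) showing how a wide interval that crosses the null remains compatible with benefit, no effect, or harm:

```python
import math

# Hypothetical survival counts in two arms (illustrative only, not COACT)
a, b = 170, 95   # "delayed" arm: survivors, non-survivors
c, d = 180, 90   # "immediate" arm: survivors, non-survivors

log_or = math.log((a * d) / (b * c))      # odds ratio for survival, delayed vs immediate
se = math.sqrt(1/a + 1/b + 1/c + 1/d)     # Wald standard error on the log scale
lo, hi = math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

print(f"OR = {math.exp(log_or):.2f}, 95% CI {lo:.2f} to {hi:.2f}")
# -> roughly OR = 0.89, 95% CI 0.63 to 1.28: the data alone cannot rule out
#    meaningful benefit or meaningful harm for either strategy.
```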

Re the potential pitfalls of “dichotomizing” a trial’s result as statistically significant or not: isn’t it only relevant to discuss this issue for trials in which a large number of outcomes of interest were observed? As I understand it, the concern about using the term “statistically significant” is that 1) it is an arbitrary cut-point, and 2) it could lead to premature discarding of potentially useful treatments. But in situations where we don’t have any prior evidence to suggest that one treatment might be superior to another, and where a trial comes along in which few outcomes of interest are observed (often the case with small trials), wouldn’t any corresponding error interval for the trial be such an unreliable indicator of the ballpark true (but unknowable) effect that the question of whether or not the error interval from this single study crosses the null (i.e., whether the result is “statistically significant”) becomes irrelevant?

In contrast, in a trial where a large number of outcomes of interest was observed (which often, but not always, requires a larger sample size), then over many hypothetical repetitions of the same experiment with the same sample size, we might expect that the point estimate and error interval would be reasonably stable (?) In this case, it might be reasonable to infer, from our single large experiment, that our result will be fairly representative of the true (but unknowable) population effect, and only at this point (once we’ve agreed that the error interval would likely be fairly stable in the long run) would it become relevant to discuss the pitfalls of dichotomizing the result (?)
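A rough simulation of what I mean by instability; the event rate, trial sizes, and the assumption of no true difference are all illustrative, not taken from any real trial:

```python
import numpy as np

rng = np.random.default_rng(1)
p_control = p_treat = 0.10        # assume no true difference between arms

def one_trial(n_per_arm):
    """Simulate one two-arm trial; return the risk difference and its 95% CI."""
    x_t = rng.binomial(n_per_arm, p_treat)
    x_c = rng.binomial(n_per_arm, p_control)
    pt, pc = x_t / n_per_arm, x_c / n_per_arm
    rd = pt - pc
    se = np.sqrt(pt * (1 - pt) / n_per_arm + pc * (1 - pc) / n_per_arm)
    return rd, rd - 1.96 * se, rd + 1.96 * se

for n in (50, 2000):              # a "small" trial vs a "large" one
    print(f"n per arm = {n}")
    for _ in range(5):            # five hypothetical repetitions of the trial
        rd, lo, hi = one_trial(n)
        print(f"  RD = {rd:+.3f}  (95% CI {lo:+.3f} to {hi:+.3f})")
```

With 50 per arm the estimates and intervals jump around from repetition to repetition; with 2000 per arm they settle down, which is roughly the distinction I was trying to draw above.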


Thanks for the insightful responses. @ESMD draws out my concerns about an incorrect interpretation of point estimates and confidence intervals. Thanks for the link, @R_cubed; most enjoyable and informative.

Building on those themes, the article perhaps just falls a bit short of the target. It would have been great had this platform been used to discuss the area more robustly and, dare I say, to make a stronger case for Bayesian methods. Obviously not a panacea, but they most certainly help to address many of the issues raised (they were in there, but with a pretty light touch).
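For what it’s worth, here is a minimal sketch of the kind of Bayesian summary I have in mind, using a normal approximation on the log odds ratio scale and a skeptical prior; the estimate, standard error, and prior width are all assumptions for illustration:

```python
from scipy.stats import norm

# Normal approximation to a trial result on the log odds ratio scale
# (estimate and standard error are made up for illustration)
est, se = -0.11, 0.18

# Skeptical prior centred on "no effect", ~95% prior mass for OR between 0.5 and 2
prior_mean, prior_sd = 0.0, 0.35

# Conjugate normal update: precision-weighted average of prior and data
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + est / se**2)

p_benefit = norm.cdf(0.0, loc=post_mean, scale=post_var**0.5)   # P(log OR < 0)
print(f"Posterior probability that OR < 1: {p_benefit:.2f}")
```

A statement like “posterior probability of benefit of about 0.7 under a skeptical prior” seems far more useful to a clinician than “not statistically significant”, which was the kind of point I hoped the editorial would push harder.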

Perhaps the article overall is a good thing, and some inaccuracies are irrelevant as it’s all about starting the conversation, which this no doubt will.

Thank you for the thought provoking discussion.

Is anyone else having trouble accessing the piece now?

I saw it earlier, and I thought the intention might be good: to hopefully limit how many non-statisticians see p > .05, or a CI that includes the “null” value, and conclude “no difference.” I think the piece fails in that it should at least talk about what p-values can and can’t tell us, so it is clearer why non-significant tests aren’t indicative of much (and then this shouldn’t be such a big shock).
I didn’t read enough to see the reported misinterpretations of CIs, but I don’t doubt them.
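As a back-of-envelope illustration (all numbers assumed, not from any real trial), even when a clinically important effect is real, a small trial will usually return a non-significant p-value, which is exactly why p > .05 on its own says so little:

```python
from math import sqrt
from scipy.stats import norm

# Assumed numbers for illustration only
p_control, p_treat = 0.30, 0.24     # a real 6-point absolute risk reduction
n_per_arm = 150                     # a fairly small trial

se = sqrt(p_control * (1 - p_control) / n_per_arm +
          p_treat * (1 - p_treat) / n_per_arm)
z = (p_control - p_treat) / se
power = norm.cdf(z - 1.96)          # approximate power, two-sided alpha = 0.05

print(f"Approximate power: {power:.0%}")   # roughly 20-25%
# With power this low, a non-significant result is the expected outcome even
# though the treatment works, so p > 0.05 here is not evidence of "no difference".
```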


It’s still there OK, I think.

Worked for me now, sorry for the long delay.