Errors in teaching critical appraisal of null results (p>0.05) to clinicians

Raj · September 27, 2018, 8:27pm

An essential part of interpreting study results with p>0.05 is looking carefully at the confidence interval (CI) endpoints. Unfortunately, current practice suggests that many clinicians fail to do this. As an experiment, I evaluated one of the most popular textbooks for teaching EBM to clinicians, the “JAMA User’s Guide to the Medical Literature, 3E” (https://jamaevidence.mhmedical.com/Book.aspx?bookId=847). I looked at 2 chapters that used examples of study results with p>0.05. I then checked whether a CI was provided with each p-value, and whether the CI endpoints supported the claims in the text. This is what I found:

[Table 1: image not available]

In the chapter titled “Confidence Intervals,” all five examples of study results with p>0.05 reported a CI that correctly supported the claims in the text. In contrast, most of the studies with p>0.05 in the chapter “Surprising Results of Randomized Trials” did not include a CI at all. One study with p>0.05 provided a CI that contradicted the claim made within the same sentence of text.

Example of RCT results presented without a CI:
[Example 1: image not available]
Clinical Question: In cardiac arrest patients, what is the effect of ACD CPR vs standard CPR on mortality?
Reference: https://www.ncbi.nlm.nih.gov/pubmed/8618367

Example of an RCT result presented with a CI that fails to support the claims in the text:
[Example 2: image not available]
Clinical Question: In patients with myocarditis, what is the effect of immunosuppressive therapy on mortality?
Reference: https://www.ncbi.nlm.nih.gov/pubmed/7596370/

Consider this from the perspective of the student learner. If most of our examples do not show how to correctly interpret studies with p>0.05, then how is the learner expected to practice this skill when interpreting actual study results? For that matter, are we as teachers even interpreting these results appropriately? This is not meant as a criticism of any specific book or teaching guide. This is a difficult topic with many nuances and fine distinctions that are challenging to learn and teach. But the limitations within textbooks serve as a surrogate for limitations seen across the entire breadth of how clinicians teach this topic.

There are many other errors that clinicians like myself tend to repeat over and over. I have some theories on why we clinicians gravitate toward certain types of statistical mistakes (related to the heuristics and biases of day-to-day practice), but I will have to tackle that in another post.

I would appreciate any feedback, thoughts, or comments on similar educational issues when teaching EBM or critical appraisal to clinicians.
Thanks!

beespeev · October 10, 2018, 9:27am

Hi Raj,

Not sure how the example you point out is in error. Assume that the null is 1 for a risk ratio, then the confidence intervals are consistent with the p value reported.

Cheers,
Ben

Raj · October 11, 2018, 2:49pm

Perhaps “error” is not the best term for me to use. And I agree that the confidence interval is consistent with the reported p-value.

My point is that with a CI of 0.52-1.87, we cannot exclude potential benefit or harm. It’s possible the RR could be as low as 0.52, or as high as 1.87. These are clinically meaningful differences. Thus, it would be incorrect to state that the study results “did not differ significantly between the compared groups”.

Hypothetically, if the CI had been closer to 0.90-1.10, then I would have felt more confident stating that there was no clinically significant difference.
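
To make the arithmetic behind this exchange concrete, here is a minimal Python sketch. It assumes the reported interval is a 95% Wald interval that is symmetric on the log scale, and it treats 0.90-1.10 (the hypothetical range mentioned above) as a purely illustrative margin for "no clinically important difference". The function names and the margin are my own assumptions, not something taken from the book or the trial.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def summary_from_ci(lower, upper):
    """Recover the RR point estimate, SE of log(RR), and two-sided p-value
    (null: RR = 1) from a 95% CI assumed symmetric on the log scale."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    log_rr = (log_lower + log_upper) / 2
    se = (log_upper - log_lower) / (2 * Z95)
    z = log_rr / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return math.exp(log_rr), se, p_value


def interpret(lower, upper, margin_low=0.90, margin_high=1.10):
    """Classify a CI against a hypothetical range of clinically trivial effects."""
    if margin_low <= lower and upper <= margin_high:
        return "no clinically important difference"
    if upper < 1 or lower > 1:
        return "difference detected"
    return "inconclusive"


rr, se, p = summary_from_ci(0.52, 1.87)
print(f"RR ~ {rr:.2f}, SE(log RR) ~ {se:.2f}, p ~ {p:.2f}")
print(interpret(0.52, 1.87))  # wide interval spanning benefit and harm
print(interpret(0.92, 1.08))  # narrow interval inside the margin
```

Under these assumptions, the back-calculated p-value is far above 0.05, which is Ben's point that the CI and p-value agree. The interval is nevertheless classified as inconclusive rather than as showing no difference, which is the distinction I am trying to draw.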

pieter · March 27, 2019, 12:04pm

If the CI is [0.52 - 1.87], RR could have any value. It’s not true that values within a CI are more likely than outside the CI.

Edit: a 95% CI is simply (xbar ± 1.96*SE). So it contains only information about the sample.
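
As a rough illustration of that formula for a risk ratio, here is a minimal Python sketch. It assumes the usual large-sample approach of applying estimate ± 1.96*SE on the log scale and then exponentiating, and the 2x2 counts (20/100 events in each arm) are invented purely for illustration, not taken from any trial in this thread.

```python
import math

Z95 = 1.96  # normal critical value for a two-sided 95% interval


def log_rr_ci(events_treat, n_treat, events_ctrl, n_ctrl):
    """Risk ratio with a 95% Wald CI computed on the log scale."""
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    # Large-sample standard error of log(RR)
    se = math.sqrt(1 / events_treat - 1 / n_treat
                   + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - Z95 * se)
    upper = math.exp(math.log(rr) + Z95 * se)
    return rr, lower, upper


# Hypothetical counts: 20 events out of 100 in each arm
rr, lower, upper = log_rr_ci(20, 100, 20, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Every input to the interval comes from the sample counts, which is the sense in which the CI "contains only information about the sample".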

Raj · March 27, 2019, 3:40pm

Agree with this comment.

For the purposes of medical decision making, we try (or are forced) to generalize results beyond the context of the study, which is not really possible under the initial assumptions of a frequentist analysis. The shorthand solution is to change the assumptions and go Bayesian (flat prior, treat the CI as a credible interval) or “bootstrap” (treat the CI as a range of possible future predictions). If there are any formal papers on this topic, I would love any references.
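
A minimal sketch of the "flat prior" shortcut, in Python, under loudly stated assumptions: a flat prior on log(RR) combined with a normal likelihood, so that the posterior for log(RR) is normal with the same centre and spread implied by the reported 95% CI. The thresholds 0.90 and 1.10 are hypothetical cut-offs for meaningful benefit and harm, and the function name is my own.

```python
import math
from statistics import NormalDist


def posterior_regions(lower, upper, benefit=0.90, harm=1.10):
    """Posterior probabilities that RR falls below `benefit`, between the
    thresholds, or above `harm`, assuming a flat prior on log(RR) and a
    95% CI that is symmetric on the log scale."""
    z95 = NormalDist().inv_cdf(0.975)
    mean = (math.log(lower) + math.log(upper)) / 2
    sd = (math.log(upper) - math.log(lower)) / (2 * z95)
    posterior = NormalDist(mean, sd)
    p_benefit = posterior.cdf(math.log(benefit))
    p_harm = 1 - posterior.cdf(math.log(harm))
    p_trivial = 1 - p_benefit - p_harm
    return p_benefit, p_trivial, p_harm


p_benefit, p_trivial, p_harm = posterior_regions(0.52, 1.87)
print(f"P(RR < 0.90) = {p_benefit:.2f}")
print(f"P(0.90 <= RR <= 1.10) = {p_trivial:.2f}")
print(f"P(RR > 1.10) = {p_harm:.2f}")
```

With these assumptions, most of the posterior mass sits outside the trivial range, split between benefit and harm. A different prior or different thresholds would change the numbers, so this is an illustration of the shortcut rather than a recommended analysis.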


pieter · March 27, 2019, 5:04pm

If I see something, I’ll share it!