On Twitter, @f2harrell referenced an Andrew Gelman post on this research synthesis of statins: Evidence-based medicine eats itself in real time (link)
Such a provocative title got me interested. Fortunately, Gelman has a link to the original paper. A quick skim shows they violated most of the statistical guidelines in this thread:
What they did:
No attempt at meta-analysis.
Because our systematic review involved 3 different drug classes and several different patient populations, we intentionally did not perform a meta-analysis.
That is unfortunate because they had enough studies (35) to do some Bayes/Empirical Bayes modelling, where information from heterogeneous groups can be used to draw conclusions about others.
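To make the partial-pooling idea concrete, here is a minimal sketch of a random-effects meta-analysis using the DerSimonian-Laird estimator, where a between-study variance lets heterogeneous trials inform a common summary. The log odds ratios and standard errors below are invented for illustration, not taken from the paper.

```python
# Sketch of random-effects pooling (DerSimonian-Laird): heterogeneous
# studies still inform a common summary via a between-study variance.
# The log odds ratios and standard errors below are made up for illustration.
import math

yi = [-0.25, -0.10, -0.40, 0.05, -0.30]   # hypothetical log odds ratios
se = [0.15, 0.20, 0.25, 0.30, 0.18]       # hypothetical standard errors

wi = [1 / s**2 for s in se]               # fixed-effect (inverse-variance) weights
y_fe = sum(w * y for w, y in zip(wi, yi)) / sum(wi)

# DerSimonian-Laird estimate of between-study variance tau^2
Q = sum(w * (y - y_fe)**2 for w, y in zip(wi, yi))
c = sum(wi) - sum(w**2 for w in wi) / sum(wi)
tau2 = max(0.0, (Q - (len(yi) - 1)) / c)

# Random-effects weights shrink each study toward the common mean
wi_re = [1 / (s**2 + tau2) for s in se]
y_re = sum(w * y for w, y in zip(wi_re, yi)) / sum(wi_re)
se_re = math.sqrt(1 / sum(wi_re))

print(f"pooled OR = {math.exp(y_re):.2f}, "
      f"95% CI {math.exp(y_re - 1.96*se_re):.2f} to {math.exp(y_re + 1.96*se_re):.2f}, "
      f"tau^2 = {tau2:.3f}")
```

A full Empirical Bayes treatment would also report the shrunken per-study estimates, but the point is that 35 trials are plenty for this kind of model.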
Categorization of continuous variables. This would be another example of “Threshold Science” that @llynn has posted on numerous occasions. They probably didn’t have much choice due to what was reported in the primary studies.
Absence of Evidence Fallacy: Mortality and Cardiovascular Disease benefits were classified as “no” if CI includes RR, OR, or HR of 1. A quick scan of the CI in Table 1 shows either very wide (uninformative) intervals, or intervals that are skewed towards benefit, and no attempt to transform them to a common scale.
(An interval like 0.60 to 1.00 is not evidence of no effect; most of its mass sits on the benefit side.)
They did a “vote count” that treated “significant” results as “positive” and “not significant” results as “negative.” Such a vote-counting procedure was shown to be incorrect back in 1985 by Hedges and Olkin in the classic text Statistical Methods for Meta-Analysis (page 3).
Intuitively, if a large proportion of studies obtain statistically significant results, then this should be evidence that the effect is nonzero. Conversely, if few studies find statistically significant results, the combined evidence would seem to be weak. Contrary to intuition … studies of the properties of [improper] vote-counting procedures show they can be strongly biased towards the conclusion that the treatment has no effect.
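The bias is easy to demonstrate by simulation: with a real but modestly powered effect, most individual trials are “not significant,” so a naive vote count concludes the treatment does nothing. All the parameters below (effect size, sample size, number of studies) are invented for illustration.

```python
# Simulation of the vote-counting bias Hedges & Olkin describe: a real
# effect studied with ~32% power yields mostly "not significant" trials,
# so a naive vote count favours "no effect". Parameters are illustrative.
import math
import random

random.seed(1)
true_effect = 0.3      # true standardized mean difference
n_per_arm = 50         # per-arm sample size -> power ~ 0.32 at alpha = 0.05
n_studies = 35

significant = 0
for _ in range(n_studies):
    # simulate group means; within-arm SD = 1, so a simple z-test applies
    t = [random.gauss(true_effect, 1) for _ in range(n_per_arm)]
    c = [random.gauss(0, 1) for _ in range(n_per_arm)]
    diff = sum(t) / n_per_arm - sum(c) / n_per_arm
    se = math.sqrt(2 / n_per_arm)
    if abs(diff / se) > 1.96:
        significant += 1

print(f"{significant}/{n_studies} studies significant despite a real effect")
```

A vote count here would declare the evidence “mostly negative,” while an inverse-variance pooling of the same 35 simulated trials would detect the effect easily.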
It will be interesting for me to explore the data in this paper and post some ideas on how to plausibly examine it, both from a frequentist and Bayesian POV.
Yes, a mix of OR, HR and RR with point estimates and confidence intervals is listed. These are all RCTs, so the authors should have extracted event_treat, noevent_treat, event_ctrl, noevent_ctrl and recomputed the OR for each trial. The comparator was placebo or usual care, so what “usual care” means needs clarity. Table 1 has two CKD populations, but the rest are more or less conceptually similar. Once the comparators are determined to be similar and the CKD trials dropped, a frequentist MA should also be informative.
My first thought was “why not compute OR for each study?” I was wondering if the data could also be used in an Empirical Bayes way to derive regression/prediction models discussed in that OR vs RR thread. It seems like they have enough studies to do some advanced synthesis techniques.
I’m going to have to dig out my Efron book on Empirical Bayes and think about this.
A further problem is that they did extra dichotomisation in classifying trials as “yes” or “no” for achieving the LDL-C target of a 30% reduction. They aren’t clear, but I think this classification was based on the average reduction found in the trial. So in Table 1, the first trial is classified as not achieving this target because it had an average reduction of 26%. Pretty obviously, if the average reduction was 26%, quite a lot of people would have achieved the target of 30%. (Of course, there probably isn’t anything magic about 30%; that’s just a threshold that some people decided on, but that’s another argument.) So there is a huge ecological fallacy here. There are two trials in Table 1, one with a reduction of 29% and the other with a reduction of 31%: one is classified as meeting the target and the other is not, but the distributions of % reduction may have been very similar. This makes no sense.
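A quick back-of-the-envelope calculation shows the scale of the problem. Assuming individual LDL-C reductions are roughly normal around the trial mean (the 10-percentage-point SD here is my assumption, purely for illustration), a substantial minority of the "26% trial" patients still beat the 30% target, and the 29% and 31% trials are nearly indistinguishable:

```python
# Illustration of the ecological fallacy: a trial with a mean LDL-C
# reduction of 26% can still have many patients above the 30% target.
# The normal model and the 10-point SD are assumptions for illustration.
import math

def frac_above(mean_reduction, target, sd):
    """P(individual reduction > target) under a normal assumption."""
    z = (target - mean_reduction) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)

for mean in (26, 29, 31):
    print(f"mean reduction {mean}%: "
          f"{frac_above(mean, 30, 10):.0%} of patients exceed the 30% target")
```

Roughly a third of patients in the 26% trial clear the target, and the 29% vs 31% trials differ by only a few percentage points of patients, yet the paper puts them on opposite sides of a yes/no line.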
The classification of results as showing benefit or not was pretty mad too: for example, there’s a trial with RR 0.78, 95% CI 0.60 to 1.00, that is classified as not showing mortality benefit (and there are several similar examples).
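For that example, the approximate two-sided p-value can be recovered from the reported CI on the log scale, which makes clear this is borderline evidence of benefit rather than evidence of no benefit (the small discrepancy from p = 0.05 is just rounding in the published CI):

```python
# Recover an approximate p-value from the reported RR and 95% CI.
# With an upper limit of exactly 1.00 the p-value is ~0.05 by construction;
# the figures below differ slightly only because the CI is rounded.
import math

rr, lo, hi = 0.78, 0.60, 1.00
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # SE of log RR from the CI
z = math.log(rr) / se
p = math.erfc(abs(z) / math.sqrt(2))              # two-sided normal p-value
print(f"z = {z:.2f}, p = {p:.3f}")
```

Calling a p ≈ 0.05 result with a 22% point-estimate risk reduction “no benefit” is the absence-of-evidence fallacy in its purest form.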
The paper says it was peer reviewed - but the journal doesn’t seem to have open peer review so I couldn’t see the reviewers’ reports. Probably just as well - there seems to be a major system failure here, but it might be instructive to understand how it occurred.
Before going too far down this (very deep) rabbit hole, it’s probably useful to step back and consider the context in which articles about lipid medications are often written. Some useful background:
There are many loud voices in the field of cardiovascular risk reduction. Authors (and certain journals) discussing medications can be very invested, for many reasons, in summarizing the evidence a certain way. The stakes are very high: the potential public health impacts are huge.
It was almost certainly reviewed by other physicians only, because the analysis problems jump out at anyone who has studied the literature on statistical methods for meta-analysis.
After reading a number of @Sander papers as well as following up on his references, I would agree with him that statistics needs to be taught from an information theoretic perspective.
When this is done, all of the recommendations provided by @f2harrell in his numerous posts, BBR, and RMS, make sense.
A large part of the reason there are “loud voices” is because financial risk is being placed on U.S. health care organizations, whether they like it or not.
The key modification:
Value-based arrangements where the value-based enterprise assumes substantial downside financial risk. This safe harbor covers both monetary and in-kind remuneration exchanges between a VBE and a VBE participant in a VBE that assumes substantial downside financial risk from a payer if the VBE participant assumes a meaningful share of the risk. This safe harbor offers greater flexibility than the care coordination arrangements safe harbor. The OIG finalized this safe harbor with industry-friendly modifications. For one of the four different payment methodologies used to determine “substantial downside financial risk,” the OIG reduced the risk threshold (i.e., the VBE has a repayment obligation of 30% of shared losses rather than 40%). While the proposed rule defined “meaningful share of the risk” to mean at least 8%, the OIG reduced this amount, requiring the VBE participant to share at least 5% of the financial risk to qualify.
As more U.S. physicians work as employees of VBEs, with financial management being delegated to non-clinicians, we are placing primary care physicians in the position of making complicated actuarial calculations with little supporting infrastructure.
The non-clinical financial managers, in turn, have little incentive to implement appropriate decision support structures when a MA like this gets “peer reviewed” and published. Any failure in the financial performance of the organization will be suffered by those on the front lines. This is a prime reason for physician burnout in the United States.
It was my motivation for creating this thread on clinical practice guidelines. I can easily imagine cases where clinical practice guidelines would discourage even the most motivated clinician from providing the appropriate individual treatment, due to a misunderstanding by non-clinical managers on proper interpretation of clinical guidelines.
A vast number of retrospective meta-analyses (especially in my field of rehabilitation) attempt to use methods of synthesis when the information collected cannot support it – i.e., there is too much entropy in the channel.
As was shown in the classic paper by J. L. Kelly, there is a close relationship between information compression/synthesis and the valuation of courses of action (i.e., decisions). Appropriate methods of analysis, made explicit to all stakeholders, can simultaneously support clinicians in their job to aid patients in making the best individual decision, while also helping the institution better manage financial risk.
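Kelly's result in its simplest form makes the information-to-decision link concrete: for a repeated bet, the growth-optimal commitment of resources is determined entirely by your informational edge about the outcome. A minimal sketch (the probabilities are illustrative, not tied to any clinical example):

```python
# Kelly criterion sketch: the growth-optimal fraction of bankroll to stake
# is a direct function of how much information you have about the outcome.
# Probabilities below are illustrative only.

def kelly_fraction(p, b=1.0):
    """Growth-optimal stake for win probability p at net odds b (0 if no edge)."""
    return max(0.0, (p * (b + 1) - 1) / b)

# More information (higher p) -> larger justified commitment of resources.
for p in (0.50, 0.55, 0.60):
    print(f"p = {p:.2f}: stake {kelly_fraction(p):.0%} of bankroll")
```

With no edge (p = 0.50) the optimal stake is zero, which is exactly the "too much entropy in the channel" situation: when the synthesized information cannot support a decision, the correct valuation of acting on it is nil.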
Regarding the article that’s the subject of this thread- I don’t think you need to worry too much about doctors or payers being influenced by it. While there are valid differences of opinion regarding some key aspects of lipid management (e.g., whether a “treat-to-target” vs risk-based approach is best), the opinions of the authors are not mainstream- they are fringe.
Is anyone interested in commenting on Figure 3 of the paper that is the topic of this thread? It plots absolute risk reduction (y) by % reduction in LDL-C (x) for the 19 trials that reported a benefit of the lipid-lowering medication studied (apparently defined as the HR/RR/OR CI excluding 1.00). The figure also shows what seems to be a regression equation (y = -0.0111x + 2.7355) along with a correlation coefficient (R = -0.087).
Paper is horrendous but is there any information in Figure 3?
I think Stuart Pocock answered your question in this comment (see link posted by @simongates):
“Their second deception is to plot each trial’s LDL-C reduction against their absolute risk reduction of CV events, claiming that the association is too weak to matter. Such meta-regression techniques are well known to be flawed especially if they ignore the markedly different patient populations across trials. In this case the absolute risk reduction in each trial depends heavily on whether its patient population is high-risk or low-risk. So is their plot meaningful? The answer is NO.”
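Pocock's point is easy to show in numbers: with the same relative benefit in every trial, the absolute risk reduction is driven almost entirely by the baseline event rate of the population studied. The figures below are illustrative, not taken from the paper:

```python
# Why Figure 3 is confounded: with an identical relative risk in every
# population, ARR scales with baseline risk, so a scatter of ARR across
# trials with different-risk populations mostly plots baseline risk.
# Numbers below are illustrative.
rr = 0.75  # same relative benefit assumed in every population
for baseline in (0.02, 0.10, 0.30):  # low-, medium-, high-risk populations
    arr = baseline * (1 - rr)        # ARR = baseline risk x relative reduction
    print(f"baseline risk {baseline:.0%}: ARR = {arr:.1%}")
```

A 15-fold spread in ARR appears with zero variation in treatment effect, so regressing ARR on % LDL-C reduction across such trials says essentially nothing about the LDL-C hypothesis.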