The editor (Hindrik Mulder) is giving us a hard time with the response, refusing a “For Debate” paper and asking us to cut back to 1800 words
Thoughts? I think this response is critical to resolve some of the misconceptions around research waste and the role of epidemiology and biostatistics in clinical research
On page 3 of your manuscript, you wrote the following:
No one can deny the fact that meta-analyses are the highest level of evidence in evidence-based medicine…
This very point has been disputed in a number of threads. The philosophical literature is just too large to list.
The primary fallacy is the notion of a single meta-analysis as being definitive. For any set of n reports there are 2^n possible meta-analyses, if each report is given a weight of 1 (include) or 0 (ignore). Any approach that allows continuous weights in the interval [0, 1] increases the number of possible analyses to the cardinality of [0, 1]^n, which is 2^{\aleph_0}, i.e. the cardinality of the continuum, an uncountable set. This exaggerates the range of justifiable perspectives, but it illustrates that very special circumstances are required to make any synthesis definitive.
This has led some to apply the jackknife resampling method to study the variability of a meta-analysis.
Gee, T. (2005). Capturing study influence: the concept of ‘gravity’ in meta-analysis. Aust Couns Res J, 1, 52-75. (PDF)
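To make the idea concrete, here is a minimal sketch (my own toy example, not the method from the Gee paper) of a leave-one-out re-analysis of a fixed-effect, inverse-variance meta-analysis; the effect sizes and standard errors are invented:

```python
# Leave-one-out ("jackknife") re-analysis of a fixed-effect meta-analysis,
# to show how much the pooled estimate depends on any single study.
import numpy as np

effects = np.array([0.30, 0.12, 0.45, -0.05, 0.22])   # hypothetical study effects
se      = np.array([0.10, 0.15, 0.20, 0.12, 0.08])    # hypothetical standard errors

def pooled(effects, se):
    """Fixed-effect, inverse-variance pooled estimate and its standard error."""
    w = 1.0 / se**2
    est = np.sum(w * effects) / np.sum(w)
    return est, np.sqrt(1.0 / np.sum(w))

full_est, full_se = pooled(effects, se)
print(f"all studies: {full_est:.3f} (SE {full_se:.3f})")

# Omit each study in turn and recompute the pooled estimate.
for i in range(len(effects)):
    keep = np.arange(len(effects)) != i
    est_i, se_i = pooled(effects[keep], se[keep])
    print(f"omit study {i + 1}: {est_i:.3f} (SE {se_i:.3f}), "
          f"shift = {est_i - full_est:+.3f}")
```

If omitting a single report moves the pooled estimate materially, the claim that any one synthesis is definitive is hard to sustain.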
Gene Glass (who you mention) later had this to say on meta-analysis (in 2000):
Meta-analysis needs to be replaced by archives of raw data that permit the construction of complex data landscapes that depict the relationships among independent, dependent and mediating variables. We wish to be able to answer the question, “What is the response of males ages 5-8 to ritalin at these dosage levels on attention, acting out and academic achievement after one, three, six and twelve months of treatment?” … We can move toward this vision of useful synthesized archives of research now if we simply re-orient our ideas about what we are doing when we do research. We are not testing grand theories, rather we are charting dosage-response curves for technological interventions under a variety of circumstances. We are not informing colleagues that our straw-person null hypothesis has been rejected at the .01 level, rather we are sharing data collected and reported according to some commonly accepted protocols. We aren’t publishing “studies,” rather we are contributing to data archives.
Nelder made a brief comment on ‘meta-analysis’ in 1986 that is worth mentioning (p. 113):
Recently the term ‘meta-analysis’ has been introduced (Glass et al 1981, Hedges and Olkin 1985) to describe the combination of information from many studies. The use of this … rather pretentious term for a basic activity of science is a clear indication of how far some statisticians’ views of statistics have diverged from the basic procedures of science.
Nelder, J. A. (1986). Statistics, Science and Technology. Journal of the Royal Statistical Society. Series A (General), 149(2), 109–121. https://doi.org/10.2307/2981525
I have always been annoyed when the term “evidence” is dogmatically thrown around by professionals without any particular expertise in statistics, mathematics, or logic, when actual experts are much more nuanced.
Michael Evans wrote the following as the first sentence in the preface of his 2015 book Measuring Statistical Evidence Using Relative Belief:
The concept of statistical evidence is somewhat elusive.
Richard Royall stated similar opinions in his 1997 text Statistical Evidence: A Likelihood Paradigm:
…Standard statistical methods regularly lead to the misinterpretation of scientific studies. The errors are usually quantitative, when the evidence is judged to be stronger (or weaker) than it really is. But sometimes they are qualitative – sometimes one hypothesis is judged to be supported over another when the opposite is true. These misinterpretations are not a consequence of scientists misusing statistics. They reflect instead a critical defect in current theories of statistics.
Research waste is merely a consequence of ignoring decision theory as fundamental to scientific inference.
What a terrific thought. But then it goes a bit too far in suggesting that the data can provide a degree of specificity for which the sample size is actually inadequate. But the biggest reason to use rich data landscapes is to take into account known sources of outcome heterogeneity, i.e., to make estimates more precise and meaningful through covariate adjustment.
I think your proposal in this post is a useful starting point:
If studies started reporting the effect as a distribution, and the effect of omitting certain variables on that distribution, more useful syntheses based on modern interpretations of confidence and “fiducial” distributions could be applied. There are close relationships between Bayesian posteriors and frequentist distribution estimators that might bridge the gap among various schools of thought.
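To illustrate what that could look like in the simplest (normal approximation) case, here is a minimal sketch of fusing study-level confidence distributions under a common-effect assumption; the estimates and standard errors are invented, and real fusion-learning methods are considerably more general:

```python
# Each study is represented by a normal "confidence distribution" rather than
# a point estimate plus a significance verdict; the studies are then fused by
# inverse-variance weighting (one simple special case of fusion learning).
import numpy as np
from scipy import stats

theta_hat = np.array([0.30, 0.12, 0.45])   # hypothetical study estimates
se        = np.array([0.10, 0.15, 0.20])   # hypothetical standard errors

# Study i's confidence distribution: H_i(theta) = Phi((theta - theta_hat_i) / se_i)
def study_cd(i, theta):
    return stats.norm.cdf((theta - theta_hat[i]) / se[i])

for i in range(len(theta_hat)):
    print(f"study {i + 1}: P(theta <= 0) = {study_cd(i, 0.0):.4f}")

# Fused confidence distribution under a common-effect assumption.
w = 1.0 / se**2
theta_comb = np.sum(w * theta_hat) / np.sum(w)
se_comb = np.sqrt(1.0 / np.sum(w))

def combined_cd(theta):
    return stats.norm.cdf((theta - theta_comb) / se_comb)

# Any interval or tail area can then be read off the fused distribution.
lo, hi = stats.norm.ppf([0.025, 0.975], loc=theta_comb, scale=se_comb)
print(f"fused median {theta_comb:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
print(f"P(theta <= 0) under fused CD: {combined_cd(0.0):.4f}")
```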
I like some of the recent work done by Bernardo and Berger from the objective Bayes perspective that links Bayesian and frequentist methods via the notion of missing information. In any circumstance, we are always free to ask: “How much does the missing information (a.k.a. the Bayesian prior) matter?” I’d like to see the Matthews and Held Bayesian Analysis of Credibility extended to these estimators.
Bernardo, J. M. (2006). A Bayesian mathematical statistics primer. In Proceedings of the Seventh International Conference on Teaching Statistics. Salvador (Bahia): CD ROM. International Association for Statistical Education. (PDF)
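As a rough illustration of the reverse-Bayes direction of that question, here is a sketch of Matthews' sceptical-limit calculation as I recall it (check the formula against the original Matthews and Held papers before relying on it); the confidence interval below is invented:

```python
# Reverse-Bayes "Analysis of Credibility": for a conventionally significant
# result with 95% CI (L, U), 0 < L < U on an additive scale (use the log scale
# for ratio measures), the critical prior interval is +/- S with
#   S = (U - L)^2 / (4 * sqrt(L * U)).
# If external evidence cannot rule out effects larger than S, the new finding
# is not credible on its own.
import math

def sceptical_limit(lower, upper):
    """Critical prior limit S for a positive 95% CI (lower, upper)."""
    if not (0 < lower < upper):
        raise ValueError("formula applies to a CI excluding 0 with 0 < L < U")
    return (upper - lower) ** 2 / (4.0 * math.sqrt(lower * upper))

# Hypothetical log-odds-ratio CI from a single trial:
L, U = 0.05, 0.60
print(f"sceptical limit on the log scale: {sceptical_limit(L, U):.3f}")
print(f"as an odds ratio: {math.exp(sceptical_limit(L, U)):.2f}")
```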
I don’t think that there is much chance of research synthesis being removed from the hierarchy of evidence any time soon, and most of the papers you have cited are discussing either:
a) Best use of the synthesis
OR
b) Incorrect and/or duplicated syntheses
There is not a single paper to date that dismisses syntheses as you have suggested in this thread. For example, the BMJ-EBM paper you quoted does not remove the synthesis from the hierarchy of evidence but rather asks how it could best be used:
Yes, there has been research waste when it comes to research synthesis, but there is as much research waste with primary studies, so this has nothing to do with these types of studies per se. The synthesis of existing research, when properly done, will certainly reduce research waste, and The Lancet announced in 2005 that “From August, 2005, we will require authors of clinical trials submitted to The Lancet to include a clear summary of previous research findings, and to explain how their trial’s findings affect this summary. The relation between existing and new evidence should be illustrated by direct reference to an existing systematic review and meta-analysis. When a systematic review or meta-analysis does not exist, authors are encouraged to do their own.”
I believe that when we begin to think that primary research is a means towards the conclusive answer to a scientific problem rather than a contribution towards the accumulation of evidence on the topic, then these types of discussions ensue.
I agree, and we need covariate-adjusted estimates from RCTs to enable more robust evidence syntheses - something that is rarely reported.
The main focus in RCTs today when it comes to HTE seems to be on subgroup effects (rather than risk magnification, which is more important). In an individual RCT such effects can only be suspected, because the vast majority of these observations are artifacts of the sample and nothing more. The only way to be sure is to see whether the modifier shows up consistently across studies in a synthesis of the evidence; thus only subgroup effects examined within syntheses may provide a better level of evidence for this sort of HTE.
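One way to make that operational, sketched below under strong simplifying assumptions (two subgroups per trial, normal approximations, invented numbers), is to estimate the treatment-by-subgroup interaction within each trial and then pool the interaction estimates across trials:

```python
# Pool within-trial subgroup-by-treatment interactions across trials. A real
# subgroup effect should show up consistently in the pooled interaction,
# not just in one trial.
import numpy as np

# Per-trial treatment effects (e.g. log hazard ratios) and SEs in subgroups A and B.
eff_A = np.array([-0.35, -0.10, -0.28])
se_A  = np.array([ 0.15,  0.20,  0.18])
eff_B = np.array([-0.05, -0.12,  0.02])
se_B  = np.array([ 0.16,  0.22,  0.17])

# Within-trial interaction (difference of subgroup effects) and its SE.
inter    = eff_A - eff_B
inter_se = np.sqrt(se_A**2 + se_B**2)

# Fixed-effect pooling of the interaction estimates across trials.
w = 1.0 / inter_se**2
pooled = np.sum(w * inter) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
z = pooled / pooled_se
print(f"pooled interaction: {pooled:.3f} (SE {pooled_se:.3f}), z = {z:.2f}")
```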
My complaint was about the dogmatic assertion regarding “hierarchies of evidence” that is clearly false, not about the relative merits of meta-analysis or primary research per se.
Finding work critical of EBM depends on which literature you look at. The BMJ article made it quite clear that “meta-analysis” isn’t an independent method, but a perspective from which to view research reports. Glass himself desired a replacement of “meta-analysis” with something that is now called “fusion learning” or “confidence distributions” in the frequentist stats literature. Nelder’s quote is compatible with the opinions of the BMJ authors and Gene Glass.
Complaints about research “waste” are a direct consequence of a flawed theory of evidence. A flawed theory of evidence will lead to:
ignoring information that should be conditioned upon, leading to studies that should not be done, or
conditioning on false information, causing surprise and controversy in practice, leading to more calls for additional research.
Citing one more thread in this forum alone feels like beating a dead horse, but there are a number of papers here that either discuss decision analysis in a medical context, or study EBM criteria empirically and find them flawed.
From the abstract:
The notion that evidence can be reliably or usefully placed in ‘hierarchies’ is illusory. Rather, decision makers need to exercise judgement about whether (and when) evidence gathered from experimental or observational sources is fit for purpose.
From the abstract:
As EBM became more influential, it was also hijacked to serve agendas different from what it originally aimed for. Influential randomized trials are largely done by and for the benefit of the industry. Meta-analyses and guidelines have become a factory, mostly also serving vested interests.
From the abstract:
The limited predictive validity of the EPC approach to GRADE seems to reflect a mismatch between expected and observed changes in treatment effects as bodies of evidence advance from insufficient to high SOE. In addition, many low or insufficient grades appear to be too strict.
“I have always been annoyed when the term “evidence” is dogmatically thrown around by professionals without any particular expertise in statistics, mathematics, or logic, when actual experts are much more nuanced.”
Physicians (presumably the “professionals without any particular expertise…” you refer to above) witness (daily) our patients being bilked out of their life savings by “healthcare providers” who harbour a complete disregard for the standards of evidence you seem to disdain so much. If I had a nickel for every patient I’ve seen who has spent (or is contemplating spending) hundreds or thousands of dollars on BS treatments being promoted in my community (e.g., platelet-rich plasma injections, laser to every imaginable body part, steroid injections of every imaginable body part, cupping, acupuncture, naturopathic treatments,….), I’d be a wealthy woman. Hawking unsupported, non-reimbursed (for good reason) therapies to desperate patients (some of whom can barely afford their groceries) who are in no position to independently assess the validity of efficacy claims is reprehensible. If my publicly-funded healthcare system has to choose between reimbursing the cost of SGLT-2 inhibitors for all my diabetic patients versus acupuncture for acute lumbar strain, guess where I’d prefer the money be spent?
It seems important not to allow frustration with the difficulty of obtaining RCT evidence in one’s field to morph into a general resentment/disregard for RCT evidence (or the field of “EBM” or physicians). As noted in this link from another thread, the importance of randomization to demonstrate therapeutic efficacy was appreciated LONG before the “dawn of EBM” in the early 1990s:
Acquiring convincing evidence that something “works” is very hard- there’s no way around this fact. Tearing down a method, simply because it is sometimes out of reach, seems unscientific. And scouring the methods literature to find big names who have questioned the importance of randomization (most of whom probably don’t do work that has actual consequences for patients) probably isn’t the most productive way forward. The better way to boost the credibility of therapies offered by a field would probably be to find a way for it to obtain randomized evidence.
At the end of the day, physicians, even though most are “without any particular expertise in statistics, mathematics,…” (“or logic;” really??), arguably have an awful lot more “skin in the game” when making treatment decisions than do the shysters I’ve described above.
We probably have different ideas about what EBM is. In my view EBM is the use of the research literature by physicians as the basis for decision-making in Medicine, and it therefore requires physicians to have a clear understanding of the science behind clinical research (known as clinical epidemiology, which subsumes clinical biostatistics). Thus evidence-based practice is the outcome of a physician appropriately trained in clinical epidemiology, and this combination constitutes EBM.
Evidence is anything published in the literature, and there are certainly hierarchies. These exist by design, because study designs are, by definition, hierarchical in terms of resistance to bias (and this is not the statistical bias that contributes to the MSE, but rather the biases that lead to non-causal associations).
The evidence in research synthesis is the highest level of evidence because the scientist who undertakes it has the capability (based on expertise in both research science and the content area) to provide researchers with sufficient information to assess what contribution any new results can make to the totality of information, and thus to permit reliable interpretation of the significance of new research, and indeed of whether, and what aspect of, new research is needed on a topic. This alone is sufficient justification for moving the synthesis to the top level of evidence, irrespective of the additional benefits in terms of epidemiological and statistical mitigation of bias. Thus the method itself is secondary when considering where to place evidence syntheses in an EBM hierarchy of evidence sources.
I have problems with agents asserting grandiose claims about terms like “evidence” or “reasoning” absent a sound logical foundation, and then becoming belligerent when I do not concede to their pretense of authority.
RE: physician competence regarding statistical analyses – think carefully about these comments by a mathematician and epidemiologist.
It is no shame to admit ignorance of many topics. I’m no expert in auto mechanics, carpentry, or archaeology, to name a few. But I don’t pretend to be so, and people do not risk blood and/or treasure on my non-existent recommendations.
Contrast this admission of ignorance to the behavior of surgeons in response to a clear error in statistical reasoning being pointed out:
I used to think Sander was exaggerating when he accused JAMA of manslaughter for how they report data, but after the past few years, that indictment might need to be expanded.
EBM was touted as a “revolution”, saving people from arbitrary medical authority.
The problem with revolutions is you end up back where you started. Instead of individual doctors making local decisions (that may or may not be rational), front line doctors are now governed by unaccountable medical bureaucrats, who make claims about “best practice” with no skin in the game, creating a failure point that increases the risk, compared to those “bad” old days.
No one is asking them to assume such a role nor are they implicitly in such a role because of their status as clinicians (regardless of what the clinician may believe about his/her research aptitude). However, there will be no sound clinical decision making if the clinician does not understand the basics of research methodology and that is why the EBM movement started - there are only two options:
a) Train clinicians to use the medical literature and thus train them to the level (at least) where epidemiological and biostatistical results are clearly understood
b) Hire a fortune-teller to sit in their clinics
Wow- you’ve got a pretty low opinion of physicians. Sounds like you’ve really got our number and don’t like what you see…
Dangerous overconfidence is not a problem that’s unique to medicine. It’s present in every field. It seems particularly prominent among those whose study has been self-directed. Such people might not have faced important consequences for errors in their thinking and might not have had their misunderstandings corrected through the formal guidance of a mentor with long-term, immersive applied experience.
On the contrary, I think the vast majority of physicians want to do well by their patients, but economic forces have manipulated them into a position to be unwitting tools of agents who do not necessarily have individual well-being in mind.
Good intentions and academic achievement do not necessarily correlate with independent thinking, or statistical skill, however.
Agree with many of your points. Arguably, the main roles filled by systematic review and meta-analysis in medical-decision making are:
To show the extent to which research in an area does or doesn’t tend to “point in the same direction;”
To take stock of questions that have already been addressed so as to prevent costly re-invention of the wheel;
To help identify knowledge gaps that can guide the design of future studies.
As is true for primary studies, high quality meta-analyses will have had input from people with both subject matter and methodologic expertise. Unfortunately, people aren’t always good at assessing their own expertise.
Problems arise when clinicians over-estimate their statistical/epidemiologic ability and try to conduct research without the help of those with methods expertise. Similarly, an RCT designed solely by a group of statisticians would most likely be useless to clinicians.
Arguably, the problem is significantly worse for observational studies. Since patients aren’t being exposed to an intervention, some observational researchers don’t seem to feel it necessary to solicit input/advice from clinicians when designing their studies, even though the questions they are trying to answer have clinical implications. Reading many syntheses/reviews of observational bodies of evidence, it often becomes painfully clear that nobody, over many years, bothered to take stock (or cared) whether any particular study result had any meaningful impact on patients. Lots of CVs were burnished as money was burned.
When it comes to RCTs, well-conducted evidence reviews often CAN influence practice. I’m thinking back to this really useful publication from 2016:
As an endocrinologist (I think?), you’ll recall that it was around this time that several large RCTs called into question the “glucocentric” view of type 2 diabetes management, with regard to effects on hard clinical outcomes. This approach to diabetes care had become so entrenched in clinical practice that it wasn’t until these types of evidence summaries started appearing that it felt like doctors finally stepped back and re-examined their approach to this disease. Of course, you’ll recall that it was also around this time that we finally started to see some of the newer diabetes medications (e.g., SGLT-2 inhibitors, GLP-1 agonists) demonstrating important benefits on hard outcomes for patients.
Ironically, you’ll also remember that our discovery of the cardiovascular benefits of these new diabetes medicines was “accidental” and stemmed from FDA’s need to respond to a highly controversial meta-analysis of the safety of an older diabetes medicine (rosiglitazone):
This “post-hoc” meta-analysis (i.e., it was designed AFTER conduct of its component trials) caused MUCH consternation in the clinical community. After rosiglitazone had been heavily promoted to doctors for many years, its cardiovascular safety was suddenly called into question by the “Nissen meta-analysis” (as it came to be known). Many advisory panels were convened to discuss next steps, ultimately culminating in FDA guidances on the conduct of trials for new diabetes medicines and meta-analyses for assessing safety endpoints:
In order to prevent a recurrence of this type of scenario with future diabetes drugs, FDA started to require that trials of new diabetes drugs be designed in such a way that drug-induced cardiovascular risk could be “capped” at a certain level. This new requirement, that such trials capture enough cardiovascular events to “rule out” more than a certain degree of drug-induced CV risk, is what ultimately ended up revealing the CV benefits of SGLT-2 inhibitors and GLP-1 agonists (benefits which might not have been identified if not for FDA’s new requirements).
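For a sense of the arithmetic behind that requirement, here is a rough sketch (my own illustration, not the guidance text) of the approximate number of MACE events needed to rule out excess-risk margins such as 1.8 and 1.3, using Schoenfeld's standard approximation:

```python
# With 1:1 allocation and a true hazard ratio of 1, Schoenfeld's approximation
# gives the number of events needed so that the upper 95% confidence limit
# falls below a chosen margin with the stated power.
import math
from scipy import stats

def events_to_rule_out(margin, alpha=0.05, power=0.90, alloc=0.5, true_hr=1.0):
    """Approximate number of events (Schoenfeld) needed to exclude `margin`."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    delta = math.log(margin) - math.log(true_hr)
    return (z_a + z_b) ** 2 / (alloc * (1 - alloc) * delta ** 2)

for m in (1.8, 1.3):   # margins discussed around the 2008 diabetes CV guidance
    print(f"margin {m}: about {math.ceil(events_to_rule_out(m))} events")
```

Tighter margins require many more events, which is why these trials had to be large enough to detect benefit almost as a by-product.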
And so, out of the ashes of one of the biggest debacles in the history of drug safety, emerged, arguably, two of the most important classes of drugs developed in the past several decades. And it all started with a meta-analysis…
So does that make me an “unwitting tool” who can’t think for herself? Friendly suggestion- you might want to tone down the doctor-bashing if you want people to continue to engage with you here…We’re all trying to learn from each other in order to help people. Repeatedly impugning the competence/critical thinking skills of an entire profession isn’t conducive to collegial interaction.
Yes, also an endocrine physician, and there are a series of examples in this area, e.g., low-dose versus high-dose radioactive iodine ablation post-surgery for DTC. Several meta-analyses preceded the two NEJM trials (2012), and the issue is still not resolved many years later, with more syntheses as well as primary studies appearing. What is very necessary now is a robust and reliable tool to determine the exit status of a meta-analysis, so that once a meta-analysis is tagged “exit,” all future trials, studies, and syntheses on that question can cease, the cumulative evidence being considered conclusive. We have been awarded a grant to work on this, and hopefully we can figure it out.
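Purely as a hypothetical illustration of the kind of rule such a tool might formalize (not our actual method, which is still to be worked out), one could imagine a cumulative meta-analysis tagged “exit” once the pooled interval lies entirely inside, or entirely outside, a pre-specified range of clinically negligible effects:

```python
# Naive "exit" rule for a cumulative fixed-effect meta-analysis. A serious
# tool would also need heterogeneity handling and multiplicity control
# (e.g., trial sequential analysis). All numbers are invented.
import numpy as np
from scipy import stats

effects = np.array([0.40, 0.25, 0.18, 0.22, 0.20])  # hypothetical trial effects
se      = np.array([0.30, 0.20, 0.15, 0.10, 0.08])
rope    = (-0.10, 0.10)                              # range of negligible effects

w_cum = est_cum = 0.0
for k, (y, s) in enumerate(zip(effects, se), start=1):
    w_cum += 1.0 / s**2
    est_cum += y / s**2
    est = est_cum / w_cum
    half = stats.norm.ppf(0.975) / np.sqrt(w_cum)
    lo, hi = est - half, est + half
    inside = rope[0] <= lo and hi <= rope[1]    # effect shown negligible
    outside = hi < rope[0] or lo > rope[1]      # effect shown non-negligible
    status = "exit" if (inside or outside) else "continue"
    print(f"after trial {k}: {est:.3f} ({lo:.3f}, {hi:.3f}) -> {status}")
```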
If you think pointing out logical and mathematical flaws in the journals and textbooks that you still accept is “doctor bashing”, that says nothing about me. You are still free to point out an error in my reasoning (as an honest scholar would).
The fact is, most clinicians are simply too busy doing patient care or admin duties to also become competent in data analysis. Nor can you make informed, independent decisions when critical data is not published.
When I did clinical care, I was. What is taught in med school or even CE classes is not enough.
What is enough? Just to become an “Associate” of the Society of Actuaries or Casualty Actuarial Society (the agents who make sure risk is managed properly) requires most people (with quantitative aptitude) close to 4 years. Certainly, that is overkill for clinicians, but a calculus-based math-stat course is the minimum.
This assumes instructors have an adequate understanding of statistics and mathematics. I think the past 100 years indicate they do not.
Bernardo, J. M. (2003). [Reflections on Fourteen Cryptic Issues concerning the Nature of Statistical Inference]: Discussion. International Statistical Review/Revue Internationale de Statistique, 71(2), 307-314.
Established on a solid mathematical basis, Bayesian decision theory provides a privileged platform from which to discuss statistical inference.
When I pointed this out in another thread, this was the reply:
"Bayesian decision making” it’s not very common in med research, as far as I can see. And it is also not very commonly meant in intro statistics books
Contrast this with Senn’s quote from a guest post on Deborah Mayo’s blog:
Before, however, explaining why I disagree with Rocca and Anjum on RCTs, I want to make clear that I agree with much of what they say. I loathe these pyramids of evidence, beloved by some members of the evidence-based movement, which have RCTs at the apex or possibly occupying a second place just underneath meta-analyses of RCTs. In fact, although I am a great fan of RCTs and (usually) of intention to treat analysis, I am convinced that RCTs alone are not enough.
I don’t like arguments from authority, but I’ve cited enough statistical experts that anyone who thinks I’m incorrect is honor-bound to give an explicit logical argument refuting my claim.
Using pre-data design criteria as an ordinal, qualitative measure of validity for an individual study has no basis in mathematics.
Goutis, C., & Casella, G. (1995). Frequentist Post-Data Inference. International Statistical Review / Revue Internationale de Statistique, 63(3), 325–344. https://doi.org/10.2307/1403483
The end result of an experiment is an inference, which is typically made after the data have been seen (a post-data inference). Classical frequency theory has evolved around pre-data inferences, those that can be made in the planning stages of an experiment, before data are collected. Such pre-data inferences are often not reasonable as post-data inferences, leaving a frequentist with no inference conditional on the observed data.
This is why James Berger (who literally wrote the book on statistical decision theory) went to great effort in working out conditional frequentist methods in the realm of testing. The paper is close to 30 years old, but we are still debating the proper interpretation of p-values.
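To see why pre-data guarantees can mislead after the data are in, here is a simulation sketch of the classic “two measuring instruments” example (my own illustration, not from the Goutis and Casella paper): a fixed-width interval calibrated to 95% coverage before the data are seen has very different coverage once you condition on which instrument was actually used.

```python
# A single measurement comes from a precise instrument (sd = 1) or a noisy one
# (sd = 10), each used half the time, and the analyst knows which was used.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
sd = np.array([1.0, 10.0])

# Fixed half-width c calibrated so that x +/- c covers the true value 95% of
# the time *unconditionally* (averaged over which instrument gets used).
def uncond_coverage(c):
    return 0.5 * (2 * stats.norm.cdf(c / sd[0]) - 1) \
         + 0.5 * (2 * stats.norm.cdf(c / sd[1]) - 1)

c = optimize.brentq(lambda c: uncond_coverage(c) - 0.95, 0.01, 100.0)

# Simulate and compare coverage conditional on the instrument actually used.
n = 200_000
which = rng.integers(0, 2, size=n)            # which instrument was used
x = rng.normal(0.0, sd[which])                # measurement of a true value of 0
covered = np.abs(x) <= c                      # does the fixed-width interval cover 0?
print(f"half-width c = {c:.2f}")
print(f"overall coverage:         {covered.mean():.3f}")
print(f"coverage | precise sd=1:  {covered[which == 0].mean():.3f}")
print(f"coverage | noisy sd=10:   {covered[which == 1].mean():.3f}")
```

The pre-data 95% guarantee is kept on average, yet the interval is wildly conservative when the precise instrument was used and undercovers when the noisy one was, which is exactly the pre-data versus post-data tension the quote describes.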
You cannot improve clinical research without an understanding of decision theory, which links the design of experiments (and the value of information) with the broader context. The fact that EBM went off and developed heuristics completely ignorant of well-established mathematical results always seemed suspicious.
EBM is about clinical decision making using research evidence and you are talking about “improving clinical research” - these are two completely different things. Clinicians are being taught how to use the literature (not really how to create the literature) in medical school and residency programs, and that needs to be done robustly in medicine as in any other field. However, if you insist that car drivers cannot drive well until they know how to build a car, then you are simply mistaken. Yes, a clinician with research skills will be better able to target research where it is needed, but that is not the main goal in medical schools and residency programs, though the emphasis has been increased over the years. Finally, as @ESMD aptly put it, “repeatedly impugning the competence/critical thinking skills of an entire profession” is not helpful at all.