Individual response

Thanks for your thoughtful response.

I think this is really the crux of the issue. Some might argue that trying to find structure in randomness is to deny the very existence of randomness… Opinions about whether or not it’s futile to even try to find patterns in chaos seem (crudely) to delineate two ideologies: causal inference epidemiology and statistics.

In fairness, I don’t think that views of the two camps are as black and white as this. Plainly, causal inference proponents don’t deny the existence of randomness and statisticians don’t deny the possible existence of occult cause/effect relationships in the world. Rather, the two groups seem to differ in their opinions about what type(s) of evidence will allow us to make the best “bet” when we are treating our patients, with a view to optimizing their outcomes.

Causal inference epidemiologists seem to feel that the world isn’t quite as random as many believe, and that if we can chisel out even a few more cause-effect relationships from the apparent chaos, then maybe we can make better treatment decisions (?) Maybe there’s some truth to this view, but if the number of clinical scenarios in which it might apply is small (e.g., conditions with strongly genetically-determined treatment responses), then the cost/effort involved in trying to identify such relationships could easily become prohibitive. In the long run, addressing social determinants of health would likely pay off much more handsomely with regard to improving the health of societies. Seeing governments/research funders throw huge sums of money toward what many consider to be a fundamentally doomed enterprise is aggravating, to say the least.

But getting back to the paper linked at the beginning of this thread: Maybe I’m misconstruing, but the authors seem to believe that it’s possible to infer treatment-related causality for individual subjects enrolled in an RCT, simply by virtue of the fact that they had been randomly assigned to the treatment they ended up receiving. For example, they seem to assume that anybody who died while enrolled in a trial must have died as a direct result of the treatment (or lack thereof) he received during the trial. In turn, it seems that this belief stems from a more deep-seated conviction that a patient’s probability of “responding” to a treatment is somehow “predestined” or engrained in his DNA, and therefore will be consistent from one exposure to the next. People who have posted in this thread are trying to point out that this conviction is incorrect. And if this is the underlying assumption on which the promise of “personalized medicine” hinges, then health systems are throwing away a whole lot of money in trying to advance the cause.

If you were to review case reports for all subjects who died (unfortunately) while they happened to be enrolled in a very large, long-term clinical trial, you wouldn’t be surprised to find some subjects who died from airplane crashes, slipping on banana peels, pianos falling from the sky, aggravated assault, accidental drug overdose, electrocution, and myriad medical conditions that were completely unrelated to the treatment they received (e.g., anaphylactic reaction to peanuts, bacterial meningitis outbreak, forgetting to take an important medication…). Events like these are recorded in both arms of clinical trials and have nothing to do with the treatment being tested. Presumably, though, the more deaths that are recorded, the more convinced we can be that between-arm differences in the proportion of patients who died might be due, at a group level, to the treatment in question, rather than simple bad luck.
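A rough sketch of that last point, using made-up counts (none of these numbers come from a real trial): the same 4-percentage-point difference in death proportions is compatible with chance in a small trial but not in one a hundred times larger. The function name and the Wald interval are choices made for illustration only.

```python
import math

def risk_difference_ci(deaths_a, n_a, deaths_b, n_b, z=1.96):
    """Wald 95% CI for the difference in death proportions (arm A minus arm B)."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    diff = p_a - p_b
    # Standard error of a difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Hypothetical trials with identical death proportions (12% vs 8%)
small = risk_difference_ci(12, 100, 8, 100)
large = risk_difference_ci(1200, 10_000, 800, 10_000)

for label, (diff, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: diff={diff:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Note that this is a statement about the two groups only; it says nothing about which individual deaths were caused by treatment.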

Even if the tweet referenced above is now deleted, there’s plenty of circumstantial evidence that the author fundamentally believes that patients can be viewed like circuit boards, as though they are intrinsically “programmed” (like a computer) to respond the same way whenever the “input” is the same.

I’m neither a statistician nor an epidemiologist, so I don’t know how to phrase what I’m trying to say using math. But after practising medicine for 25 years, I’m not sure that it’s possible for physicians to make a better “bet” (in most, but not all, cases) regarding the treatments we select for patients than one that is grounded in the results from well-designed RCTs. This approach seems completely rational to me. Conversely, I perceive innumerable ways that we could fool ourselves in the process of trying to identify engrained “individual responses” in a sea of potentially random events.

This needs clarification: There are at least two major types of probabilities used in stochastic inference:

  1. Aleatory probabilities that are connected to physical processes such as the random treatment allocation in randomized controlled trials (RCTs). This is randomness that is based on a well-defined physical process and its uncertainty can thus be validly quantified by standard statistical methodology.

  2. Epistemic probabilities that express our ignorance.

The two can be numerically equivalent and considered to express “randomness”. But they are fundamentally different as nicely described, e.g., here.
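A toy illustration of "numerically equivalent but fundamentally different", under assumptions that are not in the linked piece: a coin known to be fair by its physical mechanism (aleatory) versus a coin of unknown bias described by a uniform Beta(1, 1) prior (epistemic). Both give 1/2 for the first toss, but only the epistemic probability moves once data arrive. The function name and the 8-heads, 2-tails data are hypothetical.

```python
from fractions import Fraction

def predictive_heads(prior_heads, prior_tails, heads, tails):
    """P(next toss is heads) under a Beta(prior_heads, prior_tails) prior
    after observing the given numbers of heads and tails."""
    return Fraction(prior_heads + heads,
                    prior_heads + prior_tails + heads + tails)

# Aleatory: the mechanism is known, so the probability is fixed at 1/2
ALEATORY_P = Fraction(1, 2)

# Epistemic: ignorance about the bias, updated by observation
before = predictive_heads(1, 1, 0, 0)   # no data yet
after = predictive_heads(1, 1, 8, 2)    # after 8 heads and 2 tails

print("aleatory:", ALEATORY_P, "| epistemic before:", before, "| after:", after)
```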

Because frequentism focuses on aleatory probabilities whereas Bayes allows both, these considerations can degenerate into the endless frequentist vs Bayes debate that would be counterproductive in this thread. Whether using a Bayesian or frequentist lens, a major task for physician scientists is to convert epistemic probabilities into aleatory ones as much as possible, chiefly through experimental design along with careful observations such as correlative analyses of patient samples. These need to be embedded in statistical models informed by causal considerations.

Almost every procedure used by “classical” statisticians imposes structure on randomness. Without such structure, there would be no way to do any kind of inference.

The difference between classical statisticians and causal inference epidemiologists is not that epidemiologists assume “more” structure and “less” randomness. In practice, we use the same estimation procedures, and to the extent that we have different preferences about model choice, those differences do not reflect different allowances for “randomness”.

Rather, the difference is that we (that is, epidemiologists) insist on having a language for reasoning about whether the assumptions we make about the structure of randomness are consistent with our beliefs about how reality works, so that those assumptions can be evaluated as an integral (and essential) part of the overall scientific inferential procedure.

Without cause-effect relationships, statistics gives absolutely no basis for making rational decisions; you might as well read tea leaves. Cause and effect is there whether we believe in it or not. We can either tackle this head-on with a scientific language for determining whether and how we can learn about causal effects from the data, or we hide our heads in the sand and hope that we magically get the right causal answer from non-causal statistical inference.

There are some statisticians who deny the usefulness of the counterfactual language. In my view, they are invoking the magical category of “randomness” to sweep the issue under the carpet, unilaterally declaring that their preferred modelling approach is the canonical one-size-fits-all procedure for imposing structure on randomness, even when alternatives are just as consistent with living in a stochastic world.

Finally, I want to note that despite its exaggerated claims of importance, the paper that is discussed in this thread is a very idiosyncratic approach that can only be used in some highly artificial settings, and even then, with highly questionable utility. It is most certainly not an accurate summary of current thinking in personalized medicine, and it does not reflect consensus among the causal inference crowd.

This depends on what you mean. When applied to a causal design (e.g., a randomized experiment where there is no post-randomization trickery) causal language is hardly needed at all.

The real problem with causal epidemiology is when the rubber hits the road. Lots of methodologists talk about notation and theory but can’t give us a real complete case study based on real data – a case study in which the DAG is justified by the subject matter and all needed measurements are available in the data. A case study where the rest of us can learn how to do real and not theoretical causal inference. See the call for examples here.

Disagree. Causal language is absolutely pertinent to both my lab experiments and clinical trials. Just finalized and sent to my co-authors a draft manuscript showcasing how causal inference can inform the interpretation of RCTs in ways that you would very much agree with. In fact, I am using your RMS package among other tools to provide practical examples.

Finally we have found something we can mostly agree upon (even if others from “my side of the aisle” might take issue). In my view, the vast majority of the utility of randomized trials comes from the ITT analysis; and while the ITT analysis can certainly be understood from the perspective of causal inference, the required “causal” methodology is so trivial that there is no clear benefit to formalizing it.

The real problem with causal epidemiology is when the rubber hits the road. Lots of methodologists talk about notation and theory but can’t give us a real complete case study based on real data – a case study in which the DAG is justified by the subject matter and all needed measurements are available in the data. A case study where the rest of us can learn how to do real and not theoretical causal inference. See the call for examples here.

I would even mostly agree on this. It is indeed rare that DAGs are justified by subject matter knowledge, and I have very little confidence in most applications of observational causal inference. However, that is in no way an argument in favour of using classical statistics applied to observational data. Such analysis will have all the same problems, and just lack a framework for clarifying why its conclusions are likely biased.

As I have previously stated on Twitter, the vast majority of the benefit of the causal inference framework is going to arise from the incorrect causal conclusions that it helps us avoid, rather than the correct causal inferences that it assists us in making. Causal inference makes it possible to evaluate the plausibility of the assumptions that are required for the study to provide unbiased estimates of something that matters for decision making. In practice, a sincere analyst will almost always conclude that those assumptions are not plausible. In most settings, decision makers would be right to insist on randomized trials. The “Evidence Based Medicine” movement was fundamentally correct in their assessment of observational evidence (whether analyzed with traditional or causal methods).

I do, however, believe there are some settings where causal inference is worthwhile. In my view, the best “case studies” for showcasing causal inference from observational data will almost always be post-marketing studies on the adverse effects of medications. These are high-stakes decisions where we need to rely on the best available evidence, even if that evidence is flawed. Adverse effects tend to be very rare (meaning that RCTs are usually underpowered to detect them). Moreover, unintentional effects are much less subject to confounding by indication, meaning that it is much more plausible that we will be able to control approximately for confounding.

It is true that in most cases when a drug is convincingly found to have an adverse effect, the safety signal will be so strong that there is little risk of getting a different result if we rely on non-causal statistics. But if we are going to rely on observational data, I don’t think it hurts to do it correctly…

Love disagreements! :slight_smile: The ITT analysis allows physically justifiable measurement of uncertainty. However, for clinical practice, as opposed to health policy, it is the PP analysis (or “as-treated” more often used for medical devices) that is actually more pertinent. And much harder to debias without the use of causal tools. Nice overview here, as I am sure you are aware.

Excellent note. I especially like this:

I think you are overrating the value of causal language for clinical trials that are in the ITT mode. I am very willing to say succinctly in such cases that from our data generating model E(Y|X, tx=B) - E(Y|X, tx=A) is our causal estimand.
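Written out, and borrowing potential-outcome notation $Y^{a}$ that the sentence above does not itself use (so treat it as an added assumption), the estimand is a covariate-specific difference in means. The second equality is what randomization of tx buys, together with the usual consistency assumption:

```latex
\Delta(x) \;=\; E[\,Y \mid X = x,\ \mathrm{tx} = B\,] \;-\; E[\,Y \mid X = x,\ \mathrm{tx} = A\,]
          \;=\; E[\,Y^{B} - Y^{A} \mid X = x\,]
```

Here "tx" is the assigned treatment, so this is the effect of assignment given baseline covariates.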

Now something to really disagree on! As-treated/PP is not very useful to the physician making (with the patient) a treatment decision at time zero. It doesn’t yield time-forward prospective estimates.

Nope. See this thread for why, even in full ITT mode, trialists will make tons of mistakes when not thinking about the processes generating the RCT data. This is the basis of the whole mistaken notion whereby simple correlations of an intermediate endpoint with overall survival (OS) magically treat the OS estimate as the gold standard despite using a bogus OS estimand. The field of dynamic treatment regimes in statistics evolved to protect us against exactly these mistakes.

This mistakes valid statistical inference (ITT) for the estimand we clinicians truly want. Take for example this RCT that recently created commotion on Twitter. In the ITT analysis the “colonoscopy” group is patients who got allocated to receive an invitation to undergo colonoscopy. When I discuss in clinic with my patients, we are interested in what happens if they actually get the colonoscopy, not what happens if they receive an invitation. That’s because they won’t receive such an invitation from a trial group. We will make decisions together on whether or not to actually do the colonoscopy. And for that we need to estimate the potential outcomes of actual colonoscopy versus no colonoscopy. This is much harder to estimate than the ITT. But it is nevertheless what we actually want.

I can see this more for the colonoscopy example than for medications. The point about meds is that it’s not “do this now or don’t do it at all” but rather degrees of adherence, and the adherence over time is unpredictable. The ITT estimate averages over adherence observed in the trial, assumes adherence in the field is fairly similar, and that is our current best guess of what benefit the patient will receive.

If you really want a hypothetical “if you adhere to the treatment fully” estimate I’m sure we’ll both agree that decent estimates of that come only from RCTs where under one assumption you have a perfect instrument for an instrumental variable analysis to estimate efficacy under perfect adherence. Causal inference comes into that.
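A minimal sketch of that instrumental-variable idea, with randomized assignment as the instrument. The numbers are invented, and the estimator shown is the simple Wald ratio rather than anything described in the posts above. It relies on assumptions beyond randomization (assignment affects the outcome only through treatment received, and nobody does the opposite of what they were assigned), and what it targets is the effect among people who take the treatment only when assigned to it.

```python
def itt_effect(risk_assigned_b, risk_assigned_a):
    """Intention-to-treat risk difference: compare arms as randomized."""
    return risk_assigned_b - risk_assigned_a

def wald_iv_effect(itt, uptake_assigned_b, uptake_assigned_a):
    """Wald estimator: ITT scaled by the between-arm difference in
    the proportion who actually received treatment B."""
    uptake_gap = uptake_assigned_b - uptake_assigned_a
    if uptake_gap == 0:
        raise ValueError("assignment did not change uptake; ratio undefined")
    return itt / uptake_gap

# Hypothetical trial: 10% event risk when assigned B, 13% when assigned A;
# 60% of those assigned B actually took it, nobody assigned A did.
itt = itt_effect(0.10, 0.13)
adherers = wald_iv_effect(itt, 0.60, 0.00)

print(f"ITT risk difference: {itt:.3f}")
print(f"Wald IV estimate among adherers: {adherers:.3f}")
```

The IV estimate is larger in magnitude than the ITT because the ITT is diluted by the 40% who were assigned B but never took it.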

I’ll read the links you provided before commenting on the other part. Thanks for a great dialog.

Exactly! I wrote something very similar on the aforementioned draft regarding approaching such problems as an IV analysis using RCT data. Will post a link to the article once it is out.

Both points I made above are two sides of the same coin. It gets even better: the advantage of Bayes is that it gives us the flexibility to intuitively model these challenges as hybrid problems of both ignorance and randomness (or other physical design processes such as blocking etc). However, we have to be constantly mindful of connecting our models with putative causal mechanisms when we do that. This was insisted upon primarily by traditional frequentists. It is good advice.

I’ve read the post which is really fantastic. So I conclude that what I was advocating applies to disease-free survival time but not to overall survival. For the latter, the ITT treatment effect estimates a policy estimand, e.g., compares those randomized to treatment B with all the subsequent treatment modifications that happened to them with those randomized to treatment A with all the subsequent developments happening to them. I think this is still a causal estimand, it’s just a policy estimand rather than a “we control what happens” estimand.

A side question for you: If we do a state transition model and use it to estimate state occupancy probabilities such as P(pt has disease returned by 6m and is alive), will that provide anything useful to the discussion? This is a simple unconditional probability (except for conditioning on treatment and baseline covariates). One can also get overall survival probabilities from this model, but they may have the same problem you wrote about (except that the model will fit better because you can allow different covariates for death vs. for the disease recurrence state).
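For concreteness, a bare-bones sketch of state occupancy probabilities from a three-state, discrete-time Markov model. The monthly transition probabilities are hypothetical and held constant; in a real analysis they would be estimated from trial data and allowed to depend on treatment, time, and baseline covariates.

```python
STATES = ("disease_free", "recurred_alive", "dead")

# Hypothetical monthly transition probabilities; each row sums to 1.
# Rows are the current state, columns the state one month later.
TRANSITIONS = [
    [0.94, 0.05, 0.01],  # from disease_free
    [0.00, 0.90, 0.10],  # from recurred_alive
    [0.00, 0.00, 1.00],  # dead is absorbing
]

def step(occupancy, transitions):
    """Advance the state occupancy probabilities by one month."""
    n = len(occupancy)
    return [sum(occupancy[i] * transitions[i][j] for i in range(n))
            for j in range(n)]

def occupancy_over_time(start, transitions, months):
    """Occupancy probabilities at months 0..months, starting from `start`."""
    path = [list(start)]
    for _ in range(months):
        path.append(step(path[-1], transitions))
    return path

path = occupancy_over_time([1.0, 0.0, 0.0], TRANSITIONS, 6)
at_6m = dict(zip(STATES, path[6]))
overall_survival_6m = 1.0 - at_6m["dead"]

print({state: round(p, 4) for state, p in at_6m.items()})
print("overall survival at 6m:", round(overall_survival_6m, 4))
```

The "recurred_alive" entry at month 6 is the P(disease returned by 6m and alive) quantity mentioned above, and overall survival falls out as one minus the probability of being in the dead state.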

Exactly! And this key point becomes clear when we draw the causal graphs. Otherwise it is very hard to see what the modeling challenges are here.

When thinking of overall survival, we are looking for decision rules that will lead to optimal long-term benefits. Such decision rules need more work to be estimated than standard RCT models will provide for chronic diseases, such as many cancers nowadays. Good problem to have, it’s a consequence of our patients living longer and better. But a challenge nonetheless for health systems, regulators, patients, clinicians, and methodologists.

I would expect yes. It is a more complete model. But having gone through the laborious exercise of estimating decision rules from RCTs that randomly allocated interventions sequentially (ideal scenario that almost never happens in oncology) what I learned is the importance of quality data. No amount of elaborate modeling can salvage an RCT that did not collect the right data. And the best way to see in the design phase ahead of time what we need is to draw the causal diagrams. Intuitively, people can sense that we’ll need information on subsequent therapies. But what the graphs reveal is that we’ll also need information on covariates at transition times. This way we can make sure to collect them.

One of my favorite models for such purposes is described in this JASA paper that generated a lot of discussion at the time within the statistical methodology community. Figure 1, which sets the challenge, is a causal diagram. It is essential for the model. Causality lies at the foundations of all statistical modeling. In fact, some of the current ongoing Twitter discussions that prompted this thread repeat heated arguments between Fisher and Neyman, just using different terminology. They both undoubtedly thought causally despite lacking today’s richer notation and approaches.

Hi Pavlos

As noted previously, I think that the type of work you’re doing is really very unique. I hope that this uniqueness is being fully appreciated/recognized by your colleagues (I strongly suspect that it is).

You seem to belong to a very small/rarefied group of researchers in the world. Your specialization in oncology presents a multitude of important longstanding/unresolved challenges [e.g., how to identify therapies that, when used serially with other therapies, can be expected to improve a patient’s overall (not just progression-free) survival]. In turn, these challenges have, effectively, forced “outside the box” thinking, ultimately causing you to ask if there might be a role for causal modelling in optimizing RCT design. Then you went a huge step further, learning the language of causal inference epidemiology to see how this might work.

Now, after learning the language of causal modelling, you have identified an important potential niche for it in optimizing the design of oncology clinical trials. To me, this work feels like the perfect clinical application of DAGs, and the one with the most potential to impact patient care.

I suspect that a key problem, to date, has been that researchers trained in more modern causal inference methods have perhaps lacked the clinical background and/or clinical incentive to search for alternate applications of these methods that would be accepted by the clinical community.

Historically, clinicians/statisticians have balked at the seemingly never-ending promotion of DAGs as a way to derive causal inferences from observational data alone. Clinicians have pushed back, asking: 1) given that these methods have been around for years, why are we only seeing them used in a very small fraction of published observational research?; and 2) why would we ever believe that these methods can generate results that are reliable enough to influence patient care decisions (except, perhaps, in the case of strong, consistent safety signals derived from well-conducted studies)? At the end of the day, it’s unlikely that we will ever consider non-randomized evidence to be on par with randomized evidence for the purpose of assessing treatment efficacy.

What you have done, effectively, is to carve out a niche for these methods as a way to optimize RCT design (the study design that clinicians consider to be optimal, in most cases, for guiding clinical decision-making), de-emphasizing the historical push for their application as “stand-alone” methods to make causal inferences from observational data.

This is all very exciting…

You are very kind, thank you for summarizing these efforts better than I ever could!

A lot of this is just us standing on the shoulders of giants, many of whom are regulars on this forum. For example, @Stephen taught us that if something is not helpful in RCTs then it is even less likely to be useful in analogous observational studies, whereas the converse is not necessarily true. And it is also his insistence on debating the Lord’s paradox (latest post here with discussion and links to previous entries) that is nicely highlighting limitations / challenges for graph-based causal inference schools.

It is indeed true that when a framework is shown to work empirically, it gains acceptance. We are catching flaws in our studies earlier and designing them more efficiently to learn from mistakes and gradually improve patient care. And because this gets more people involved, they then bring their own unique perspectives into the mix, creating a nicely dynamic ecosystem.

Philosophers have written enough papers to fill a library on why this position is logically untenable, and no amount of appeals by physicians to the unique aspects of patient care will overcome this.

My major gripes are: (1) the EBM position is logically false, and the vast majority of uncontroversial treatments would be ruled out by EBM; (2) other fields uncritically adopt EBM rhetoric and inference rules, leading to nonsense in the peer-reviewed literature.

One of the real-world cases that drew the attention of philosophers of science and ethics involved the study of ECMO for newborns in respiratory distress. Richard Royall, a Johns Hopkins professor of biostatistics, had this to say on this dogma of EBM from both an ethical and statistical POV. A link to the paper is in the thread.

Blockquote
We urge that the view that randomized clinical trials are the only scientifically valid means of resolving controversies about therapies is mistaken, and we suggest that a faulty statistical principle is partly to blame for this misconception.

To be blunt about this demand for randomization in all treatment contexts, how many infants would need to be randomized to ineffective interventions (and in essence, condemned to die with high probability) in the ECMO case?

Paul Rosenbaum, another statistician, has written a number of papers showing how a carefully done observational study can, by demonstrating insensitivity to unknown confounders, approximate a randomized experiment.

Blockquote
Randomized experiments and observational studies both attempt to estimate the effects produced by a treatment, but in observational studies, subjects are not randomly assigned to treatment or control. A theory of observational studies would closely resemble the theory for randomized experiments in all but one critical respect: In observational studies, the distribution of treatment assignments is not known…Using these tools, it is shown that certain permutation tests are unbiased as tests of the null hypothesis that the distribution of treatment assignments resembles a randomization distribution against the alternative hypothesis that subjects with higher responses are more likely to receive the treatment. In particular, these tests are unbiased against alternatives formulated in terms of a model previously used in connection with sensitivity analyses.
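To make "insensitivity to unknown confounders" concrete, here is a small sketch of a Rosenbaum-style sensitivity bound for the sign (McNemar) test on matched pairs with a binary outcome. The counts are invented. The sensitivity parameter gamma (an assumption of this illustration, written Γ in Rosenbaum's work) bounds how much an unobserved covariate could multiply the odds of treatment within a matched pair; gamma = 1 corresponds to a randomized experiment. For each gamma, the code gives the largest one-sided p-value consistent with that much hidden bias.

```python
from math import comb

def sign_test_upper_pvalue(treated_events, discordant_pairs, gamma):
    """Upper bound on the one-sided sign-test p-value when hidden bias
    can multiply the within-pair odds of treatment by at most `gamma`."""
    if gamma < 1:
        raise ValueError("gamma must be >= 1")
    # Worst-case chance that, in a discordant pair, the treated subject
    # is the one with the event
    p_plus = gamma / (1 + gamma)
    # Binomial upper tail: P(T >= treated_events) with T ~ Bin(D, p_plus)
    return sum(
        comb(discordant_pairs, k)
        * p_plus ** k
        * (1 - p_plus) ** (discordant_pairs - k)
        for k in range(treated_events, discordant_pairs + 1)
    )

# Hypothetical study: 100 discordant pairs, and in 70 of them the treated
# subject had the event.
for gamma in (1.0, 1.5, 2.0):
    bound = sign_test_upper_pvalue(70, 100, gamma)
    print(f"gamma={gamma}: worst-case p-value = {bound:.4g}")
```

A finding that stays significant only at gamma close to 1 is fragile; one that survives large gamma would require a very strong unmeasured confounder to explain away.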

Rosenbaum elaborates on the randomization fallacy in this 2015 paper:

Rosenbaum, P. R. (2015). How to see more in observational studies: Some new quasi-experimental devices. Annual Review of Statistics and Its Application, 2, 21-48. (PDF)

Blockquote
The statistical literature may be misread to say that only the elimination of ambiguity, not its reduction, is acceptable. Such a misreading might result in skepticism about quasi-experimental devices that reduce, but do not eliminate, ambiguity.

He goes on to discuss the technical issues of identification (of model parameters), and that observational studies can still provide information when they might not provide identification.

Regarding Basu’s definition of identification:

Blockquote
The entry defines identifiable to mean in different states of the world … yield probability distributions for observable data that are themselves different. One could misread this statement as saying we learn nothing about \theta unless there is identification, nothing unless there is a consistent test for each level of \theta. More careful than most, Basu is aware we often learn in nonidentified situations.

From a decision theoretic perspective, this reduction in ambiguity might be enough evidence to promote an intervention or change policy. But that is context sensitive and cannot be decided a priori.

@R_cubed I think you’ve overstated things. A huge problem with medical practice is that randomized trials are not done at the right time. As Thomas “randomize early and often” Chalmers frequently argued, randomization should be undertaken before medical opinions are entrenched. The fact that clinicians (like most other experts) feel that they know what’s best, without data, leads to a perception of non-equipoise that prevents clinical trials from being done later. History is full of medical reversals once randomized trials are actually done. Better decisions are made when we admit what we don’t know and try to find out what we need to know. We need many, many fold more clinical trials than we currently do, and we need to find out how to make trials easier to do. Bayes can help a lot with that, in addition to finally figuring out that we don’t need to collect as much data as we currently do.

If you are going to claim that my position is “logically false”, then please show me that it relies on a logically invalid syllogism. Alternatively, please stop appropriating terms that have precise definitions in order to borrow authority for your position.

In my view, the overall discussion about EBM conflates two separate questions:
(1) To what extent should we rely on statistical evidence vs mechanical/biological understanding when we predict the consequences of clinical decisions?
(2) If we rely on statistical evidence, to what extent should data from randomized trials be preferred over observational data?

The first question is interesting, and I do not claim to have a conclusive answer. It is my intuition that human biology is so complex that if we tried to make decisions based on biological understanding, it would almost always lead to unpredictable consequences. Since most of these interventions are going to be used in a large number of people with similar clinical presentations, I think it usually makes sense to empirically observe how the intervention works in practice, but I am not going to claim that this will universally be the case. Perhaps we can have a future of “Star Trek medicine” where advanced technology can scan the body and use its complete understanding of mechanism to accurately predict the consequences of intervention. And perhaps there are interventions that exist already, that are known to interact primarily with some completely understood subsystem of the human body, such that it is obvious that the benefits are real and large, and cannot possibly be outweighed by unpredicted adverse effects. I can believe that ECMO and parachutes fall into this category.

The second question is really not even worth discussing. Any rational decision maker should give much more credence to the validity of their beliefs if those beliefs are informed by a large and properly conducted randomized trial than if they are informed by observational data. This completely dwarfs almost all other considerations, such as sample size. For most medications, without randomization it is simply not possible to get any real confidence that we know their effects, no matter how good the data are in terms of every other relevant consideration.

When I said that “EBM was correct” I said specifically that they were right about randomized vs observational evidence. I do not support the stupid form of evidence-based medicine that insists on having evidence from randomized trials for everything. Even without evidence, you will always have beliefs, and you should act to optimize the consequences of your actions given those beliefs.

The first essential point is that if you want to calibrate your beliefs to reality by analysing data, you can either do that correctly using randomization, or you do so in a noisy and biased way with observational data. If you choose the second approach, your beliefs are not going to improve much.

The second essential point is that the human body is incredibly complex and responds unpredictably to those interventions that were not present in the ancestral environment. Adverse effects can often be catastrophic. It is fundamentally rational to insist that we test the medication in a small group of people, and rigorously observe its effect, before concluding that we “know” the consequences of using that drug. But that doesn’t mean you let that get in the way of acting on your beliefs if you have enough certainty that the benefits exceed the costs, based on a mechanistic understanding of human biology. And it most certainly does not mean you cannot act even when you know that the consequences of not acting are dire, such as with ECMO or parachutes.

Blockquote
If you are going to claim that my position is “logically false”, then please show me that it relies on a logically invalid syllogism. Alternatively, please stop appropriating terms that have precise definitions in order to borrow authority for your position.

While I think the entire history of human beings learning things long before randomization was discovered is proof enough, Royall already rebutted this demand for randomization in the paper I cited in his discussion of the Likelihood Principle:

Blockquote
Statistical theory explains why the randomization principle is unacceptable. It does this in terms of the concepts of conditionality (ancillarity) and likelihood… The conditionality principle asserts that when there is an ancillary statistic C present, inferences should be based upon the observed value of C. The problem this creates for the randomization principle is that the statistic representing the result of the randomization is ancillary; thus the conditional randomization distribution is degenerate, assigning one to the actual allocation used … the only “inference” on the observed data is “I saw what I saw”.

My in-thread post of a quote from Dennis Lindley, on randomization being useful but not necessary, is also relevant.

Design considerations are very important before the data is collected (for the experimenter), but the Likelihood Principle does not use that to create “hierarchies of evidence” after the data is collected. There should be no a priori reason for a reader to grant randomized studies greater weight than observational ones simply by a pre-data design criterion.

Goutis, C., & Casella, G. (1995). Frequentist Post-Data Inference. International Statistical Review / Revue Internationale de Statistique, 63(3), 325–344.

Blockquote
The end result of an experiment is an inference, which is typically made after the data have been seen (a post-data inference). Classical frequency theory has evolved around pre-data inferences, those that can be made in the planning stages of an experiment, before data are collected. Such pre-data inferences are often not reasonable as post-data inferences, leaving a frequentist with no inference conditional on the observed data.

None of this is important if powerful interests can manipulate what studies are published, and which ones are not, via subversion of the peer review process, which Ioannidis discusses in those papers.

Much to the disappointment of those who want easy answers, there are no rules based upon design features alone that can decide this a priori, especially when there is an appreciable probability of fraud or deception. My position is that this needs to be done on a case-by-case basis, according to formal decision-theoretic principles. Michael Rawlins presents many historical examples of medical learning from what we know are imperfect data, which many EBM proponents would find problematic.

The world we have to deal with consists of:

  1. large economically and politically powerful, and coordinated actors who can afford to produce randomized designs (or the illusion of them) vs
  2. disorganized and politically weak agents who might be able to rebut biased presentations of RCTs with observational evidence or approximate the relevant RCT when powerful interests have no incentive to conduct a credible RCT.

That is the problem I am worried about, and am working out how to rigorously formalize the notion.

Related Reading

Rubin, D. B. (1992). Meta-Analysis: Literature Synthesis or Effect-Size Surface Estimation? Journal of Educational Statistics, 17(4), 363–374. https://doi.org/10.3102/10769986017004363

Blockquote
In contrast to these average effect sizes of literature synthesis, I believe that the proper estimand is an effect-size surface, which is a function only of scientifically relevant factors, and which can only be estimated by extrapolating a response surface of observed effect sizes to a region of ideal studies. This effect-size surface perspective is presented and contrasted with the literature synthesis perspective.

In a Robust Bayesian Meta-Analytic approach, design considerations could be treated as a nuisance factor, and integrated out.

There are some errors in statistical reasoning here, but the main points are important.

Blockquote
Seemingly well designed, executed, and reported, RCTs with exciting results can also be misleading due to the hijacked research agenda. These trials are designed to deceive and the methods of deception are alarmingly simple, but effective. The main tactics used relate to the choice of comparators, the choice of outcomes, and the manipulation of statistics to produce desired outcomes, and selectively report them.

Nothing in EBM textbooks or literature prepares a scholar for the adversarial context of the “real world.”