Design of Experiments in Economics vs Medicine: a Decision Theory POV

This post is inspired by the @f2harrell reference to the economist Lars P. Syll’s post The Limited Value of Randomization. It is a good entry point into the criticisms of so-called “evidence based medicine” heuristics more generally. Syll is often linked to by @Sander_Greenland on Twitter for his skepticism of applied stats and mathematics in the realm of social science (especially econometrics).

First, it should be mentioned that different subject areas have different challenges, which need to be reflected in the methods of investigation. Having read scholarship in both economics and medical statistics, I think each side has reasonable grounds for its methodological preferences. Randomization is much more useful for questions of medicine than of economics.

In spite of that, the economists are much closer to the truth about the evidential value of randomization than the “Evidence Based Medicine” proponents are.

The reasons why some economists find the emphasis on randomization objectionable have been discussed in a number of papers, such as:

Ravallion, M. (2009). Should the randomistas rule?. The Economists’ Voice, 6(2).
Ravallion, M. (2020). Should the randomistas (continue to) rule? (No. w27554). National Bureau of Economic Research.

The main points from the papers:

  1. The case for such a preference is unclear on a priori grounds. For example, with a given budget, even a biased observational study can come closer to the truth than a costly RCT (see the sketch after this list).
  2. The ethical objections to RCTs have not been properly addressed by advocates.
  3. There is a risk of distorting the evidence base for informing policymaking, given that an insistence on RCTs generates selection bias in what gets evaluated.
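Point 1 can be made concrete with a back-of-the-envelope mean-squared-error comparison. All the numbers below (costs, outcome variance, assumed bias) are illustrative assumptions of mine, not figures from Ravallion’s papers:

```python
# Sketch: with a fixed budget, a biased but cheap observational estimate can
# have lower mean squared error (MSE) than a costly unbiased RCT.
sigma2 = 4.0          # outcome variance (assumed)
budget = 10_000.0     # total budget (assumed)
cost_rct = 100.0      # cost per RCT subject (assumed)
cost_obs = 5.0        # cost per observational record (assumed)
bias_obs = 0.15       # assumed systematic bias of the observational estimate

n_rct = budget / cost_rct      # 100 subjects
n_obs = budget / cost_obs      # 2000 records

mse_rct = sigma2 / n_rct                   # unbiased: variance only
mse_obs = bias_obs**2 + sigma2 / n_obs     # bias^2 + variance

print(f"RCT  MSE: {mse_rct:.4f}")   # 0.0400
print(f"Obs. MSE: {mse_obs:.4f}")   # 0.0245 -> the biased study wins here
```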

The third point is particularly important, considering the increasing concern about outright fraud in medical research.

Towards the end of his 2020 article, Ravallion writes:

The popularity of RCTs has rested on a claimed hierarchy of methods, with RCTs at the top, as the “gold standard.” This hierarchy does not survive close scrutiny.

Of critical importance is the discussion in the economics and operations research literature on the design of experiments (broadly defined) and the role of randomization. The discussion among economists and applied mathematics scholars is much more nuanced (and rigorous) than the “evidence based medicine” literature, of which I take this blog post by Dr. Vinay Prasad as representative:

You need nothing to randomize.

Contrast this with the claim by physicist and Bayesian proponent E.T. Jaynes, who devoted an entire section of Ch. 17 (Principles and Pathology of Orthodox Statistics) to the topic, under the heading The Folly of Randomization. He wrote:

Whenever there is a randomized way of doing something, there is a nonrandomized way that yields better results for the same data, but requires more thinking. (p. 512, emphasis in the original).

What do logic and mathematics have to say on the issue? Another NBER paper discusses the question from a decision theory perspective in the context of economic policy.

Banerjee, A., Chassang, S., Montero, S., & Snowberg, E. (2017). A theory of experimenters (No. w23867). National Bureau of Economic Research.

The rise of the “Randomistas” in economics is especially interesting (and amusing) when you reflect on this quote from a 2016 version (PDF) of the Banerjee et al. paper just mentioned, where they discuss the methodological conflict between experimental economics and empirical microeconomics:

… there are good reasons why such a dialogue is difficult: an experiment designed according to the prescriptions of mainstream economic theory would get rejected by the most benevolent of referees; conversely, experimentation as it is practiced fails the standard axioms of subjective rationality. [ie. expected utility theory – my emphasis]

Contrary to Prasad, randomization is not free. In the context of controlled experiments, the benefits of randomization (depending on the method) emerge only at larger sample sizes (i.e. n ≥ 200), which makes randomization the least preferable allocation strategy, from the perspective of maximizing information, in any context other than total ignorance.
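To make the small-sample concern concrete, here is a minimal simulation sketch of my own (assuming a single standard-normal covariate and coin-flip allocation; the n ≥ 200 figure above is not derived here) comparing mean covariate imbalance under simple randomization against a deterministic alternating split:

```python
# Covariate imbalance under simple randomization shrinks only as n grows,
# while a deterministic balanced split controls it at any sample size.
import random
import statistics

def imbalance_simple(n, reps=2000, rng=random.Random(1)):
    """Mean |difference in covariate means| under coin-flip assignment."""
    diffs = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        arms = [rng.random() < 0.5 for _ in x]
        t = [xi for xi, a in zip(x, arms) if a]
        c = [xi for xi, a in zip(x, arms) if not a]
        if t and c:
            diffs.append(abs(statistics.mean(t) - statistics.mean(c)))
    return statistics.mean(diffs)

def imbalance_alternating(n, reps=2000, rng=random.Random(2)):
    """Same metric when units are sorted on the covariate and alternated."""
    diffs = []
    for _ in range(reps):
        x = sorted(rng.gauss(0, 1) for _ in range(n))
        t, c = x[0::2], x[1::2]
        diffs.append(abs(statistics.mean(t) - statistics.mean(c)))
    return statistics.mean(diffs)

for n in (20, 200):
    print(n, round(imbalance_simple(n), 3), round(imbalance_alternating(n), 3))
```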

John Lachin wrote or co-authored a series of valuable quantitative analyses of randomization procedures in 1988 that are consistent with the analysis in the economics literature:

Lachin, J. M. (1988). Statistical properties of randomization in clinical trials. Controlled clinical trials, 9(4), 289-311.

Lachin, J. M. (1988). Properties of simple randomization in clinical trials. Controlled clinical trials, 9(4), 312-326.

Matts, J. P., & Lachin, J. M. (1988). Properties of permuted-block randomization in clinical trials. Controlled clinical trials, 9(4), 327-344.

Wei, L. J., & Lachin, J. M. (1988). Properties of the urn randomization in clinical trials. Controlled clinical trials, 9(4), 345-364.

Lachin, J. M., Matts, J. P., & Wei, L. J. (1988). Randomization in clinical trials: conclusions and recommendations. Controlled clinical trials, 9(4), 365-374.

The following article by Kasy describes the well-known and debated result from decision theory regarding randomized decision rules in the context of experiments:

To gain some intuition for our non-random result, note that in the absence of covariates the purpose of randomization is to pick treatment and control groups which are similar before they are exposed to treatment [i.e. exchangeable – my emphasis]. Formally we would like to pick groups which have the same (sample) distribution of potential outcomes. Even with covariates observed prior to treatment assignment, it is not possible to make these groups identical in terms of potential outcomes. We can, however, make them as similar as possible in terms of covariates. [i.e. conditionally exchangeable via covariate adaptive allocation – my emphasis].
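A minimal sketch of the idea in this passage (my illustration, not Kasy’s actual procedure): for a small study with one observed covariate, search all balanced splits and keep the one with the smallest difference in covariate means. Brute force is only feasible for small n, but it shows what “as similar as possible in terms of covariates” means operationally:

```python
# Deterministic assignment: choose the treatment/control split that makes the
# two groups most similar on an observed baseline covariate.
from itertools import combinations

def best_split(x):
    """Return the half-sample whose covariate mean is closest to the rest."""
    n = len(x)
    best, best_gap = None, float("inf")
    for treated in combinations(range(n), n // 2):
        t = [x[i] for i in treated]
        c = [x[i] for i in range(n) if i not in treated]
        gap = abs(sum(t) / len(t) - sum(c) / len(c))
        if gap < best_gap:
            best, best_gap = treated, gap
    return best, best_gap

ages = [34, 61, 45, 52, 29, 70, 48, 55]   # hypothetical baseline covariate
treated, gap = best_split(ages)
print("treated indices:", treated, "mean gap:", round(gap, 3))
```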

Implications for the so-called “hierarchy of evidence”

EBM teaches the use of pre-data design criteria to evaluate research after the data has been collected.

Both systems place randomized controlled trials (RCT) at the highest level and case series or expert opinions at the lowest level. The hierarchies rank studies according to the probability of bias. RCTs are given the highest level because they are designed to be unbiased and have less risk of systematic errors.

When viewed from the perspective of likelihood theory, what EBM calls “bias” is better thought of as misleading evidence.

The third evidential metric is the propensity for observed evidence to be misleading. It is an essential complement to the first evidential metric [strength of evidence – my emphasis]. Ideally, one would report the observed metric to describe the observed strength of the evidence as well as the chance that the observed results are mistaken. This third evidential metric is known as a false discovery rate; it is a property of the observed data. These probabilities often require the use of Bayes theorem in order to be computed, and that presents special problems. Once data are observed, it is the false discovery rates that are the relevant assessments of uncertainty. The original frequency properties of the study design - the error rates - are no longer relevant. Failure to distinguish between these evidential metrics leads to circular reasoning and irresolvable confusion about the interpretation of results as statistical evidence.
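As a worked illustration of the quote’s point (the prior probability, alpha, and power below are made-up inputs, not values from the quoted source), Bayes’ theorem converts design error rates plus a prior into a post-data false discovery rate:

```python
# The "false discovery rate" is a post-data quantity computed with Bayes'
# theorem; it can be large even when the design's alpha is small.
prior = 0.10    # assumed prior probability the studied effect is real
alpha = 0.05    # type I error rate of the design
power = 0.80    # 1 - type II error rate of the design

# P(no effect | "discovery") via Bayes' theorem:
p_discovery = power * prior + alpha * (1 - prior)
fdr = alpha * (1 - prior) / p_discovery
print(f"False discovery rate: {fdr:.2%}")   # ~36% despite alpha = 5%
```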

In essence, waste and inefficiency are built into the very foundation of “Evidence Based Medicine” by:

  1. ignoring information that should be conditioned upon, leading to studies that should not be done, or
  2. conditioning on false information, causing surprise and controversy in practice, leading to more calls for additional research.

See also:

Greenland, S. (2005), Multiple-bias modelling for analysis of observational data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 168: 267-306.

Although I limit the discussion to observational studies, the bias problems that I discuss often if not usually arise in clinical trials, especially when non-compliance or losses occur, and the methods described below can be brought to bear on those problems.


Nice thoughts, even though on the whole they understate the value of randomization by at least a factor of 3. With randomization comes prospective data collection, better measurements, masking of treatment assignment, lack of bias, …

The opposite of this statement is true.


For example, with a given budget, even a biased observational study can come closer to the truth than a costly RCT.

You cannot assert this independent of context, particularly for some questions in development economics due to issues such as spillover effects.

When attempting to formally account for uncertainty due to active attempts to mislead, any priority given to randomization disappears. But that argument is for another post.

I’ll give you that there are big differences between clinical and econ/social experiments. My experience is strictly on the clinical side, for which the anti-randomization arguments are extremely weak.


I think there is room for broad agreement among decision theorists and randomistas with the use of adaptive randomization procedures that:

  1. minimize sample size requirements, and
  2. permit model-free inference via permutation tests or the bootstrap, justified by the design (see the sketch after this list).
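A minimal sketch of point 2, assuming the simplest possible design (a random split into two arms; the function name and outcome data are hypothetical): the reference distribution is generated by re-running the allocation procedure itself, so no outcome model is assumed:

```python
# Design-based (re-randomization) test: compare the observed difference in
# means to its distribution over re-runs of the allocation procedure.
import random
import statistics

def rerandomization_pvalue(y_t, y_c, reps=10_000, rng=random.Random(0)):
    pooled = y_t + y_c
    observed = statistics.mean(y_t) - statistics.mean(y_c)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)                       # re-run the random split
        sim = (statistics.mean(pooled[:len(y_t)])
               - statistics.mean(pooled[len(y_t):]))
        hits += abs(sim) >= abs(observed)
    return hits / reps

# Hypothetical outcomes:
print(rerandomization_pvalue([5.1, 6.0, 5.8, 6.3], [4.2, 4.9, 5.0, 4.6]))
```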

I also find Rosenbaum’s perspective on observational studies as approximations to RCTs a valuable way to look at things.

The following 2014 paper discusses both asymptotic and small-sample results for biased-coin as well as deterministic allocation procedures, and makes connections to the theory of optimal experiments (both Bayesian and frequentist). More recent papers (from 2020 onward) on adaptive allocation extend these results.

It builds upon Lachin’s 1988 series by also including an interesting Bayesian allocation rule that is balanced at small sample sizes and becomes increasingly close to random allocation at larger sample sizes, which is what I would expect from a decision theory perspective. It all comes down to the bias/variance trade-off one wishes to make.

Atkinson, A. C. (2014). Selecting a biased-coin design. Statistical Science, 29(1), 144-163.
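For intuition, here is a minimal sketch of one classic member of the biased-coin family, Efron’s 1971 rule with p = 2/3, which favors whichever arm is currently under-represented. This simplified version is my own illustration, not the design Atkinson recommends:

```python
# Efron's biased coin: fair coin when the arms are balanced, otherwise assign
# the lagging arm with probability p.
import random

def efron_biased_coin(n, p=2/3, rng=random.Random(42)):
    treated = control = 0
    assignments = []
    for _ in range(n):
        if treated == control:
            arm = rng.random() < 0.5      # arms balanced: fair coin
        elif treated < control:
            arm = rng.random() < p        # treatment lagging: favor it
        else:
            arm = rng.random() >= p       # control lagging: favor it
        assignments.append(arm)           # True = treatment
        treated += arm
        control += not arm
    return assignments

arms = efron_biased_coin(20)
print(sum(arms), "treatment vs", 20 - sum(arms), "control")
```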

Random allocation can be considered a clear dividing line between controlled research designs (with randomization being the least efficient of these from a strict expected-information perspective) and observational ones, with the latter needing larger effective sample sizes and rigorous data collection to control for factors undermining credibility – e.g. confounding by indication.

An example that used covariate-adaptive allocation (minimization in this case):

Stinear, C. M., Petoe, M. A., Anwar, S., Barber, P. A., & Byblow, W. D. (2014). Bilateral priming accelerates recovery of upper limb function after stroke: a randomized controlled trial. Stroke, 45(1), 205-210.

Intervention allocation was concealed and randomized using customized software (www.rando.la) that minimized between-group differences in age, baseline ARAT score, PREP stratification, and brain-derived neurotrophic factor genotype derived from a single baseline blood sample because this may influence plasticity and learning

Given the sample size (28 control, 29 treated), minimization on predictive covariates was the correct choice, but there were a number of other flaws that detract from the analysis (e.g. taking the mean of the Stroke Impact Scale). It is also strange to report p values in Table 1 when the study was explicit in using a minimization procedure.
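For intuition about what the trial’s software was doing, here is a minimal sketch of minimization in the Pocock-Simon spirit. The factor names and levels are hypothetical, and this deterministic version omits the random element real implementations include:

```python
# Minimization: assign each new patient to whichever arm most reduces the
# summed marginal imbalance across the balancing factors.
def minimize_assign(counts, patient):
    """counts[arm][factor][level] -> number already assigned at that level."""
    def total_imbalance(candidate):
        imb = 0
        for factor, level in patient.items():
            n_t = counts["treatment"][factor].get(level, 0) + (candidate == "treatment")
            n_c = counts["control"][factor].get(level, 0) + (candidate == "control")
            imb += abs(n_t - n_c)
        return imb
    return min(("treatment", "control"), key=total_imbalance)

# Hypothetical running example with two balancing factors:
counts = {arm: {"age_band": {}, "baseline_score": {}}
          for arm in ("treatment", "control")}
patient = {"age_band": "60-70", "baseline_score": "low"}
arm = minimize_assign(counts, patient)
for factor, level in patient.items():      # record the assignment
    counts[arm][factor][level] = counts[arm][factor].get(level, 0) + 1
print(arm)
```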

William Briggs, a statistician quoted favorably in BBR regarding the limitations of p values, has written a blog post mostly compatible with the argument I am making in this thread.

Of all controlled allocation strategies, randomization is the least preferable when information is available on potential factors that need to be accounted for.

I think the quote below from Briggs is a bit too extreme in discounting the utility of randomization in a practical context, but extremism may be necessary to counter the dogmatism that elevates it to a “gold standard”. At best, randomization is a hedging strategy for ignorance, and a computational procedure when direct application of Bayes’ Theorem is too expensive to implement.

There is nothing in the world wrong, and everything right, with a controlled trial. But randomization is pure superstition, no different than cargo cult science, as I have explained in great detail in Uncertainty (scroll down here). And will explain here, too.

This doesn’t sound right. Randomization is a way to not have to assume that all elements of a model are correct, especially with regard to selection bias / confounding by indication. Randomization is how we rule out alternative explanations for responses.