Confusion flows downhill: a plea to methods teachers for unified messaging

Proposal: Gather the most prominent names in stats and epi in a room together and lock the door from the outside (with meals and bathroom breaks, of course) until they can formulate unified messaging on certain foundational concepts.

Recently, on Bluesky:

https://bsky.app/profile/miguelhernan.bsky.social/post/3lbu5vz4ke322

One response:

https://bsky.app/profile/jdwilko.bsky.social/post/3lbukgotmqc2n

Another response:

https://bsky.app/profile/stephensenn.bsky.social/post/3lbuaksqnac2w

It’s hard to believe, as we come to the end of 2024, that these kinds of crazy-making exchanges still occur “at the top.” Let’s ponder how they affect students, researchers, and the medical literature.


5 Likes

Yes, I recently saw a talk mentioning that even if there is exchangeability between the exposure and non-exposure groups in an RCT, there may not be exchangeability for a mediator, so you need to adjust for variables that affect both the mediator and the outcome. I found that confusing.

1 Like

Yes, it’s overkill. The Senn approach is simple and easy to execute and can be fully pre-specified. Account for outcome heterogeneity no matter what the randomization procedure is. Don’t look at balance.
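
As a minimal sketch of what “account for outcome heterogeneity” can look like in practice (purely illustrative; a simulated two-arm trial and an ANCOVA-style linear model, not code from Senn): adjusting for a strongly prognostic baseline covariate shrinks the standard error of the treatment-effect estimate, whether or not that covariate happens to look balanced.

```python
# Illustrative simulation: pre-specified adjustment for a prognostic
# baseline covariate in a 1:1 randomized two-arm trial.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2024)
n = 200
treat = rng.permutation(np.repeat([0, 1], n // 2))    # 1:1 randomization
x = rng.normal(size=n)                                # prognostic baseline covariate
y = 0.5 * treat + 1.5 * x + rng.normal(size=n)        # true treatment effect = 0.5

unadjusted = sm.OLS(y, sm.add_constant(treat)).fit()
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([treat, x]))).fit()

print("unadjusted SE of treatment effect:", unadjusted.bse[1].round(3))
print("adjusted SE of treatment effect:  ", adjusted.bse[1].round(3))  # smaller: x explains outcome heterogeneity
```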

2 Likes

More (bolding is mine):

https://ajph.aphapublications.org/doi/10.2105/AJPH.2018.304337

“All we can estimate from randomized trials data are associations; we just feel more confident giving a causal interpretation to the association between treatment assignment and outcome because of the expected lack of confounding that physical randomization entails. However, the association measures from randomized trials cannot be given a free pass. Although randomization eliminates systematic confounding, even a perfect randomized trial only provides probabilistic bounds on “random confounding” —as reflected in the confidence interval of the association measure—and many randomized trials are far from perfect.”

And

https://www.linkedin.com/posts/miguel-hernan_randomization-observational-confounding-activity-7267160248774246402-izL9/

(See the comments.)

And (possibly highlighting disagreements around interpretation of original source material)

https://www.stat.ubc.ca/~rollin/papers/GreenlandStatSci1999

“Some controversy has existed about adjustment for random covariate imbalances in randomized trials. Although Fisher asserted that randomized comparisons were “unbiased,” he also pointed out that they could be confounded in the sense used here (e.g., see Fisher, 1935, page 49). Fisher’s use of the word “unbiased” was unconditional on allocation, and therefore of little guidance for analysis of a given trial. Some arguments for conditioning on allocation are given in Greenland and Robins (1986) and Robins and Morgenstern (1987). Other arguments for adjustment in randomized trials have been given by Rothman (1977); Miettinen and Cook (1981) and Senn (1989).”

Versus

https://bsky.app/profile/maartenvsmeden.bsky.social/post/3ldxqci4e6s2y

6) P-values in Table 1 of randomised trials

Random fluctuations in the balance between groups in baseline characteristics of the trial are not a threat to the validity of the trial results. There is no such things as "random confounding"

And

https://doi.org/10.1002/sim.4780131703

Physicians and other scientific literature consumers: :face_with_spiral_eyes: :sob:

Do conflicting interpretations of original statistical sources lie at the heart of certain controversies that plague research criticism today? Are these disagreements irreconcilable because they are philosophical rather than technical in nature? If so, then wouldn’t it be more productive for experts in epidemiology and statistics to “teach the controversy,” rather than stand, like the Zax, in the Prairie of Prax for another 59 years, with each side accusing the other, repeatedly and publicly, of being horribly mistaken? If teachers can’t find enough common ground to present a unified message, then everyone else loses.


2 Likes

To clarify, though: Hernan and Senn are literally saying the same thing, and I doubt either of them thinks otherwise, although Senn did characteristically try to stir the pot by mentioning propensity scores in a subsequent reply :wink:

The biggest problem is that Twitter-type websites have many advantages in facilitating direct interactions among diverse groups (or at least they used to), but they also encourage careless reading and replies, along with trolling. This can explain some of the subsequent replies by others. I literally was getting replies on X yesterday preaching to me about the value of longitudinal ordinal models :stuck_out_tongue:

The other issue is indeed that different academic fields use different terminologies, and often the same word may mean completely different things. We tried to unpack the topic discussed in the Bluesky thread in Section 22, “Systematic and Random Biases,” here, and in fact the whole purpose of that paper is to facilitate cross-disciplinary collaborations.

Unfortunately, in many cases different disciplines will not adapt their terminology even when they are aware of the confusion early on in the development of a term. A recent example where both data science disciplines are quite aware of each other (but have actively decided it is not worth it to change their terminology for now) is E-values:

- In the epidemiology/causal inference literature, E-values have been introduced as a metric to quantify the robustness of a causal association to unmeasured confounding.

- Conversely, in theoretical statistics (and particularly the subdiscipline interested in belief functions and imprecise probabilities, and the more recent field of conformal prediction that has generated traction in computer science and machine learning), E-values represent multiplicative measures of evidence against a tested hypothesis.

Interestingly, applied biostatisticians will likely be exposed more to the E-values from the epidemiology/causal inference literature than the E-values from theoretical statistics.
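
To make the name collision concrete, here is a toy contrast (illustrative only; the first quantity uses the VanderWeele-Ding formula for a risk ratio above 1, the second uses a plain likelihood ratio, one standard example of an e-value in the evidence/betting sense):

```python
# Two unrelated quantities that share the name "E-value".
import math
from scipy import stats

# (1) Epidemiology / causal inference: robustness of an observed risk ratio
#     to unmeasured confounding (VanderWeele-Ding formula for RR > 1).
rr = 2.0
e_value = rr + math.sqrt(rr * (rr - 1.0))
print("causal-inference E-value for RR = 2.0:", round(e_value, 2))  # ~3.41

# (2) Theoretical statistics: an e-value is a nonnegative statistic with
#     expectation <= 1 under the null; a likelihood ratio is a simple example.
x = 1.8  # one observation
e_stat = stats.norm.pdf(x, loc=1, scale=1) / stats.norm.pdf(x, loc=0, scale=1)
print("likelihood-ratio e-value against N(0,1):", round(e_stat, 2))
```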

3 Likes

On Mastodon, though, it’s mostly pictures of birds. However, like this uncontacted tribe, I feel I am thriving!

4 Likes

OK, funniest reply of the year definitely goes to Datamethods :joy:

3 Likes

But are they really saying the same thing? I’m not at all sure about that. If two people are saying the same thing, they usually say “yes, I agree with you, well said.” But this is not what I hear in these interactions (and there have been many such examples over the past few years). What I hear is more like “you’ve got it all backwards,” over and over, and over again.


Your paper (Interpreting RCTs) was just great and provided a lot of clarity. Everyone should read it. But why can’t people at the top of these fields seem to agree on how to explain certain concepts clearly, accurately, and consistently?

If these interactions are just about semantics, then these inconsistencies need to be addressed, yesterday, and a common language needs to be developed. But if they are instead about differing interpretations of fundamental concepts, then the differences need to be explicated and taught.

I think the crux of the problem is that the audience for these discussions usually can’t tell whether the arguments are about semantics or about the concepts themselves. The end result is that people who are blameless in this Cold War (e.g., physicians trying to interpret the medical literature) become proxies for the warring factions. Regardless of how they design or interpret a study, they always end up being criticized by one side or the other.


3 Likes

Yup, they are definitely in agreement. In fact, Hernan afterwards links to Senn’s classic article on the topic, noting it is an excellent introduction, but then (in the reply copied in your first post) he gets scolded by Jack Wilkinson for not being aware of that paper’s main theses :stuck_out_tongue: : https://bsky.app/profile/miguelhernan.bsky.social/post/3lbu6vontlk25

These apps do not lend themselves to consensus building :slight_smile:

1 Like

A test: Ask each side:

  1. "What is the “purpose” of randomization?
  2. What does the term “random confounding” mean?
1 Like

As another test seeking the viewpoint of a nonhuman intelligence, I asked GPT4o to help unpack the Bluesky conversation with the following prompt:

Miguel Hernan tweeted the following: Does randomization ensures balance of risk factors between groups? Consider this: In Denmark 860 individuals were randomly allocated to either intervention or control. Individuals were unaware of their allocation. No intervention took place. Mortality was higher in the intervention group (p=0.003) Vass M (PhD Thesis). Prevention of functional decline in older people. Faculty of Health Sciences, U of Copenhagen 2010, p.120. (Thanks to Mikkel Zöllner Ankarfeldt for bringing this example to my attention.) What happened? By chance, some risk factors were more common in the intervention group. Unsurprising. By definition, the 95% confidence interval of 5% of (perfect) trials isn’t expected to include the true value of the effect. Again: Of 20 randomized trials in which treatment truly has a null effect, the 95% CI of one of them isn’t expected to include the null value. Just by chance. When risk factors are imbalanced for non-chance reasons in observational studies, we call it confounding. An interesting point is that, regardless of whether the imbalance results from chance or confounding, we are better off ADJUSTING for prognostic factors that are imbalanced between groups. In Chapter 10 of “Causal Inference: What If”, we describe arguments for adjustment in randomized trials and refute some fallacies used to advise against adjustment. https://hsph.harvard.edu/profile/miguel-hernan/ A practical challenge is how to incorporate adjustment into the design of #randomizedtrials.

Frank Harrell then replied: Good points. We need Stephen Senn to cross over from X as he’s spent a lifetime studying this. So important to get the design as optimum as practical by collecting and adjusting for big prognostic factors. #StatsSky #EpiSky

To which Miguel Hernan replied: Agree. Stephen Senn’s “Seven myths of randomisation in clinical trials” pubmed.ncbi.nlm.nih.gov/23255195/ is a good place to start. And the work by Jamie Robins and colleagues helped us understand “the curse of dimensionality” in high-dimensional settings (references in Chapter 10 of “What If”).

Stephen Senn then replied: We should all stop thinking about this in term of point estimates. Once we start thinking about it in terms of inferences (uncertainty distributions) it’s obvious that you should adjust with that which is prognostic, whether or not it’s imbalanced. It also explains why we should not analysed a randomised block design like a completely randomised one, a mystery that propensity score enthusiasts have to resort to arm waiving to address. Randomisation allows you to use the distribution in probability of the effect of factors you have not observed. This is irrelevant for observed factors.

Does Stephen Senn agree or disagree with Miguel Hernan? Explain why?

GPT4o answer:
Stephen Senn and Miguel Hernán fundamentally agree on the importance of adjusting for prognostic factors in randomized trials, but they emphasize slightly different aspects of the rationale and methodology. Here’s the breakdown:

  1. Agreement on Adjustment for Prognostic Factors: Both Senn and Hernán argue that adjusting for prognostic factors—whether or not they appear imbalanced by chance—is crucial in randomized trials. Hernán highlights this need from a practical perspective, especially in high-dimensional settings, as a way to reduce variability and improve efficiency in estimating causal effects. Senn agrees and stresses that adjustment should be done for factors that are prognostic, irrespective of observed imbalance.
  2. Uncertainty and Inference: Senn takes the discussion further by emphasizing the importance of framing the problem in terms of uncertainty distributions (inferences) rather than point estimates. He critiques analyses that ignore the probabilistic nature of the distributions and advocates for a more nuanced statistical framework, which aligns with Hernán’s call for thoughtful adjustments.
  3. Critique of Simplistic Approaches: Senn critiques methods like analyzing randomized block designs as if they were completely randomized trials and suggests this stems from misunderstandings about randomization’s role. This aligns with Hernán’s earlier caution against fallacies that discourage adjustment in randomized trials.

In conclusion, Stephen Senn and Miguel Hernán largely agree on the principles and benefits of adjustment in randomized trials. Senn’s emphasis on uncertainty distributions and critiques of simplistic methods complement Hernán’s advocacy for principled adjustment and thoughtful trial design.

Another commenter (JW) also replies to Hernan: With respect, this suggests quite a serious misunderstanding about why we randomise. We do this to ensure imbalances are random. Our analyses treat them as such, and ensure valid inference. The problem here is with the idea that point estimates are useful, rather than with randomisation. Is JW’s reply warranted? Explain why?

GPT4o answer:
JW’s reply is partially warranted but oversimplifies the discussion and misunderstands the nuances raised by Miguel Hernán and Stephen Senn.

Points of Agreement:

  1. Purpose of Randomization: JW is correct that the primary purpose of randomization is to ensure that any imbalances between groups are due to chance rather than systematic bias. This allows for valid inference under the assumption that observed and unobserved confounders are distributed randomly between groups.
  2. Random Imbalances: JW accurately points out that random imbalances are expected in randomized trials and that valid inference can still be achieved as long as the analysis accounts for the randomization process. This aligns with standard statistical reasoning.

Points of Disagreement or Oversight:

  1. Utility of Adjustment: Hernán’s point isn’t that randomization fails to ensure valid inference, but rather that adjusting for imbalances in prognostic factors—whether random or systematic—can improve the precision of the estimates. Adjustment reduces variance and enhances statistical power, especially when dealing with high-dimensional data or strong prognostic factors.
  2. “Point Estimates” Mischaracterization: JW critiques the reliance on point estimates but misses Hernán’s broader argument. Hernán and Senn are not advocating for an uncritical reliance on point estimates but are instead emphasizing how adjustment for prognostic factors helps refine those estimates and their associated uncertainty. Senn specifically addresses this by advocating for a focus on uncertainty distributions, which JW does not engage with.
  3. Misunderstanding of Adjustment: JW seems to imply that adjusting for random imbalances is unnecessary because randomization “ensures valid inference.” While this is true in a narrow sense, adjustment for prognostic factors is not about invalidating randomization; it is about improving the efficiency of the analysis by incorporating additional information.

Conclusion:

JW’s reply captures the fundamental purpose of randomization but fails to engage with the core argument made by Hernán (and supported by Senn): that adjustment for prognostic factors, even in randomized trials, can improve precision and reduce variability in effect estimates. Thus, while JW’s critique of point estimates in isolation is valid, their dismissal of Hernán’s perspective reflects an incomplete understanding of the practical benefits of adjustment in trial analysis.

3 Likes

Hah! Here’s my “non-human rebuttal” :slight_smile:

Question to GPT4o:

“Does Stephen Senn agree with Miguel Hernan about adjusting for prognostic factors that appear to be unevenly distributed after randomization, even if the adjustment was not prespecified?”

GPT4o answer:

Dr. Stephen Senn would likely have reservations about adjusting for prognostic factors that appear to be unevenly distributed after randomization, especially if such adjustments were not prespecified. This topic touches on a key debate in causal inference and randomized controlled trials (RCTs), and Dr. Senn’s perspective on this issue tends to be more conservative compared to some other statisticians, including Miguel Hernán.

Dr. Stephen Senn’s View on Post-Randomization Adjustments

  1. Randomization and Its Purpose: Dr. Senn places significant emphasis on the role of randomization in RCTs, viewing it as a mechanism that ensures treatment groups are, on average, comparable across both observed and unobserved factors. In his view, the primary justification for randomization is that it allows researchers to draw valid causal inferences without the need for post-hoc adjustments for imbalances. Therefore, Senn tends to be skeptical of the need for post-randomization adjustments for prognostic factors that were not prespecified, especially when the imbalance could have arisen due to random chance.
  2. Overfitting and the Risk of Bias: One of Dr. Senn’s key concerns is that adjusting for post-randomization imbalances can lead to overfitting, particularly when the factors involved are not strongly predictive of the outcome. He often emphasizes that such adjustments can introduce modeling bias, especially if the adjustment is not clearly justified by the study’s design or hypotheses. By adjusting for variables that were not prespecified, there is a risk of making the analysis more data-driven than necessary, and potentially introducing bias, rather than reducing it.
  3. Adjustments for Confounding vs. Prognostic Factors: Senn differentiates between adjustments for confounders (factors that are related to both the treatment and the outcome) and adjustments for prognostic factors (factors that are related to the outcome but not the treatment). In his view, adjustment for prognostic factors post-randomization—especially when they were not part of the prespecified analysis plan—can complicate the interpretation of the results. If the prognostic factor is not a confounder, then adjusting for it might unnecessarily reduce the variance explained by treatment, leading to potential misleading conclusions.
  4. Prespecification: Senn is a strong advocate for prespecifying all aspects of the analysis, including which variables will be adjusted for. This approach minimizes the risk of data-driven decisions and helps preserve the integrity of the trial’s statistical analysis. In contrast, adjusting for unplanned prognostic factors can open the door to post-hoc modifications that may appear statistically significant but lack external validity or relevance to the original research question.

Miguel Hernán’s View on Adjusting for Post-Randomization Imbalances

Miguel Hernán, on the other hand, is a proponent of adjusting for imbalances in prognostic factors when they arise in randomized trials, even if the adjustment was not prespecified. Hernán and other causal inference theorists, particularly those who adopt a counterfactual framework (such as those influenced by Judea Pearl’s work), often argue that adjusting for imbalances in prognostic factors can improve the precision of estimates and reduce bias, particularly if those factors are strong predictors of the outcome.

Hernán might argue that even if the adjustment was not prespecified, it can still be justified as long as the adjustment improves the validity of the treatment effect estimate. His view is more in line with the idea that randomization reduces but does not necessarily eliminate the risk of bias from imbalanced prognostic factors. Therefore, adjusting for such factors can be seen as a way to improve causal estimates without necessarily violating the randomization assumption, as long as the analysis is done transparently.

Where Senn and Hernán Diverge

  1. Flexibility of Adjustment: Hernán’s perspective generally embraces more flexibility in post-randomization adjustments, as long as these adjustments are made transparently and with a clear understanding of the potential implications. Senn, however, would likely argue that such adjustments should be avoided unless there is a very strong justification for them, particularly when the adjustments are not prespecified.
  2. Modeling Risk: Senn would likely be more concerned about the potential for overfitting or introducing spurious results by making unsystematic adjustments. He often emphasizes that randomization, in combination with proper statistical analysis, is sufficient for drawing causal conclusions in most cases, without the need for post-randomization adjustments unless the imbalances are substantial and theoretically justified.
  3. Interpretation of Treatment Effect: Hernán’s approach would be more comfortable with adjusting for prognostic factors if it can improve the precision and validity of the treatment effect estimate, even post-randomization. Senn, on the other hand, would emphasize the importance of sticking to prespecified analyses to avoid introducing bias through arbitrary adjustments.

Dr. Senn’s Likely Stance on This Issue

Given Senn’s cautious approach to post-randomization adjustments and his preference for prespecified analyses, he would likely not agree with Hernán’s more flexible stance on adjusting for post-randomization imbalances unless there is a very strong reason to do so. Senn would advocate for caution and transparency, warning against the temptation to make post-hoc adjustments to prognostic factors that were not planned in advance, as such adjustments can undermine the internal validity of the study and lead to incorrect conclusions.

Conclusion

In summary, Dr. Stephen Senn is likely to disagree with Miguel Hernán on the matter of adjusting for prognostic factors post-randomization if the adjustment was not prespecified. Senn would typically argue that post-hoc adjustments, especially for factors that were not originally identified as part of the analysis plan, can introduce bias and compromise the integrity of the study’s causal conclusions. In contrast, Hernán would be more inclined to adjust for imbalances in prognostic factors if it improves the precision and validity of the treatment effect, even if the adjustment was not prespecified.

Happy Holidays :nerd_face:

1 Like

Love this. Their reading comprehension skills and creativity are remarkable. Assuming Hernan truly believes the above, I side with Senn on this one :slight_smile:

1 Like

The following article may be of interest (anyone who cannot download a copy can get it from me directly). The main focus is on details of random confounding (written with denialists in mind), which had been illustrated in Greenland & Robins IJE 1986 and Greenland, Robins & Pearl Stat Sci 1999, but which is invisible in standard DAG and naive (unconditional) frequentist formalisms:
Greenland S, Mansournia MA (2015). Limitations of individual causal models, causal graphs, and ignorability assumptions, as illustrated by random confounding and design unfaithfulness. European Journal of Epidemiology, 30, 1101-1110.

Basically, denial of the existence of random confounding is akin to the confusion of “long-run” frequency behavior with individual cases, as seen in significance testing when p>0.05 is confused with “no observed association”. Both confusions cause blindness to actual effects in single cases by treating hypothetical averages as if they were realized in the single-case data one has. In reality, identity between those averages and a single-case outcome is an event that has nearly zero probability of happening under randomization, and the difference can be checked with measured-covariate information.
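
A minimal numerical sketch of that distinction (my illustration, not taken from the cited papers): across many hypothetical re-randomizations the covariate difference between arms averages out to zero, but in any single realized allocation it is essentially never zero and is often large enough to matter.

```python
# Balance in expectation vs. imbalance in realization for one covariate.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 10_000
x = rng.normal(size=n)   # a fixed prognostic covariate in the trial sample

diffs = []
for _ in range(reps):
    treat = rng.permutation(np.repeat([0, 1], n // 2))
    diffs.append(x[treat == 1].mean() - x[treat == 0].mean())
diffs = np.array(diffs)

print("mean difference over repetitions:", diffs.mean().round(4))           # ~0: balance on average
print("typical |difference| in one trial:", np.abs(diffs).mean().round(3))  # clearly nonzero
print("share of trials with |difference| > 0.2:", (np.abs(diffs) > 0.2).mean().round(3))
```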

Not unlike sequential analysis and preliminary model selection, adjustment prespecification brings up matters of intentionality and the “correct” frequentist reference distribution. Logical analysis of such issues requires formalizing fine details which are usually left implicit and informal. Such details are what separate Hernan and Senn, to the extent they are separated; I suspect, however, that none of us would disagree if indeed we were locked in a room and forced to explicate all those details together. In fact the details have been addressed in various papers; in addition to the illustrations mentioned above, Cornfield AJE 1976 gave an illuminating Bayesian description of what randomization does for single cases in terms of “rexing” (rendering exchangeable priors), at
https://academic.oup.com/aje/article-abstract/104/4/408/48842
Robins & Morgenstern 1987 gave a detailed explanation in terms of frequentist conditionality, at
https://www.sciencedirect.com/science/article/pii/0898122187902367

Alas, as Erin lamented, the details have not been pulled together in a review that is both accurate and researcher-friendly, that I know of. It thus may be no surprise that some of the comments about this topic I have seen elsewhere show no awareness of the key results; some even make blatantly erroneous claims, especially now with blogs on the topic popping up. An example is the claim that adjustment for baseline covariates in an RCT should be avoided because it will imbalance the groups; here, the observed covariate imbalances are single-case events, but are being mixed in the same equation with balance from randomization, which is only a long-run average (in logical terms, this mixing is a category error). To address these kinds of confusion, Ian Shrier at McGill is trying to pull a group together to do such a review.

6 Likes

Thanks very much for your input, Dr. Greenland.

“Such details are what separate Hernan and Senn to the extent they are separated; I suspect however none of us would disagree if indeed we were locked in a room and forced to explicate all those details together.”

Are you saying that ChatGPT has the wrong end of the stick? I don’t trust my own reading comprehension on statistical issues, but I will say, as a layperson, that ChatGPT has flagged a discrepancy that I too perceived and have found very confusing.

Maybe a more concrete example would help. On page 138 (section 10.4) of the updated edition of “Causal Inference: What If” (see link below), the authors (Hernan and Robins) present a hypothetical scenario in which smoking status is discovered, after the fact, to have been unevenly distributed between treatment arms of an RCT. They explore the potential impact of this maldistribution (this discussion is bewildering to me as a non-expert) and conclude that the RCT investigators should retract their paper. However, I’m not at all sure that Stephen Senn would have provided the same advice to the investigators (but maybe I’m wrong)?

Since it’s quite plausible for scenarios like this to arise during the analysis phase of potentially practice-changing medical RCTs (in fact, I can think of one example that fits the bill), it’s vital that top experts agree on how they should be resolved, and why. Failure of statistical experts to reach consensus on such questions will lead to never-ending second-guessing from practitioners regarding trial interpretation and actionability.

https://static1.squarespace.com/static/675db8b0dd37046447128f5f/t/6768776ea7dfa22610b4ed39/1734899569480/hernanrobins_WhatIf_22dec24.pdf

2 Likes

Indeed, a cross-disciplinary review focused on this topic would be helpful, particularly as it is generating increasingly more interest nowadays. Richard Peto also gave a nice related talk here on what he called “regression dilution,” whereby a predictor of outcome heterogeneity is measured with random error or varies with time, influencing its association with the outcome. His input could be valuable in such a review.

A related topic where integration is urgently needed between statistics and epidemiology (and computer science and other data scientists) is estimating in RCTs the impact of subsequent therapies on long-term outcomes such as overall survival. There is currently no consensus on best practices to do this among oncology trialists and regulators, and in fact up until recently most oncology trialists did not even recognize the challenge. But there is little doubt that in order to tackle it, trialists will have to swallow their pride and learn, e.g., from epidemiology. To facilitate these conversations in oncology, the Kidney Cancer Association organized a Think Tank in 2023, which led to this just-published statement (the appendix goes a little deeper into methodological considerations). The Think Tank conversations were far more productive than one would anticipate based on the typical kidney cancer trial discussions on Twitter/X. The experience was a very nice example of how open practitioners can be to these efforts, at least under the right conditions.

1 Like

After reading these experienced replies, I have a simple question (I can’t fully understand all of this complex content; I am still learning). My understanding is that the discussion is about whether randomization can make the prognosis of the two groups equal and, if not, whether multivariable adjustment is needed during analysis. The conclusion seems to be that randomization cannot completely equalize the prognosis of the two groups, so multivariable adjustment is needed during analysis. Is my understanding correct?

1 Like

Correct. Randomization makes prognosis equal in expectation (on average across multiple hypothetical repetitions of the RCT) but not in realization for any given RCT. For this reason multivariable adjustment increases power and precision in RCTs both in theory and in practice as we showed here for phase 3 RCTs in oncology.

This key point is less well known in academia than in industry, which may be why clinical journals rarely bring it up. Notice how in our analysis industry RCTs had more than twice the odds of using multivariable adjustment for their primary endpoint analyses. Part of the motivation for us to generate these data was to highlight this point in the academic oncology literature.

Regarding adjustment for imbalances after the fact, the simplest argument for not doing this is that the set of baseline covariates is ill-defined. Where do you stop? I showed in a simple simulation of one example setup that if you find an imbalance on one factor you have to look at just 7 more factors to expect to find a counter-balance.
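
A toy version of that kind of simulation (illustrative only; not the setup or the numbers referred to above): with even a modest number of independent baseline covariates, chance imbalances appear in both directions within a single randomized allocation, which is part of why chasing observed imbalances has no natural stopping point.

```python
# Chance imbalances across many baseline covariates in one randomized trial.
import numpy as np

rng = np.random.default_rng(7)
n, n_covariates = 100, 20
treat = rng.permutation(np.repeat([0, 1], n // 2))
X = rng.normal(size=(n, n_covariates))   # independent baseline covariates

mean_diffs = X[treat == 1].mean(axis=0) - X[treat == 0].mean(axis=0)
print("covariates leaning toward the treatment arm:", (mean_diffs > 0.1).sum())
print("covariates leaning toward the control arm:  ", (mean_diffs < -0.1).sum())
```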

4 Likes

“The set of baseline covariates is ill-defined”

I don’t think that’s true in general. There are many ways to define it as precisely as needed for post-trial usage, e.g., as the set of covariates recorded in the trial database at the time treatment was assigned. As for where to stop selection for adjustment, that can be operationalized to optimize performance according to explicit criteria, and could entail partial inclusion as in penalized (shrinkage) estimation of covariate coefficients.
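
A hedged sketch of the “partial inclusion” idea (my illustration, not a method prescribed in this thread): leave the treatment coefficient unpenalized while putting a ridge penalty on the baseline-covariate coefficients, so covariates are shrunk toward zero rather than being either fully in or fully out of the model.

```python
# Ridge-penalized covariate coefficients with an unpenalized treatment effect.
import numpy as np

rng = np.random.default_rng(0)
n, p = 120, 10
treat = rng.permutation(np.repeat([0, 1], n // 2)).astype(float)
Z = rng.normal(size=(n, p))                          # baseline covariates recorded at assignment
y = 0.4 * treat + Z @ rng.normal(0, 0.3, size=p) + rng.normal(size=n)

X = np.column_stack([np.ones(n), treat, Z])          # intercept, treatment, covariates
lam = 5.0
D = np.diag([0.0, 0.0] + [1.0] * p)                  # penalize covariate columns only
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)   # (X'X + lam*D)^-1 X'y

print("treatment-effect estimate (unpenalized):", round(beta[1], 3))
print("shrunken covariate coefficients:", np.round(beta[2:], 2))
```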

Defining baseline covariates and determining how to use them are examples in which logical analysis requires formalizing fine details which are usually left implicit and informal. Sensible ways to formalize and answer the questions precisely will depend on contextual details. That means no single approach to defining and using covariates will be judged adequate for all situations by all methodologists. As with regression analysis in general, we can only provide catalogs of models or templates for defining and selecting covariates, along with examples. For the more sophisticated user we can also provide tools for generating models tailored to explicit contextual details.

Randomized trials present problems no different from, and often less severe than, the general problems encountered in the use of statistics and regression for estimating causal effects. Randomization provides a known, truly instrumental variable (treatment assignment) to deal with confounding of received-treatment effects. Plus, RCTs are usually conducted with more careful and focused data collection and follow-up, reducing risks of selection bias and mismeasurement. These advantages come at a cost of reduced transportability, plus high per-person expense and thus reduced sample size, which makes variable-selection problems more difficult. But those problems have already been studied in depth in the general regression literature, and the methods used to deal with them apply to RCTs.

Why then don’t we see those methods deployed in this debate? I think sample size is one reason, as a lot of methods are asymptotic and a lot of trials are small. But I think the biggest reason is the one that started this thread: Most researchers don’t have the technical background to understand ideal methods, and often don’t even have a correct grasp of what randomization does and doesn’t do. Some are even prone to “cheat” (maybe without knowing it) via post-hoc selection toward more desirable results.

I think that’s why front-line statisticians have promoted simple methods that seem safe for general use, e.g., lay out a pre-specified list of baseline adjustment covariates and limit adjustment to that. That’s fine, as long as those simple methods aren’t promoted as if they were the only possible correct approach, never to be violated in any circumstance (as happened with NHST), because there is no such thing.

4 Likes