When is it clinically reasonable to assume transportability of RCT effects?

A commentary by Michael Bailey on a 2023 article (1) discussed transportability of RCT effects. Here is an excerpt:

“Generalizability here refers to the extension of inferences from the RCT sample to a population of patients that either matches or is a subset of the population of patients eligible for the trial. Conversely, the term transportability is used to describe the extension of knowledge gained from the RCT to populations that were not necessarily represented in the enrolled trial sample…**A common strategy is to assume transportability from RCT samples to future patients who share the biological causal properties that mediated the comparative treatment effect**…

I’m trying to reconcile the bolded phrase above with comments that were made in another datamethods thread:

“…In fact **we should expect no measure to be portable**. That is because in health and medical settings there is no scientific (causal mechanistic) or mathematical reason any measure should be constant across settings, and many reasons to expect all measures to vary, given all the expected differences in patient selection and effect modifiers across trial settings…

“…marginal causal ORs (mcORs) need not be any kind of average of the covariate-specific causal ORs (ccORs), and with common outcomes will often fall entirely outside the range of the ccORs…Claims that this favors the ccORs based on generalizability or portability miss the fact that **there are always unmeasured strong outcome-predictive covariates and these will vary across studies and settings** (not only in distribution but also in which ones remain uncontrolled), making the ccORs vary across settings as well - often more so than the mcORs…”

“…In reality our models are always wrong and missing strong outcome predictors, so if the outcome is common then noncollapsibility will be a problem for predicting effects whether we know about the problematic unmeasured covariates or not…”
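To make the noncollapsibility point concrete, here is a toy numeric sketch (the numbers are my own invention, not from either thread). With a common outcome and a strong binary covariate, the marginal OR can fall below both covariate-specific ORs, even though the treatment is beneficial in both strata:

```python
def odds_ratio(p1, p0):
    """Odds ratio comparing risk under treatment (p1) with risk under control (p0)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Two equally sized strata of a hypothetical covariate; outcome is common.
# Each tuple is (risk under treatment, risk under control) within the stratum.
strata = [(0.95, 0.90), (0.50, 0.30)]

stratum_ors = [odds_ratio(p1, p0) for p1, p0 in strata]

# Marginal risks: average over the equally sized strata, then take the OR.
p1_marg = sum(p1 for p1, _ in strata) / len(strata)
p0_marg = sum(p0 for _, p0 in strata) / len(strata)
marginal_or = odds_ratio(p1_marg, p0_marg)

print(stratum_ors)   # roughly [2.11, 2.33]
print(marginal_or)   # roughly 1.76 -- below BOTH stratum-specific ORs
```

This is pure noncollapsibility, with no confounding anywhere: the marginal OR is not any weighted average of the stratum ORs.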


Without rehashing the debate over the merits of various effect measures, to what extent do others agree with the bolded statements from the other thread (which I interpret to mean that effect modification is so common that we should expect effects, not just effect measures, to vary from setting to setting)? Specifically, would physician-scientists agree that our default assumption, for most pharmaceutical company-sponsored clinical trials, should be that there will be covariates that are strongly predictive of outcome yet unknown at the time the trial is designed?

  1. Msaouel, P. (2023). The Role of Sampling in Medicine. Harvard Data Science Review, 5 (3). https://doi.org/10.1162/99608f92.bc6818d3

Absolutely, for both industry-sponsored and investigator-initiated trials. This may be the number one consideration I have in all the clinical trials I design (along with the assumption that nothing will go as expected) and relates to the recommendation here to always incorporate careful collection of patient samples to be integrated with translational and reverse translational pipelines to identify previously unknown sources of outcome heterogeneity. This prepares us for the next generation of clinical trials.

The topic of transportability in the face of treatment effect heterogeneity is rapidly evolving (great recent JAMA articles on the latter here and here), but one way to approach it clinically may be to consider which conditions would refute our transportability assumptions. For example, the effects of a drug targeting the cell-surface epidermal growth factor receptor (EGFR) should not be transportable to patients lacking this receptor. When such an assumption turns out to be wrong, we actually learn more.

These considerations apply not only to transportability from RCTs to the clinic but also to the perhaps even more challenging “mouse-to-man” assumptions. Here (from 5:00 onwards) is a real-life example of a transportable mouse experiment that allowed us to rapidly repurpose a therapy in the clinic, saving the life of a patient with a rare and deadly kidney cancer; that therapy has now been used successfully across the United States and Canada.


Thanks for your input Pavlos. I think I might not be asking my question in quite the right way.

In oncology, there’s no question that it’s important to know a lot about a given patient’s tumour, since tumour markers can differ considerably between patients, and a drug known to target a certain marker won’t be expected to work in a patient whose tumour lacks that marker. But at the same time, you wouldn’t go around enrolling patients whose tumours lack EGFR into a trial of a drug that targets this receptor, right? You would have done all the necessary lab work ahead of time, studying lots of tumours in animals and seeing how they react to the drug in question. You’d go to a lot of trouble to figure out which patients are even capable of responding to the drug (based on its known mechanism of action) before running the trial. And if the drug then seemed to work in your clinical trial, you wouldn’t really have any reason to think it might not work in a patient with an EGFR-positive tumour who happened to be ineligible for the trial for a reason unrelated to the presence/absence of EGFR (e.g., a wheelchair-bound patient who didn’t think he’d be able to attend all the rigorous followup/testing required to participate) (?)

Another example: trials showed improved outcomes for patients with STEMI who underwent timely primary PCI versus thrombolysis. Hypothetically, if patients with cognitive impairment had been excluded from those trials for some reason, we would still have no reason to doubt that primary PCI is preferable to thrombolysis in patients with cognitive impairment. Since the mechanism of STEMI is the same for nearly all patients, it seems reasonable to assume that a treatment targeting that mechanism should work regardless of the patient’s other characteristics.

So I guess what I’m asking is why we would expect a lot of important effect modifiers to be present in a typical clinical trial, since so much work (in most areas) goes into matching the treatment to the causal mechanism of disease development in pre-clinical and early phase trials (?)


I think a lot about “all models are wrong” and view models, when carefully chosen and fitted, as highly useful approximations of reality. When a model that allows for treatment effect heterogeneity finds effect heterogeneity, I come close to believing it. As a practical matter, covariate-adjusted relative efficacy estimates tend to be more or less transportable. There is nothing magical about odds ratios, for example, but since they are unrestricted (unlike absolute risk reduction) they tend to work over the widest variety of patient types, for most studies. You can do better when estimating treatment benefit for a given patient in terms of bias, but you’ll seldom do better in terms of mean squared error (bias squared + variance) than with basic estimates of things like odds ratios coupled with baseline risk.
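To illustrate the last point with hypothetical numbers (not from any trial): holding a covariate-adjusted OR fixed, the implied absolute risk reduction still varies strongly with baseline risk, which is why an OR plus an estimate of a patient's baseline risk travels further than the ARR itself:

```python
def risk_under_treatment(p0, odds_ratio):
    """Apply a (covariate-adjusted) odds ratio to a patient's baseline risk p0."""
    treated_odds = odds_ratio * p0 / (1 - p0)
    return treated_odds / (1 + treated_odds)

or_trt = 0.5  # hypothetical adjusted OR from an RCT
for p0 in (0.05, 0.20, 0.50):  # patients with different baseline risks
    arr = p0 - risk_under_treatment(p0, or_trt)
    print(f"baseline risk {p0:.2f} -> absolute risk reduction {arr:.3f}")
```

A single OR of 0.5 implies very different absolute benefits for low-risk and high-risk patients, with no effect-measure modification on the odds-ratio scale.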


Completely agree. Your practical pearls have been highly influential in our practice. You may enjoy a preprint we just uploaded that generates an empirical prior for phase 3 RCTs in oncology, which then lets us compute posterior distributions for their hazard ratios using a simple free webtool, simply by typing in the reported hazard ratio and 95% confidence interval. Related work here in terms of mean squared error and coverage of this approach.
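For readers curious about the mechanics, a normal-normal shrinkage calculation on the log hazard ratio scale captures the basic idea; the prior mean and SD below are placeholders for illustration, not the empirical prior from the preprint:

```python
import math

def posterior_hr(hr, ci_lo, ci_hi, prior_mean=0.0, prior_sd=0.4):
    """Normal-normal shrinkage on the log-HR scale.

    prior_mean / prior_sd are HYPOTHETICAL stand-ins for an empirical prior
    on the log hazard ratio (prior_mean=0.0 means centered at HR = 1).
    """
    est = math.log(hr)
    # Back out the standard error from the reported 95% CI.
    se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)
    w_data, w_prior = 1 / se**2, 1 / prior_sd**2
    post_mean = (w_data * est + w_prior * prior_mean) / (w_data + w_prior)
    post_sd = (w_data + w_prior) ** -0.5
    return (math.exp(post_mean),
            (math.exp(post_mean - 1.96 * post_sd),
             math.exp(post_mean + 1.96 * post_sd)))

# A reported HR of 0.70 (95% CI 0.50-0.98) gets pulled slightly toward 1.
hr_post, ci_post = posterior_hr(0.70, 0.50, 0.98)
print(hr_post, ci_post)
```

The posterior interval is narrower than the reported one and the point estimate moves toward the null, which is the usual behavior of such shrinkage estimators.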

These adjusted hazard ratio estimates can then be plugged into calculators such as this one, developed by an outstanding kidney cancer patient advocate, to generate absolute risk reduction estimates by incorporating nomograms of baseline risk. The question, of course, is whether we can transport the hazard ratio estimates from the RCT to the clinic; we provide here some thoughts on the topic using the same motivating scenario of adjuvant therapy for renal cell carcinoma.
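Under a proportional hazards assumption, converting a hazard ratio plus a nomogram-derived baseline risk into an absolute risk reduction at a fixed horizon uses S1(t) = S0(t)^HR. A minimal sketch with made-up numbers (HR and baseline risks are illustrative, not from any study):

```python
def arr_at_horizon(baseline_risk, hazard_ratio):
    """Absolute risk reduction at a fixed time horizon, assuming
    proportional hazards: S1(t) = S0(t) ** HR."""
    s0 = 1 - baseline_risk          # baseline survival at the horizon
    s1 = s0 ** hazard_ratio         # survival under treatment
    return s1 - s0                  # reduction in event risk

# Hypothetical adjusted HR of 0.7 applied to different 5-year baseline risks
for risk in (0.10, 0.30, 0.60):
    print(f"baseline risk {risk:.2f} -> ARR {arr_at_horizon(risk, 0.7):.3f}")
```

As with the odds-ratio example above, a single relative effect maps to very different absolute benefits depending on baseline risk, which is exactly what the nomogram supplies.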

This is a challenge we face regularly, including in a discussion within our group yesterday. Lab experiments allow far more control of the setup. Things get messy every time humans are introduced in the equation :slight_smile:

While we do want to enrich for patients who would harbor the causal mechanisms of interest, there are many contrasting reasons in favor of broadening eligibility (none of which are necessarily related to representativeness, of course), including patient access to potentially beneficial new therapies, faster accrual, inclusiveness, etc. Finding the right balance within these trade-offs is the holy grail of trial design.


Wow Pavlos. Wonderful work.


Thanks. Sorry if my question seems muddled; probably a reflection of my muddled thinking. I’m mainly just stuck on the following statement, quoted in the original post:

“there are always unmeasured strong outcome-predictive covariates and these will vary across studies and settings.”

Since this statement seems to be used as a premise for other arguments about the pros/cons of various effect measures, it seems important to establish whether there’s universal agreement with the premise.

If the term “outcome-predictive covariates” refers to effect modifiers, then the statement suggests that, when we test and approve a therapy, we don’t thoroughly understand the link between disease development and the therapy’s mechanism of action. My discomfort with the statement stems from the fact that I think there are many clinical situations where we do understand the disease and the therapy’s mechanism of action well enough to reasonably assume absence of effect modification across settings. Primary PCI for STEMI is one example. In contrast, we might have grounds to question the transportability of the RCT effect of, say, a new antidepressant in patients with “treatment-refractory depression.” Whereas our understanding of the causal mechanism of STEMI is advanced, if not complete, we are nowhere near as close to understanding the causal mechanisms underlying the development of depression. And a not-insignificant proportion of primary care patients with treatment-resistant depression might actually have undiagnosed bipolar illness, which might respond unfavourably to the new antidepressant.

I guess what I’m trying to say is that, clinically (except, maybe, in fields like oncology), we very often assume transportability of an RCT effect when we know that an approved therapy targets a well-understood causal mechanism of a disease. For diseases whose development we understand well, this assumption is usually the best bet we can make for the average patient.


Yup. That is fair to say and it is in the spirit of “all models are wrong” and “bias-variance trade-off” noted by Frank above.