Generalizability vs. Transportability in Trials

Alessandro_Rovetta · December 3, 2025, 10:36pm

Hello everyone,

Our group is working on a paper that revisits the separation between generalizability and transportability in clinical trials. We also debate what representativeness should mean in this context, especially when moving beyond demographic similarity toward the preservation of causal attributes and mechanisms. Below is the abstract of our manuscript. This is an initial version, and we would be very thankful for feedback from the community.

We contend that the usual separation between generalizability and transportability is essentially artificial. Since trial participants are shaped by specific selection processes and thus are almost never randomly drawn from the target population, any extension of trial results inevitably begins as a transportability problem. Consequently, trial estimates should not be viewed as pure biological constants; rather, they reflect a joint intervention composed of three elements: the targeted biological effect, the operational choices and actions that generated the observed data (e.g., patient recruitment, monitoring, adherence support), and the analytical choices and actions that generated the result (e.g., variable definitions, model specifications, handling of missing data).

This perspective also redefines representativeness: it does not require demographic similarity but the preservation of the causal attributes, mechanisms, and contextual conditions necessary for the intervention to exert its effect. In set-theoretic terms, proper inference depends on the intersection of the target structure, the study context, and the mechanism present in the sample. Because trial results arise from a constrained “fictional small world”, they should not be treated as absolute truths about reality. We therefore advocate for transparent reporting of analytical choices, explicit causal assumptions, and compatibility-based interpretations rather than dichotomous significance testing. In this regard, viewing transportability as a theoretical and mechanistic challenge can provide a more realistic foundation for extending trial findings to clinical practice.

f2harrell · December 4, 2025, 6:08pm

I very much like what you’ve written. At some later point you may want to tie this into choice of scales (absolute risk, life expectancy, log odds, log hazard, etc.). There will always be a scale for which the results are not transportable, and there may be a scale for which they are transportable.
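A minimal numerical sketch of the point about scales, assuming a hypothetical treatment whose odds ratio is the same in every setting. The value `ODDS_RATIO = 0.5` and the three control-group risks are made up for illustration; nothing here comes from the manuscript.

```python
import math

def expit(x):
    """Inverse logit: map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Map a probability to log odds."""
    return math.log(p / (1.0 - p))

def treated_risk(control_risk, odds_ratio):
    """Risk under treatment when the odds ratio is held constant."""
    return expit(logit(control_risk) + math.log(odds_ratio))

ODDS_RATIO = 0.5  # hypothetical effect, assumed transportable on the log odds scale

def risk_differences(baseline_risks, odds_ratio=ODDS_RATIO):
    """Absolute risk difference implied by the same odds ratio at each baseline risk."""
    return {p0: treated_risk(p0, odds_ratio) - p0 for p0 in baseline_risks}

for p0, rd in risk_differences([0.05, 0.20, 0.50]).items():
    print(f"control risk {p0:.2f}: risk difference {rd:+.3f}")
```

Under these assumptions the same odds ratio implies an absolute risk reduction of roughly 2 percentage points at a 5% control risk and roughly 17 points at a 50% control risk, so a result that transports on the log odds scale does not transport on the absolute risk scale.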

ESMD · December 4, 2025, 6:35pm

This is really great! I love the fact that you have worked so hard to find words to explain these concepts, rather than relying on math (a language that very few non-academics or clinicians will understand). And Figure 1 is terrific as a simple yet powerful centrepiece to convey the key message in the article.

I’m not sure whether any of the feedback below will be useful. These are just some observations from a non-expert trying to understand your message. Addressing them might lead you into weeds you didn’t want to venture into with a short high-level/conceptual article.

In your introduction you write (bolding is mine):

“Dahabreh and Hernán distinguished generalizability, which refers to extension to a target population nested within the eligible population, from transportability, which concerns extension to populations that include individuals outside the eligible set [1]. However, inclusion criteria merely define the potential covariate space, the boundaries within which the trial is conducted, but they do not ensure that this space is “filled” in a way that represents the target population’s distribution. The common notion that inclusion criteria are the basis for generalizing results is thus erroneous [2].”

Then in your Conclusion, you write:

“We argue that the distinction between generalizability and transportability is largely artificial.”

Questions:

Is there a need to discuss (or compare/contrast) how these terms have historically been explained/defined in the context of clinical observational research versus RCTs?
Could you include a simple picture to show the “target” population’s relationship to the “eligible” population (to more clearly convey how these terms have been used historically, both in the observational and RCT context)?
The fact that the second sentence in the first excerpt above starts with “However” suggests the existence of a disagreement. If you are, in fact, disagreeing with how these concepts have been presented historically, then your wording doesn’t feel explicit enough to convey this fact. If you do disagree, is this difference of opinion one of the main issues you’re trying to highlight with this paper?

“The common notion that inclusion criteria are the basis for generalizing results is thus erroneous [2].”

Question:

Have you managed to establish when/where this notion first arose? Did it arise in the field of epidemiology or statistics? What were the possible reasons for propagation of this misunderstanding? Remarking that an error is widespread without addressing its origin is like lopping the crown off a rotten tooth to fix a toothache: the pain will persist. You also have to identify and extract/obliterate the root…

“Selection processes spanning screening, consent, and retention systematically distort the sample’s covariate distribution relative to the target. Furthermore, trials evaluate a joint intervention: the treatment combined with the rigid monitoring, adherence support, and protocol-driven care inherent to the research setting [5-7].”

Here it might be helpful to provide an example of a clinical scenario in which contextual factors associated with administration of a therapy in a trial would be expected to be very important mediators (not sure of the right word to use here?) of the effect of the therapy in a non-trial setting.

Example: Trials that showed efficacy of IV thrombolysis for acute ischemic stroke involved complex protocols that were designed to optimize 1) the accuracy of early stroke diagnosis (particularly, investigators’ ability to differentiate stroke from stroke “mimics”) and 2) the timeliness of thrombolysis relative to symptom onset. In order for a thrombolytic’s biologic efficacy signal to “translate” to stroke patients presenting outside the clinical trial context, contextual factors 1) and 2) must also translate to post-trial settings. If physicians outside the clinical trial context lack sufficient training to accurately diagnose early stroke OR if they are unable to initiate thrombolysis in a timely manner, then the efficacy of thrombolysis will not manifest outside the context of the clinical trials.

Thanks for sharing your article.

Alessandro_Rovetta · December 5, 2025, 1:15pm

Thank you very much for your wonderful comments, @ESMD and @f2harrell . My colleagues and I are working on a new version that reflects your feedback. As soon as it is ready, we will share it again.

Johannes_Schwenke · December 5, 2025, 2:08pm

Nice. I always viewed a distinction between generalizability and transportability as artificial.

I prefer the formalization in this paper, though I think the conclusions are quite similar. Definitely a paper you should discuss, imho!

karlamoPA · December 8, 2025, 12:30pm

Thank you for this effort. As a patient advocate I see it as a consent issue - the importance of appreciating the limitations of studies - and therefore that the language used should be geared for patient understanding.

Was the recruited population for a treatment I’m considering like or unlike me in some important way? In many cancer studies the median age of the study population is significantly lower than that of the afflicted population. I once proposed in a conference on eligibility that our clinical studies should continue beyond a preset size, remaining open but limited to older participants, until the age of the afflicted in the general population was adequately represented. The pushback, as expected: that would add to the cost and delay the conduct of the trial, so it is not a practical solution. So we are left to use imperfect trials, hopefully with a better appreciation of their limitations.

f2harrell · December 8, 2025, 1:44pm

Your wording implies you are worried about generalizability from the standpoint of main effects when in fact it only matters with respect to interactions (differential treatment effect). Mean age can be arbitrarily different between RCT and clinical practice and not cause a problem if age and treatment have zero interaction in relationship to the outcome measure.
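A small sketch of why a main effect of age cancels out of the treatment contrast while an interaction does not. It assumes a hypothetical linear outcome model on an additive scale; the coefficients `b0`, `b_age`, `tau`, `gamma` and both age lists are invented for illustration.

```python
def expected_outcome(age, treated, b0=10.0, b_age=0.3, tau=-2.0, gamma=0.0):
    """Hypothetical model: b_age is the main effect of age, gamma the age-by-treatment interaction."""
    return b0 + b_age * age + treated * (tau + gamma * age)

def ate(ages, **coefs):
    """Average of treated-minus-control contrasts over a set of ages."""
    effects = [expected_outcome(a, 1, **coefs) - expected_outcome(a, 0, **coefs) for a in ages]
    return sum(effects) / len(effects)

trial_ages = [45, 50, 55, 60]      # hypothetical trial sample, mean age 52.5
practice_ages = [65, 70, 75, 80]   # hypothetical clinic population, mean age 72.5

# No interaction: age shifts outcomes in both arms equally, so the contrast is unchanged.
print(ate(trial_ages), ate(practice_ages))

# With an interaction, the average effect depends on the age distribution.
print(ate(trial_ages, gamma=0.02), ate(practice_ages, gamma=0.02))
```

With `gamma=0.0` both populations have an average effect of about -2.0 despite a 20-year gap in mean age; with `gamma=0.02` the averages are about -0.95 and -0.55. The statement is scale-specific: zero interaction on one scale generally implies a nonzero interaction on another.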

karlamoPA · December 8, 2025, 2:12pm

Thanks for responding. So for cancer therapies, where more toxicities are expected and accepted because of the risk of the disease, age significantly increases the risk of toxicity, leading to higher rates of side effects. These may not be fully appreciated by the patient and referring doctor when the age of the study population in the study relied upon is much lower than that of the afflicted population. So the main effects of concern for advocates and patients are the side effects and their impact on quality of life (the oft left-out endpoint that’s integral to clinical benefit), particularly for treatments given until progression or unacceptable levels of toxicity.

Pavlos_Msaouel · December 8, 2025, 2:52pm

This is exactly the misunderstood point across oncology, and age is the perfect example to educate stakeholders about it. The good news is that since the time of this related commentary, far more people have been made aware of these nuances and are working on implementations for the benefit of our patients. Take-home message: enrolling older patients in RCTs is typically neither necessary nor sufficient to make inferences about a therapy’s effect in older patients. Thus, in scenarios where there is pushback due to cost and other logistics, there are other strategies that can instead be used. Here is our initial demonstration project of how these transportability considerations can be used to help inform adjuvant pembrolizumab inferences and decisions for kidney cancer. Very happy to see these ideas being refined and scaled up.

This does not mean that representative RCT data are not welcomed when available. Here are additional arguments in favor of doing so. And here is a utility-based approach we developed to exactly tailor clinical decisions based on efficacy/toxicity trade-offs for older versus younger patients using RCT data. Scaling such approaches up is the main challenge. Educating stakeholders is a key step towards this goal.

f2harrell · December 8, 2025, 5:32pm

In terms of model specification one could have an ordinal outcome model with a special effect of age (non-proportional odds) for the toxicity portion of the outcome scale if age is thought to affect toxicities more than tumors/death.
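One way to picture this is a partial proportional odds model, in which age gets an extra coefficient at the toxicity cutoffs of the ordinal scale. The sketch below only evaluates such a model at made-up coefficients; the outcome levels, `ALPHA`, `BETA_TRT`, `BETA_AGE`, and `GAMMA_AGE` are all hypothetical, and this is not presented as the specification the post had in mind.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical ordinal outcome, ordered from best (0) to worst (3).
LEVELS = ["no event", "mild toxicity", "severe toxicity", "progression or death"]

ALPHA = [-0.5, -1.5, -2.5]    # intercepts for P(Y >= 1), P(Y >= 2), P(Y >= 3)
BETA_TRT = -0.6               # treatment log odds ratio, common to all cutoffs
BETA_AGE = 0.1                # age effect per decade, common to all cutoffs
GAMMA_AGE = [0.4, 0.4, 0.0]   # extra age effect at the toxicity cutoffs only

def age_odds_ratio_per_decade(cutoff_index):
    """Age odds ratio for P(Y >= cutoff); differs by cutoff, hence non-proportional odds."""
    return math.exp(BETA_AGE + GAMMA_AGE[cutoff_index])

def category_probs(treated, age):
    """Probability of each outcome level for one patient."""
    age_c = (age - 60) / 10  # age centred at 60, in decades
    cum = [expit(a + BETA_TRT * treated + (BETA_AGE + g) * age_c)
           for a, g in zip(ALPHA, GAMMA_AGE)]
    # Cumulative probabilities must be non-increasing across cutoffs;
    # cutoff-specific slopes can violate this for extreme covariate values.
    if any(lower > upper for upper, lower in zip(cum, cum[1:])):
        raise ValueError("cumulative probabilities cross at this age")
    bounds = [1.0] + cum + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(LEVELS))]

for age in (40, 60, 80):
    probs = category_probs(treated=1, age=age)
    print(age, [round(p, 3) for p in probs])
```

Here the age odds ratio per decade is `exp(0.5)` for the cutoffs that bring in the toxicity levels and `exp(0.1)` for progression or death. The explicit monotonicity check reflects a known caveat of cutoff-specific slopes: fitted cumulative probabilities can cross outside the range of the data.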

Lawrence_Lynn · December 9, 2025, 3:01am

This is great. Consider discussing the two types of RCT: 1. the cause-integrity RCT (CIR), which has remained close to the original Bradford Hill method, and 2. the cause-agnostic RCT (CAR).

Obviously there is a spectrum between the extreme of Bradford Hill CIR and the synthetic syndrome CAR.

The CAR provides the best example of beautiful internal validity with no transportability.

Johannes_Schwenke · December 9, 2025, 8:52am

I think it would also be worthwhile to have a discussion about ‘pragmatic’ trials and their often claimed enhanced generalizability.

An argument I’ve repeatedly seen is that we want fewer explanatory trials, referencing the classic Schwartz and Lellouch paper from 1967. The claim is that for clinical decision making we need trials that directly address a specific decision, i.e., are not placebo controlled (because placebos don’t exist in usual care), are not ‘artificial’, etc. Of course representativeness is also often mentioned. I’ve seen people claim that we just need to know what works, not how.

The problem is of course that we always need to generalize, mostly based on theory, and for that theory we need causal understanding. In pragmatic trials there are far more ingredients that might cause a difference between groups at the end of the trial than just the drug, but we don’t know or measure them, so our causal understanding is likely very poor. This leads me to the tentative and paradoxical conclusion that pragmatic trials might often have lower generalizability than explanatory trials.

Wrote this on my phone, will try to provide some refs later.

Pavlos_Msaouel · December 9, 2025, 11:39am

Correct. Glad you are able to see this. One trade-off is that the outcome heterogeneity of a pragmatic trial can require higher sample sizes to estimate the causal effect of interest unless we carefully covariate adjust. A second trade-off is that for the purposes of generalizability/transportability, as shown also in Figure 1 here, even if the enrolled patients of a pragmatic trial end up being representative of a population of interest we would need to actually model the effect of their covariates if we truly think they moderate (mediate) the causal effect we wish to transport (see figure 7 and section 3 “Causal Modeling of Treatment Effect Heterogeneity” here). That also requires higher sample sizes. There is no free lunch.
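The second trade-off can be sketched as a standardization step: effects estimated within covariate strata are averaged over the target population's covariate mix rather than the trial's. All numbers below are hypothetical, and the sketch assumes the listed age strata capture every relevant effect modifier, which is a strong assumption.

```python
def standardized_effect(effect_by_stratum, stratum_shares):
    """Weight stratum-specific effects by a population's stratum shares."""
    if abs(sum(stratum_shares.values()) - 1.0) > 1e-9:
        raise ValueError("stratum shares must sum to 1")
    return sum(effect_by_stratum[s] * w for s, w in stratum_shares.items())

# Hypothetical risk differences by age stratum, as if estimated from a trial model.
effect_by_age = {"<60": -0.10, "60-74": -0.06, "75+": -0.02}

trial_mix = {"<60": 0.6, "60-74": 0.3, "75+": 0.1}    # hypothetical trial enrolment
clinic_mix = {"<60": 0.2, "60-74": 0.4, "75+": 0.4}   # hypothetical target population

print(round(standardized_effect(effect_by_age, trial_mix), 3))   # trial-average effect
print(round(standardized_effect(effect_by_age, clinic_mix), 3))  # effect transported to the clinic
```

With these made-up inputs the trial-average risk difference is about -0.080 and the transported one about -0.052. The sample size cost sits in `effect_by_age`: each stratum-specific effect has to be estimated from the trial data, which is far less precise than estimating a single average.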

Lawrence_Lynn · December 9, 2025, 2:44pm

Formal Deletion Pending

elivingston · December 10, 2025, 10:00pm

How would you set that up in practice?

elivingston · December 10, 2025, 10:07pm

Just as the patients included in a trial are a sample shaped by the trial’s inclusion and exclusion criteria, so that the results apply only to that subset of patients, the abstract you present will only attract readers already familiar with transportability and generalizability. The rest of the population of potential readers will not read the paper.

I suggest defining the terms/concepts transportability and generalizability in the first 3 sentences of the abstract and introduction to draw in readers who do not know much about these ideas but want to learn about them.

f2harrell · December 11, 2025, 2:15pm

What does that mean, Ed?

elivingston · December 11, 2025, 2:47pm

The way I see it, RCTs only reliably inform clinicians about treatment effect of an intervention in the subset of patients who meet inclusion and exclusion criteria.

For example, there is a universe of patients who have hypertension. If an RCT testing an intervention excludes patients older than 70, its results don’t tell you anything about the subset of older patients in the universe of patients with hypertension.

Does that make sense?

Pavlos_Msaouel · December 11, 2025, 2:57pm

Nope. Although it is hard to see, a major point of this article (freely available and with practical clinical examples at the end) is to show why the above is incorrect. It is however super long and tailored to oncology. This was strategic: the first goal was to convince methodologists adjacent to oncology, followed by quantitatively minded clinicians, who would subsequently keep elaborating in articles, lectures, and discussions that can then diffuse. It has worked surprisingly well within our ecosystem, with some collaborators now saying they have been red-pilled by these efforts.

Johannes_Schwenke · December 11, 2025, 4:13pm

Besides all the technicalities wonderfully explained in @Pavlos_Msaouel articles, @Stephen once put it very succinctly. “You cannot sample from the future.” Environmental factors change, distributions of comorbidities change, co-medications change, etc.

Even if a patient’s age equals the median age of a sample from an RCT, all of the above would still be different than when the trial took place. So we still have to generalize (transport) the results from the trial to the patient in front of us, and make assumptions, based on our understanding of the world, on whether it’s important, for example, that the patient is now also on a new drug that did not exist back when the RCT took place, or that winters are now on average warmer, or whatever.

Whether it’s important that a hypothetical patient could not participate in a trial, because they are 71, would also have to be based on theory. And I would assume that for most interventions, it’s very unlikely that effect modification suddenly happens after a certain age cut-off.

tldr: It’s an impossibly high standard, which if consistently applied, would not allow one to use information from pretty much any trial to inform treatment decision.