RCT generalizability and interactions with treatment

This topic is for discussions and questions surrounding the blog article Implications of Interactions in Treatment Comparisons. One of the themes of the article is that randomized clinical trial results can generalize quite well to a much different target patient population, depending on the true underlying interaction patterns and the assumptions you make about them.


Hi Frank!

Thank you for another informative blog post! As a clinician, one encounters this argument quite often, and it is great to do some actual thinking about it. To me, the conclusion of your post and simulations seems to be that if your model captures the underlying data-generating process well, then extrapolation is not an issue. This is also quite intuitive; one could say that this is one of the purposes of creating a model in the first place.

However, the big practical question is how to decide about the interactions. From your simulations it seems to me that assuming an interaction when there isn't one, and vice versa, are both impactful mistakes (as opposed to, say, choosing a non-parametric test when there is in fact linearity). Without any in-depth subject matter expertise, my understanding is that very few RCTs detect significant HTE/interactions, but almost all are very underpowered to do so, as you also allude to. How can we then claim that most results can be generalized if we don't really know how prevalent interactions are?
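To make the two kinds of mistake in the question above concrete, here is a minimal Monte Carlo sketch. It is not the blog post's simulation: it swaps in a Gaussian linear model with made-up numbers (trial ages 40-70, a target patient aged 80, noise SD 10, a treatment effect of -5 at age 55). It fits an additive model and a treatment × age interaction model, then compares their estimates of the treatment effect at age 80 when the true per-year change in the effect (`true_slope`, a hypothetical parameter) is 0 (no interaction) or -0.2 (interaction). Only the standard library is used, so least squares is done by hand via the normal equations.

```python
import random
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def simulate(true_slope, target_age=80.0, n=200, reps=300, seed=1):
    """Hypothetical trial: ages 40-70, 1:1 allocation, Gaussian outcome with SD 10.

    The true treatment effect is -5 at age 55 and changes by true_slope per year
    (true_slope = 0 means no treatment x age interaction). Returns the true effect
    at target_age and, for each model, the (mean, SD) of its estimate there.
    """
    rng = random.Random(seed)
    est = {"additive": [], "interaction": []}
    for _ in range(reps):
        rows_add, rows_int, y = [], [], []
        for i in range(n):
            tx = i % 2                      # alternate A/B for exact 1:1 allocation
            age = rng.uniform(40, 70) - 55  # age centred at 55
            effect = -5.0 + true_slope * age
            y.append(100.0 + 0.5 * age + effect * tx + rng.gauss(0, 10))
            rows_add.append([1.0, tx, age])
            rows_int.append([1.0, tx, age, tx * age])
        b = ols(rows_add, y)
        est["additive"].append(b[1])  # additive model: one effect for every age
        b = ols(rows_int, y)
        est["interaction"].append(b[1] + b[3] * (target_age - 55))
    truth = -5.0 + true_slope * (target_age - 55)
    summary = {k: (statistics.mean(v), statistics.stdev(v)) for k, v in est.items()}
    return truth, summary


if __name__ == "__main__":
    for slope in (0.0, -0.2):
        truth, summary = simulate(slope)
        print(f"true slope {slope}: true effect at 80 = {truth:.1f}")
        for model, (mean, sd) in summary.items():
            print(f"  {model:11s} mean {mean:6.2f}  SD {sd:5.2f}")
```

Under these assumptions one would expect the interaction model to be roughly unbiased in both scenarios but with an SD around three times larger, while the additive model is more precise when there truly is no interaction and off by about 5 units at age 80 when there is one. That bias/variance trade-off is what makes the choice of which interactions to pre-specify matter.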



I’m glad you liked the post. You raised some excellent issues. To your all-important practical question, my initial thought is that we don’t have to be perfect in order to make progress. The question about which interactions to pre-specify is a problem for all randomized trials, even those done on a general highly representative target sample. It’s just that we’re not used to talking about it very much.

To your last part:

It is worth noting that the vast array of published forest plots of odds ratios and hazard ratios do not back up the notion that HTE/interactions are prevalent. (These are not very precise within one clinical trial, but looking at them over a large collection of trials is useful.)
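The parenthetical point, that one trial's subgroup estimates are imprecise while a large collection of them is informative, can be sketched with a standard random-effects calculation. This is an illustration under assumptions, not a reanalysis of any published forest plots: the data are simulated, the per-trial standard errors (0.2 to 0.5) are made up, and the DerSimonian-Laird moment estimator is swapped in as one common way to estimate how much the true interaction effects vary between trials. The function names are hypothetical.

```python
import random


def dersimonian_laird_tau2(estimates, ses):
    """Method-of-moments (DerSimonian-Laird) estimate of the between-trial variance
    of the true interaction effects, given each trial's estimate and standard error."""
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    pooled = sum(wi * d for wi, d in zip(w, estimates)) / sw
    q = sum(wi * (d - pooled) ** 2 for wi, d in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(estimates) - 1)) / c)


def simulate_forest_plots(k=1000, true_sd=0.0, seed=7):
    """Hypothetical collection of k trials, each reporting one treatment x subgroup
    interaction (e.g. a log ratio of subgroup odds ratios) with a noisy estimate."""
    rng = random.Random(seed)
    ses = [rng.uniform(0.2, 0.5) for _ in range(k)]     # single trials: imprecise
    true = [rng.gauss(0.0, true_sd) for _ in range(k)]   # true_sd = 0: no HTE anywhere
    est = [t + rng.gauss(0.0, s) for t, s in zip(true, ses)]
    return est, ses


def share_nominally_significant(estimates, ses):
    """Fraction of single-trial interaction tests with |z| > 1.96."""
    return sum(abs(d / s) > 1.96 for d, s in zip(estimates, ses)) / len(estimates)


if __name__ == "__main__":
    for sd in (0.0, 0.3):
        est, ses = simulate_forest_plots(true_sd=sd)
        print(f"true SD {sd}: estimated SD {dersimonian_laird_tau2(est, ses) ** 0.5:.3f}, "
              f"share |z| > 1.96 = {share_nominally_significant(est, ses):.3f}")
```

With `true_sd=0`, roughly 5% of the individual interactions should look "significant" by chance alone, yet the pooled estimate of between-trial spread should stay near zero; with `true_sd=0.3` the same calculation should pick up the spread even though each trial on its own is underpowered.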

I fall back to how I teach model pre-specification, where I've listed a set of interaction types that make sense to pre-specify. See for example Section 2.7.2 of RMS. Of all the interaction types discussed there, treatment × severity of disease is the one I would most seriously consider pre-specifying.
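For readers less used to the notation, here is one way a pre-specified treatment × severity interaction could look in a binary-outcome logistic model. The coefficient names, the centring value and all numbers are hypothetical, and severity enters linearly only to keep the sketch short (a spline would be a more flexible choice). The point is that the treatment effect becomes a function of severity rather than a single number, and it collapses back to one constant odds ratio when the interaction coefficient is zero.

```python
import math


def log_odds(severity, tx, b0, b_sev, b_tx, b_int, center):
    """Hypothetical logistic model with a pre-specified treatment x severity term:
    logit P(Y=1) = b0 + b_sev*s + b_tx*tx + b_int*tx*s, where s = severity - center
    and tx = 1 for treatment B, 0 for A."""
    s = severity - center
    return b0 + b_sev * s + b_tx * tx + b_int * tx * s


def treatment_odds_ratio(severity, b_tx, b_int, center):
    """B:A odds ratio implied at a given severity; the same at every severity if b_int == 0."""
    return math.exp(b_tx + b_int * (severity - center))


if __name__ == "__main__":
    # Made-up coefficients: OR 0.7 at average severity (5), more relative benefit in sicker patients.
    coefs = dict(b0=-2.0, b_sev=0.3, b_tx=math.log(0.7), b_int=-0.1, center=5.0)
    for sev in (2.0, 5.0, 8.0):
        print(sev, round(treatment_odds_ratio(sev, coefs["b_tx"], coefs["b_int"], coefs["center"]), 3))
```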

As I read the post, Carl Sagan’s saying “extraordinary claims require extraordinary evidence” came to mind. While “extraordinary” is surely too strong a word here, it seems to me, Frank, that your argument has something of this spirit to it. At least, it seems that the panoply of forest plots you invoke shifts the burden back onto those who would treat HTE as the default assumption.

This discussion is very timely in light of the recently issued FDA Draft Guidance on Inclusion of Older Adults in Cancer Clinical Trials:

Whereas considerations of equity and justice seem to me the most powerful motivations for including under-represented groups in trials, appeals for broader inclusiveness are usually couched in terms of representativeness and generalizability. But these latter considerations would seem (in light of Frank’s post) to be of secondary importance.

Frank’s post does explicitly exclude benefit-risk considerations, however, and indeed this new Draft Guidance emphasizes precisely this type of question. Also, it is my impression that Frank’s argument applies mainly to the inferential function of trials, and less (or not at all?) to their ‘hypothesis-generating’ aspects. But still I wonder if some recommendations in this Draft Guidance might be improved by a confrontation with Frank’s argument.


Thanks for the note David and for the FDA reference.

I’d like to see more development in this area. When one includes trade-offs from side-effects, several issues arise:

  • The study design and sampling needed to estimate side effect incidence may differ from the design and sampling for optimally and feasibly estimating relative efficacy.
  • Interactions may be different across the two classes of outcomes, and we may find more interactions for side effects than for efficacy outcomes in many cases.
  • When there are no interactions with side effects (e.g., the treatment B:A odds ratio for a side effect does not vary with age), one can use fairly simple models to estimate net clinical benefit.
  • Some side effects should be included as bad efficacy outcomes; then the problem is translated into something such as non-proportional odds for the treatment effect, i.e., treatment B may affect an adverse event incidence more than it affects the main efficacy outcome in an ordinal model. (Note that lumping all events into a binary union of events merely hides the problem.)
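The "fairly simple models" case in the third point can be sketched as follows, under loudly stated assumptions: both B:A odds ratios are constant (no interactions), each patient's risks on treatment A are taken as given (in practice they would come from risk models using that patient's covariates), and `harm_weight` (how many efficacy events one side effect is worth) is a hypothetical value judgement, not something the data supply. The function names are hypothetical too.

```python
def risk_under_odds_ratio(baseline_risk, odds_ratio):
    """Risk on treatment B implied by a constant B:A odds ratio and the risk on A."""
    odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return odds / (1.0 + odds)


def net_clinical_benefit(p_eff, p_se, or_eff, or_se, harm_weight):
    """Absolute efficacy benefit minus weighted absolute side-effect harm for one patient.

    p_eff, p_se: the patient's risks of the efficacy event and of the side effect on A.
    or_eff, or_se: B:A odds ratios, assumed NOT to vary with covariates (no interactions).
    harm_weight: how many efficacy events one side effect is worth; a value judgement.
    """
    arr = p_eff - risk_under_odds_ratio(p_eff, or_eff)   # absolute risk reduction
    ari = risk_under_odds_ratio(p_se, or_se) - p_se       # absolute risk increase
    return arr - harm_weight * ari


if __name__ == "__main__":
    # Made-up inputs: same odds ratios, two patients with different baseline efficacy risk.
    for p_eff in (0.05, 0.30):
        print(p_eff, round(net_clinical_benefit(p_eff, 0.10, 0.7, 1.5, 0.3), 4))
```

With these made-up inputs the low-risk patient's net benefit is close to zero (about 0.002) while the high-risk patient's is clearly positive (about 0.056), even though neither odds ratio varies: the variation comes entirely from baseline risk and from the chosen weight.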

I agree with this conjecture. Human variation with respect to values, circumstances, culture, etc. probably far exceeds variation in physiology. While we might all agree in evaluating ‘scalar’ differences on a single dimension of treatment benefit (longer life, less pain, etc.), as soon as you get ‘vector’ (multivariate) outcomes that engage benefit-risk tradeoffs, consequential interactions with our highly variable values seem inevitable.


Incidentally, the European Medicines Agency says risk-benefit (net benefit) composites are not usually accepted as primary outcomes. That makes sense: I don’t like the “trading-off” hidden within the mechanics of the calculation. Outside industry, however, such composites are used as primary outcomes.


I think that if the trade-offs don’t make unwarranted assumptions, they should be allowed. For example, assuming a ranking of outcomes but not making an assumption about their spacings (i.e., assuming an ordinal but not an interval scale for Y) should be OK.
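One way to see what "a ranking of outcomes but no spacings" buys: cumulative odds ratios and the concordance probability depend only on the order of the categories, whereas a difference in mean scores depends on the numbers attached to them. The four-level outcome (0 = best, 3 = worst, e.g. with side effects ranked below efficacy failures) and both treatment distributions are hypothetical. The same numbers also show the non-proportional odds situation raised earlier in the thread: B lowers the odds of any bad outcome but raises the odds of the worst ones, which a binary "any event" endpoint would hide.

```python
from itertools import accumulate


def exceedance(probs):
    """P(Y >= j) for j = 1..K-1, for ordered categories 0..K-1."""
    tail = list(accumulate(reversed(probs)))[::-1]  # tail[j] = P(Y >= j)
    return tail[1:]


def cumulative_odds_ratios(p_a, p_b):
    """B:A odds ratio for Y >= j at each cutoff; all equal under proportional odds."""
    return [(sb / (1 - sb)) / (sa / (1 - sa))
            for sa, sb in zip(exceedance(p_a), exceedance(p_b))]


def concordance(p_a, p_b):
    """P(Y_B > Y_A) + 0.5 * P(Y_B == Y_A); uses only the ordering of the categories."""
    c = 0.0
    for i, pa in enumerate(p_a):
        for j, pb in enumerate(p_b):
            if j > i:
                c += pa * pb
            elif j == i:
                c += 0.5 * pa * pb
    return c


def mean_difference(p_a, p_b, scores):
    """Difference in mean score (B minus A); depends on the spacings chosen."""
    return sum(s * p for s, p in zip(scores, p_b)) - sum(s * p for s, p in zip(scores, p_a))


if __name__ == "__main__":
    a = [0.2, 0.55, 0.2, 0.05]  # hypothetical outcome distribution on A
    b = [0.5, 0.2, 0.2, 0.1]    # hypothetical outcome distribution on B
    print([round(x, 3) for x in cumulative_odds_ratios(a, b)])
    print(round(concordance(a, b), 4))
    print(round(mean_difference(a, b, [0, 1, 2, 3]), 3), round(mean_difference(a, b, [0, 1, 2, 10]), 3))
```

Here the cumulative odds ratios come out as 0.25, about 1.29 and about 2.11, and the concordance probability is 0.4125 (below 0.5, so B tends to give the better-ranked outcome). The mean difference, by contrast, is -0.2 with scores 0, 1, 2, 3 but +0.15 with scores 0, 1, 2, 10: its sign depends on an interval-scale assumption the ordinal summaries do not need.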

As a pediatrician, I am interested in extrapolation to the other end of the age spectrum. According to this 2019 review, age-treatment subgroup analyses are very rarely performed (and can rarely be performed), which supports Frank's argument that this is an area that requires more focus. The FDA also updated its recommendations for drug development in pediatrics in 2018; for example, it specifically recommends modeling and simulation strategies to optimize clinical trials (see here).
As @davidcnorrismd points out, the available data to support the lack of HTE in general are weak at best (especially in pediatrics, where most scenarios would involve the no-overlap cases from Frank's simulations), and thus considering this as the default assumption might have a number of unintended consequences. Currently, some 80% of pediatric drug use is off-label, which makes us pediatricians very careful about our choices of drugs, formulations and dosages. If the standard were some sort of extrapolation from adult data at the regulatory stage (which is currently done instead at the end-user level) and many of these drugs became on-label, it might create a false sense of safety and certainty and consequently lead to more errors.