Analysis of 2x2 trial with anticipated treatment interaction

Hi all, I am in search of resources or guidelines for the principled analysis of a four-arm clinical trial in which the treatments are expected to interact. The four arms are: drug A, drug B, drug A+B, and placebo. Drug A is thought to potentiate B, so A+B is anticipated to be the most effective treatment.

In my search so far, one recommended strategy for analyzing such a trial is to perform a series of NHSTs comparing:

  1. A, B and A+B vs placebo;
  2. A+B vs max(A, B, placebo);
  3. A vs B.

At each step, a Bonferroni p value correction is undertaken to control the overall family-wise error rate (up to alpha / 6).
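For reference, the Bonferroni step can be sketched as below (standard library only). The raw p-values are hypothetical, and m = 6 is simply taken from the alpha / 6 divisor quoted above. It rescales the p-values only; the point estimates are untouched.

```python
def bonferroni(p_values, m=None):
    """Bonferroni-adjusted p-values: each raw p is multiplied by the number of
    comparisons m (default: how many p-values were supplied), capped at 1."""
    if m is None:
        m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def per_comparison_alpha(alpha, m):
    """Equivalent per-comparison threshold: compare raw p-values to alpha / m."""
    return alpha / m


# Hypothetical raw p-values for six contrasts; m = 6 matches the alpha / 6 above.
raw = [0.004, 0.020, 0.030, 0.200, 0.008, 0.500]
adjusted = bonferroni(raw, m=6)
threshold = per_comparison_alpha(0.05, 6)
```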

I am not entirely satisfied with this approach. Rather than measure the evidence against the null hypothesis of zero treatment effect, I would much prefer to estimate a range of treatment effects compatible with the data and interpret these clinically. Applying a Bonferroni p value correction does not help me if I am not aiming to perform NHST. Nor does it have any impact on the estimated treatment effects and compatibility intervals - and these are what I will interpret. Yet I do recognize that some form of multiplicity correction is required - but I hope that there may be a more efficient alternative. Logically specifying skeptical priors may be such an alternative - but I am uncertain if this is recommended practise.
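One way to see why a skeptical prior behaves differently from a p-value correction is the normal-normal approximation below. It is a minimal sketch assuming the treatment-effect estimate is roughly normal with a known standard error; the function name, prior SD, and example numbers are hypothetical. The prior pulls the estimate toward zero and narrows the interval, so it acts on the quantities that will actually be interpreted.

```python
from statistics import NormalDist


def skeptical_posterior(estimate, se, prior_sd, level=0.95):
    """Combine an effect estimate (standard error se) with a skeptical
    Normal(0, prior_sd**2) prior by precision weighting. Returns the posterior
    mean and an equal-tailed credible interval at the given level."""
    data_prec = 1.0 / se**2
    prior_prec = 1.0 / prior_sd**2
    post_var = 1.0 / (data_prec + prior_prec)
    post_mean = post_var * data_prec * estimate  # prior mean is 0
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * post_var**0.5
    return post_mean, (post_mean - half_width, post_mean + half_width)


# Hypothetical: estimated effect 3.0 (SE 1.0) with a skeptical prior SD of 1.0
mean, (lower, upper) = skeptical_posterior(estimate=3.0, se=1.0, prior_sd=1.0)
```

Here the estimate is shrunk halfway toward zero because the prior and the data carry equal precision; a wider prior_sd shrinks less.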

If anyone would be able to provide me any suggestions or resources for such a situation, and any potential alternative strategies, I would be very grateful.


As you implied, we really need a comprehensive guidance document covering this. There are many options, including a potentially more powerful test of association of X with Y, where X = 0, 1, 2 indicates the number of active treatments received. I wouldn't use any observed maximum. A very important issue is that without any prior information the interaction test has very low power and the individual contrasts have low precision. Some information borrowing is needed, and the Bayesian approach is natural for this: see "Bayesian design and analysis of two x two factorial clinical trials" (PubMed). Also see
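A minimal sketch of the 1-degree-of-freedom trend analysis, assuming a continuous outcome measured once and ordinary least squares. The function name and data are hypothetical, and a real analysis would also adjust for baseline covariates.

```python
import numpy as np


def trend_fit(a, b, y):
    """OLS fit of y on X = a + b, the number of active drugs received (0, 1, 2).
    Returns the slope per additional active drug and its standard error."""
    a, b, y = (np.asarray(v, dtype=float) for v in (a, b, y))
    x = a + b
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - 2)            # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # OLS covariance matrix
    return float(coef[1]), float(np.sqrt(cov[1, 1]))


# Hypothetical data, two patients per arm: placebo, A, B, A+B
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 3.0, 4.0, 6.0, 3.0, 5.0, 7.0, 9.0]
slope, se = trend_fit(a, b, y)
```

With equal arm sizes the single-drug arms sit exactly at the mean of X, so the slope here equals (mean of A+B minus mean of placebo) / 2 = (8 - 2) / 2; the A and B arms still contribute to the residual variance.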

I would encourage you to consider the interaction not as a pure statistical abstraction (e.g., as a generic term in an off-the-shelf regression equation), but in pharmacologically realistic terms. There are at least 3 distinct forms of ‘synergy’ I’ve encountered in my work on oncology dose-finding methods:

  • The wonderful (albeit deflating) Palmer-Sorger analysis [1] suggests that much combination therapy in oncology ‘works’ in the same manner as throwing 2 darts blindfolded instead of 1. (The exception they identify in their paper is also instructive.)
  • In a white paper [2] I explore a happier type of synergy, nicely described by Frei [3] (see quotation on p2 of [2]).
  • When the disease is capable of evolution (e.g.: virus, bacterium, cancer), combination therapy may avoid the emergence of resistance.

What form(s) of synergy are postulated for your own A+B? Could you possibly design a trial that specifically identifies (potentially, falsifies) that more sharply defined hypothesis?

  1. Palmer AC, Sorger PK. Combination Cancer Therapy Can Confer Benefit via Patient-to-Patient Variability without Drug Additivity or Synergy. Cell. 2017;171(7):1678-1691.e13. doi:10.1016/j.cell.2017.11.009

  2. Norris DC. Impeachment of One-Size-Fits-All Dosing for Obstruction of Synergism. Published online December 4, 2019. OSF: obstruction-of-synergy.pdf (also available as a 2-minute video and a Tweetorial).

  3. Frei E. Curative cancer chemotherapy. Cancer Res. 1985;45(12 Pt 1):6523-6537.

That is my concern. The study is certainly underpowered for an interaction test. The idea of including treatment as a continuous variable (0, 1, 2) is a potential solution. I don't expect the treatments to be strictly additive, so I could perhaps include a spline for the treatment variable. Since the outcome is assessed at multiple follow-up visits, that would amount to including an s(treatment)*time interaction. That may not be very satisfying for readers accustomed to coefficients and 95% CIs rather than predicted values, but at least it uses all available data. One problem is that it collapses any distinction between drug A and drug B (both have X = 1).
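The last point can be made concrete: a model linear in X = a + b is the 2 × 2 factorial model y ~ a + b + a:b with the constraints that the A and B effects are equal and the interaction is zero. The sketch below (hypothetical data, a single time point, NumPy least squares) fits both and shows the trend model giving the A and B arms identical fitted values.

```python
import numpy as np

# Hypothetical data: two patients per arm, in the order placebo, A, B, A+B
a = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
b = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([1.0, 3.0, 4.0, 6.0, 3.0, 5.0, 7.0, 9.0])


def fitted(design, y):
    """Least-squares fitted values and coefficients for a design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef, coef


ones = np.ones_like(y)
# Trend model y ~ 1 + X with X = a + b: forces equal A and B effects, no interaction
trend_fitted, _ = fitted(np.column_stack([ones, a + b]), y)
# Factorial model y ~ 1 + a + b + a:b: one free mean per arm
fact_fitted, fact_coef = fitted(np.column_stack([ones, a, b, a * b]), y)
```

Here fact_coef holds the placebo mean, the A and B simple effects, and the double difference; the trend model cannot separate the A arm (mean 5) from the B arm (mean 4).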

Great questions and valuable resources here. This prompted me to think much more deeply about the expected treatment interaction. Biologically, there is reason to believe the two treatments are synergistic beyond '2 darts blindfolded instead of 1'. Yet my prior belief is that the combined effect will be less than additive, because treatment response on our outcome can saturate; in practice (I imagine) any synergism would therefore appear less than additive. However, I cannot rule out additivity or super-additivity, since the case for synergy is grounded in mechanistic evidence.
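The saturation argument can be illustrated with a toy model, assuming (hypothetically) a binary response in which each drug shifts a latent logit additively. Because the probability scale has a ceiling, latent-additive effects produce a negative double difference p_AB - p_A - p_B + p_P when the placebo response is already moderate. The function names and numbers are illustrative only; the actual outcome in this trial may not be binary.

```python
import math


def expit(x):
    """Inverse logit: maps a latent score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def double_difference(baseline_logit, shift_a, shift_b):
    """Interaction contrast on the probability scale, p_AB - p_A - p_B + p_P,
    when the two drugs act additively on the latent (logit) scale."""
    p_p = expit(baseline_logit)
    p_a = expit(baseline_logit + shift_a)
    p_b = expit(baseline_logit + shift_b)
    p_ab = expit(baseline_logit + shift_a + shift_b)
    return p_ab - p_a - p_b + p_p


# Latent-additive drugs with a placebo response probability of 0.5
dd = double_difference(0.0, 1.0, 1.0)
```

Starting near the floor instead (baseline_logit = -4), the same latent-additive effects give a positive double difference, a reminder that "less than additive" and "more than additive" depend on the scale on which the interaction is measured.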

Surprising that so few firm resources exist for this common study design…

Using anything other than linearity in X = 0, 1, 2 will fail to exploit the concentration of power into one degree of freedom, and splines don't work with so many ties in the data (X takes only three distinct values).
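A quick way to see why a flexible function of X buys nothing here: with only three distinct values of X, any 2-df smooth (a quadratic is used below as a stand-in for a spline) has as many parameters as X has levels, so it simply reproduces the three level means and leads to a 2-df comparison instead of the 1-df linear trend. The data are hypothetical; the sketch uses NumPy's polyfit and polyval.

```python
import numpy as np

# Hypothetical outcomes; X = number of active drugs (0, 1, 2)
x = np.array([0, 0, 1, 1, 1, 1, 2, 2], dtype=float)
y = np.array([1.0, 3.0, 4.0, 6.0, 3.0, 5.0, 7.0, 9.0])

levels = np.array([0.0, 1.0, 2.0])
level_means = np.array([y[x == v].mean() for v in levels])

# A quadratic in X (stand-in for any 2-df smooth) has as many parameters as
# X has distinct values, so its fitted values are just the level means.
quad_at_levels = np.polyval(np.polyfit(x, y, deg=2), levels)

# The linear trend spends only 1 df and does not reproduce all three means.
lin_at_levels = np.polyval(np.polyfit(x, y, deg=1), levels)
```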

Bringing in repeated measurements (longitudinal data) is a different issue that needs to be carefully addressed. We have lots of experience with treatment × time interactions in 2-arm parallel-group studies, but not so much with factorial designs.

Thank you @f2harrell for your insights. Of course, now that you mention it, it is obvious that splines defeat the purpose of modelling treatment this way (X = 0,1,2).

I will continue to read on this. I’d like to investigate whether the ‘information borrowing’ Bayesian model you describe is indeed possible. It strikes me as a smart solution - I’m just not certain whether it is possible to treat the coefficient for treatment as a random effect, given it only has four levels, potentially not enough to estimate the variance. Or actually three levels given the model includes an intercept.

You are confusing Bayesian hierarchical models with ordinary parameters and ordinary priors. The 2 × 2 setup uses a non-hierarchical model with a skeptical prior on the interaction effect only.
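A minimal sketch of such a non-hierarchical model, assuming a normal outcome at a single time point, a known residual SD, arm-level summary data, flat priors on the intercept and main effects, and a Normal(0, tau^2) prior on the interaction only. The conjugate update is done in closed form with NumPy; the function name and all numbers are hypothetical, and this is not claimed to be the exact Simon & Freedman specification.

```python
import numpy as np

# Rows: placebo, A, B, A+B. Columns: intercept (mu), A (alpha), B (beta), A:B (gamma).
DESIGN = np.array([[1, 0, 0, 0],
                   [1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [1, 1, 1, 1]], dtype=float)


def posterior_2x2(arm_means, arm_sizes, sigma, tau):
    """Posterior mean and SD of (mu, alpha, beta, gamma) for a normal outcome with
    known residual SD sigma, flat priors on mu, alpha, beta, and a skeptical
    Normal(0, tau**2) prior on the interaction gamma only (no hierarchy)."""
    ybar = np.asarray(arm_means, dtype=float)
    w = np.asarray(arm_sizes, dtype=float) / sigma**2     # precision of each arm mean
    prior_prec = np.diag([0.0, 0.0, 0.0, 1.0 / tau**2])  # flat prior = zero precision
    post_prec = DESIGN.T @ (w[:, None] * DESIGN) + prior_prec
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (DESIGN.T @ (w * ybar))        # prior mean of gamma is 0
    return post_mean, np.sqrt(np.diag(post_cov))


# Hypothetical summary data: arm means, 50 patients per arm, sigma = 2
means, sizes = [2.0, 5.0, 4.0, 8.0], [50, 50, 50, 50]
flat_mean, _ = posterior_2x2(means, sizes, sigma=2.0, tau=1e6)
skep_mean, skep_sd = posterior_2x2(means, sizes, sigma=2.0, tau=0.25)
```

With these numbers the interaction is pulled from 1.0 toward 0 (about 0.16 for tau = 0.25), and the A-alone effect alpha moves from 3.0 toward 3.5, the value implied by strict additivity, because the A+B and B arms are partly borrowed to estimate it. This is also why the posterior for a simple effect depends on the prior for the interaction, not just on its own prior. A real analysis would estimate sigma and use individual-level (and here, longitudinal) data.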

Indeed I was confused. I’d been reading Andrew Gelman on using random effects for primary treatment comparisons and my brain was mush.

I think the approach taken by Simon & Freedman in the article you posted above may translate well to the longitudinal context. Thanks a lot for pointing me in a helpful direction.

One further question. The Simon article suggests a very skeptical prior on the interaction term (in my context, the A:B:time interaction). They give 5% prior credibility to a clinically meaningful interaction. In my case, I have some reason to expect an interaction, since there is a biological rationale, but little confidence about which direction it may go. So I would like to use a skeptical prior for the interaction term (centered at zero), although wider than their recommendation.
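Translating a statement like "5% prior credibility for a clinically meaningful interaction" into a prior SD is a one-line calculation. The sketch below assumes a Normal(0, sd^2) prior and reads the 5% as two-sided, P(|gamma| > delta) = 0.05 (the paper's exact convention should be checked); delta and the wider 20% alternative are hypothetical.

```python
from statistics import NormalDist


def prior_sd_for_tail_prob(delta, tail_prob, two_sided=True):
    """SD of a Normal(0, sd**2) prior such that P(|gamma| > delta) = tail_prob,
    or P(gamma > delta) = tail_prob when two_sided is False."""
    q = 1 - tail_prob / 2 if two_sided else 1 - tail_prob
    return delta / NormalDist().inv_cdf(q)


delta = 1.0  # hypothetical clinically meaningful interaction, in outcome units
sd_skeptical = prior_sd_for_tail_prob(delta, 0.05)  # 5% prior credibility
sd_wider = prior_sd_for_tail_prob(delta, 0.20)      # hypothetical wider, still centred at 0
```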

The complication is that assessing the simple effect of one treatment (e.g. treatment A alone) depends both on the prior for the A:time term and on the prior for the A:B:time coefficient. So if I were to specify multiple priors (e.g. skeptical, enthusiastic, pessimistic) for each term, I would have 3 × 3 = 9 prior combinations, which isn't really feasible to present.

I'm not really sure how to approach this. My current bet is to use a weakly informative prior on the three-way interaction term (so that its posterior roughly matches the MLE) throughout, and then use a range of priors for the simple effects. Does that seem like a reasonable approach? It's striking that there's so little guidance available for these kinds of RCTs. I really appreciate the help.

What great questions. I hope that someone has written a paper with guidance about this, even without the 3-way interaction. For the 2-way situation I would try to specify a prior for the treatment effect were there to be no interaction, and a separate prior for the interaction effect. The second would be less skeptical than the first. I wouldn’t present lots of prior combinations. One possibility for the second prior is that the double difference is restricted to be 1/2 or less of the main effect.
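One reading of the last suggestion is a flat prior on the double difference gamma restricted to |gamma| <= main_effect / 2. The sketch below assumes a normal likelihood for gamma with known SE and treats the main effect as a fixed, pre-specified number (in a full model it would itself be a parameter); the posterior is then a truncated normal whose mean has a closed form. The function name and numbers are hypothetical.

```python
from statistics import NormalDist


def restricted_interaction_posterior(gamma_hat, se, main_effect):
    """Posterior mean of the interaction gamma under a flat prior restricted to
    |gamma| <= main_effect / 2 and a Normal(gamma_hat, se**2) likelihood, i.e. the
    mean of that normal truncated to [-c, c]. Treats main_effect as fixed and known."""
    c = abs(main_effect) / 2
    std = NormalDist()
    lo = (-c - gamma_hat) / se
    hi = (c - gamma_hat) / se
    mass = std.cdf(hi) - std.cdf(lo)  # likelihood mass inside the restriction
    return gamma_hat + se * (std.pdf(lo) - std.pdf(hi)) / mass


# Hypothetical: estimated double difference 1.5 (SE 0.5), anticipated main effect 2
m = restricted_interaction_posterior(gamma_hat=1.5, se=0.5, main_effect=2.0)
```

Here the estimate of 1.5 lies outside the allowed range [-1, 1], so the posterior mean is pulled inside it; an estimate well inside the range with a small SE is left essentially unchanged.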