Multiplicity in clinical trials with a hierarchical strategy

In a clinical trial whose design includes a hierarchical analysis strategy, is statistical adjustment for multiplicity no longer necessary? I want to be able to draw conclusions (at the individual hypothesis level) about a secondary outcome (in the scenario in which the primary outcome was statistically significant). I read that in these cases it is not necessary to do the statistical adjustment, but I want to know the statistical basis that supports it.


Welcome to the site Paula. I hope that several others will respond to this important question. It’s good to step back for a moment and recognize the different camps:

  • Bayesians compute posterior probabilities of effects (e.g., of an unknown parameter being in a certain interval) and don’t use after-the-fact corrections for context, but rather (1) encode skepticism into prior distributions and/or (2) more completely model the context, e.g., when analyzing multiple subgroups for one outcome variable they often develop a hierarchical model that borrows information across subgroups thereby shrinking individual subgroup effects towards the overall average effect.
  • The traditional frequentist approach, which is probably the context of your question, can be practiced in a very conservative way in which a penalty is applied to all analyses, no matter where the question sits in the hierarchy of questions.
  • A more relaxed, context-dependent frequentist approach as exemplified by Cook and Farewell, one of my favorite papers.

If doing a frequentist analysis, the type of analysis most in need of a multiplicity adjustment is a “there exists” analysis, i.e., there exists an endpoint or a subgroup for which the treatment has benefit, where there is no pre-specification of the endpoint or subgroup. On the other hand, when you have strongly pre-specified a hierarchy and you are not doing a “there exists” analysis, Cook and Farewell state that keeping each hypothesis in context is adequate and no multiplicity adjustment is needed.

Cook and Farewell’s approach involves maintaining context at all times. The order of reporting results for different endpoints is pre-specified, and all results are reported, even in journal article abstracts, in that order. That way the reader will be sure that you are not cherry-picking positive results. Unlike an \alpha-spending approach (that I never liked), you can make a statement about an endpoint whether or not previous endpoints in the hierarchy yielded positive evidence for efficacy. Cook and Farewell state this as being interested in marginal questions, e.g., we want to know whether a treatment reduced pain whether or not it reduces the risk of stroke.
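To make the contrast concrete, here is a minimal sketch of the gatekeeping style of procedure usually called fixed-sequence testing, which is assumed here to be the kind of approach being contrasted with Cook and Farewell's; it is not their method. Each endpoint is tested at the full alpha in the pre-specified order, and formal claims stop at the first non-significant result. Its statistical basis is that a false claim can only occur if the first true null hypothesis in the sequence is rejected, which happens with probability at most alpha, so the familywise error rate is controlled without splitting alpha. The function name and the p-values are hypothetical.

```python
def fixed_sequence(p_values, alpha=0.05):
    """Fixed-sequence (gatekeeping) test over p-values given in pre-specified order.

    Hypothesis i is rejected only if it and every earlier hypothesis
    in the order has p <= alpha. Returns one boolean per hypothesis.
    """
    rejected = []
    gate_open = True
    for p in p_values:
        # Once one hypothesis fails, no later hypothesis can be claimed.
        gate_open = gate_open and (p <= alpha)
        rejected.append(gate_open)
    return rejected


# Hypothetical p-values for primary, then secondary endpoints, in order.
ordered_p = [0.012, 0.030, 0.210, 0.004]
print(fixed_sequence(ordered_p))  # [True, True, False, False]
```

Note that the fourth endpoint has a small p-value but cannot be claimed because the third one closed the gate. Under the context-preserving approach described above, that fourth result would still be reported and interpreted in its pre-specified position.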

That being said I think it is often better to develop a unified hierarchical ordinal outcome for the primary efficacy analysis.


It will be hard to improve on what Frank wrote above. I’ll just post a link to the short thread that occurred here a few months ago, where I posted some references on this issue:

If neither (1) ordering the research questions nor (2) doing an alpha-spending adjustment is acceptable, it might be worth considering a false discovery rate (FDR) procedure. The following is open access:

Watson, J.; Robertson, D. (2020) Controlling type I error rates in multi‐arm clinical trials: A case for the false discovery rate. Pharmaceutical Statistics (link)

Bradley Efron has done much work in giving the FDR an Empirical Bayes interpretation.
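For readers who have not seen an FDR procedure written out, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, the most widely used one. It is offered as an illustration only and is not necessarily the exact method used in the reference above; the function name and p-values are hypothetical. The procedure sorts the m p-values, finds the largest rank k with p_(k) <= k*q/m, and rejects the k hypotheses with the smallest p-values. It controls the FDR at level q when the test statistics are independent (and under certain forms of positive dependence).

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per hypothesis (in the input order) indicating
    whether it is rejected at FDR level q.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k such that the k-th smallest p-value is <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five comparisons.
p = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(p))  # [True, True, False, False, False]
```

Unlike the fixed-sequence idea, no ordering of the questions is needed, but the guarantee is about the expected proportion of false discoveries among the rejections rather than about each individual hypothesis.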