Multiple test correction for several independent DVs?


A researcher I know recently was asked by a reviewer to correct their results for multiple comparisons.

They ran four linear regression models with the same data for the independent variables (the same 4 IVs in each model), but different outcome measures for each model (these were all conceptually distinct DVs). They pre-planned their analysis but did not pre-register.

They are using frequentist null hypothesis significance testing and the results will be significant in either case - the issue is whether it is correct to do multiple test correction in this case. A senior researcher in the area says that it is not at all common in this research community to correct for multiple comparisons unless you think the DVs are strongly correlated with each other.

What should they do in this case? I had never heard that dependence among the outcome measures matters for this, but this post seems to suggest that they may be correct?

I’d love to learn more about this since I will probably run across this in my own research. Is there any accessible textbook or online resource that covers this problem?

All the best, and thank you for any help you can provide,
Jacob Ritchie

You might find this older post helpful; it includes a section on multiple comparisons by Dr. Harrell:

Taking for granted that all analyses are to be conducted within the frequentist perspective:

If the comparisons were specified ahead of the data analysis, the consensus is that no adjustments are necessary. This makes sense if a decision based on the experiment is to be used only for the purposes of the experimenter. An interesting dilemma is whether a party who was not part of the design phase should accept this argument. I’m not sure if there is a consensus on that. Don’t be surprised if you meet resistance to the prespecified hypotheses argument.

For post hoc hypotheses, there are a number of methods. Older work focuses on control of the Family-Wise Error Rate (FWER); since the 1995 Benjamini and Hochberg paper, newer methods focus on the False Discovery Rate (FDR). There are interesting relationships between the methods: FDR control implies FWER control only in a weak sense (when all null hypotheses are true), but it provides more power.

I’m not exactly sure when we can substitute FDR control for FWER control. It seems that the FDR is more appropriate for the exploratory stages of research (e.g., GWAS looking at thousands of test results), while the FWER is more important in the later stages of confirmation.
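To make the FWER/FDR distinction concrete, here is a minimal sketch in plain Python of three common adjustments applied to four p-values, one per outcome model. The p-values are made up for illustration (they are not from the study in the question), and the function names are my own, not from any library. Holm is used here as a representative FWER method and Benjamini-Hochberg as the FDR method.

```python
def fwer_if_independent(alpha, m):
    """Chance of at least one false positive across m independent
    tests at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_adjust(pvals):
    """Bonferroni-adjusted p-values (controls the FWER)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm_adjust(pvals):
    """Holm step-down adjusted p-values (controls the FWER,
    never less powerful than Bonferroni)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from largest p to smallest
        idx = order[rank]
        candidate = min(1.0, pvals[idx] * m / (rank + 1))
        running_min = min(running_min, candidate)  # enforce monotonicity
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per outcome model (illustration only).
example_pvals = [0.001, 0.012, 0.020, 0.041]

if __name__ == "__main__":
    print("Unadjusted FWER, 4 independent tests at 0.05:",
          round(fwer_if_independent(0.05, 4), 4))
    print("Bonferroni:", bonferroni_adjust(example_pvals))
    print("Holm:      ", holm_adjust(example_pvals))
    print("BH (FDR):  ", bh_adjust(example_pvals))
```

With these made-up inputs, all four results stay below 0.05 under Holm and Benjamini-Hochberg, while Bonferroni pushes the two largest p-values above 0.05. That ordering of stringency (Bonferroni, then Holm, then BH) is typical, though how much it matters depends on the actual p-values.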

Some references (in no particular order):
Cook, RJ; Farewell, VT. (1996) Multiplicity Considerations in the Design and Analysis of Clinical Trials. Journal of the Royal Statistical Society: Series A, Vol. 159, No. 1, pp. 93-110. (Referenced in multiple posts by Dr. Harrell.)

Westfall, P; Young, S. (1993) Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley Series in Probability and Mathematical Statistics

Hochberg, Y; Tamhane, A. (1987) Multiple Comparison Procedures. Wiley Series in Probability and Statistics

Benjamini, Y; Hochberg, Y. (1995) Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological) Vol. 57, No 1. (Landmark paper describing False Discovery Rate control and the relation to other methods)