I had a question about multiple testing correction when we have both continuous and categorical variables.
My question is as follows:
In an experiment, we have two outcomes, 15 categorical variables, and two continuous ones. To evaluate the significance of each variable on the outcome in a univariate setting, chi-square tests were used for the categorical variables and Wilcoxon tests for the continuous ones. To the best of my knowledge, since two different methods were used to test different types of variables, it is not valid to correct the p-values of all the tests together.
To solve this problem, would it be acceptable to apply the Wilcoxon test to both the categorical and the continuous variables, so that all of the tests can be corrected together?
Looking forward to receiving your valuable feedback!
You might consider a unified approach that handles binary outcomes, discrete ordinal, and continuous ordinal responses: the proportional odds ordinal logistic regression model. Some resources are here.
But whether you do that or not, standard multiplicity adjustments such as the Bonferroni inequality allow you to combine p-values from any mixture of tests.
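To make that concrete, here is a toy sketch on simulated data: p-values from chi-square tests (categorical predictors) and Wilcoxon rank-sum tests (continuous predictors) are pooled and Bonferroni-adjusted together, since the adjustment only cares about the number of tests, not which test produced each p-value. All data and counts here are invented for illustration.

```python
# Sketch: one Bonferroni correction over a mix of chi-square and
# Wilcoxon (Mann-Whitney) p-values. Data are simulated for illustration.
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(42)
n = 200
outcome = rng.integers(0, 2, size=n)   # binary outcome

pvals = []
# 15 binary categorical predictors -> chi-square test on a 2x2 table
for _ in range(15):
    x = rng.integers(0, 2, size=n)
    table = np.array([[np.sum((x == i) & (outcome == j)) for j in (0, 1)]
                      for i in (0, 1)])
    pvals.append(chi2_contingency(table)[1])
# 2 continuous predictors -> Wilcoxon rank-sum (Mann-Whitney) test
for _ in range(2):
    x = rng.normal(size=n)
    pvals.append(mannwhitneyu(x[outcome == 0], x[outcome == 1]).pvalue)

# Bonferroni: multiply every p-value by the total number of tests (17)
m = len(pvals)
p_adjusted = [min(1.0, p * m) for p in pvals]
```

Holm or other step-down procedures would apply in exactly the same way, again irrespective of which test generated each p-value.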
The bigger question is what is the value of univariable tests? They are very hard to interpret when one variable is confounded with another.
Thanks a lot for the clear answer. The p-values of the univariate tests before correction are significant for some variables, but not highly significant. In addition to the univariate analysis, I will also apply a multivariable one to deal better with confounded variables and to evaluate the impact of combinations of variables on the outcome.
I’m still not clear. The univariable p-values are not informative or actionable, i.e., any action you take on the basis of them is likely to be wrong. So why compute them?
By using machine learning methods in a multivariable setting, we can predict the outcome. As a result, a combination of variables predicts prognosis better than any single variable does. But it is still important for us to be sure that our evaluations in the univariate setting are correct.
The point is that univariable associations do not necessarily (or indeed often) reflect the conditional associations when variables are combined in multivariable models. Certain variables can show no univariable association with the outcome, yet when considered together with other variables in a model they are associated with the outcome (but just conditional on the other variables). The inverse and other flavors are of course also possible: association in a univariable setting but not when accounting for other variables, and shades of grey in between these extremes. This can be caused, for example, by confounding.
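A tiny simulation makes this pattern tangible. In the invented setup below, `x1` has essentially zero univariable correlation with `y` by construction, yet carries a strong coefficient once `x2` enters the model (a classic suppression/confounding pattern); a univariable screen would discard `x1` entirely.

```python
# Toy simulation: no univariable association for x1, but a strong
# conditional effect once x2 is in the model. Data are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)             # x2 correlated with x1
y = x2 - x1 + 0.1 * rng.normal(size=n)   # both matter, opposite signs

# univariable view: correlation of x1 with y is ~0 by construction,
# because y = (x2 - x1) + noise reduces to noise independent of x1
r_marginal = np.corrcoef(x1, y)[0, 1]

# multivariable view: least squares on y ~ x1 + x2 recovers both
# effects (coefficients near -1 and +1)
X = np.column_stack([x1, x2, np.ones(n)])
b1, b2, _ = np.linalg.lstsq(X, y, rcond=None)[0]
```

The reverse scenario (a spurious univariable association that vanishes conditionally) can be simulated just as easily by making both predictors depend on a common cause.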
What often happens is that researchers evaluate a bunch of univariable associations, generally supported with p-value/hypothesis tests, and then only carry the variables with ‘significant’ univariable associations over into the multivariable analyses. For the reasons mentioned above, this is not recommended as you risk losing out on (informative) variables and/or combining ones that ‘cancel’ each other out. With this in mind, calculating p-values for the univariable associations is not necessarily informative for your ultimate multivariable analysis.
Many thanks for your explanations. I will definitely take your helpful notes into account in my analysis.