How to choose the p-value cutoff for including variables from univariate regression analysis in a multivariate analysis

Hi All,

I have a quick question about how to choose the p-value cutoff for carrying variables from univariate analysis into a multivariate regression analysis. In my analysis, for example, I am looking at which clinical parameters are associated with an increased risk of mortality. I first performed univariate regression analyses, and then included every variable with a univariate p-value < 0.30 in the final multivariate regression analysis to determine which ones are really associated with mortality. Is this method correct? I remember seeing something like this in some papers but can’t find them now :man_facepalming: Either way, I would really appreciate it if you could suggest an approach or guide me here. I don’t know how to answer the reviewers’ questions on this :man_facepalming:

Univariable screening is not an appropriate way to build a model. Chapter 4 of my RMS course notes explains why and gives the key literature reference. Univariable screening is not only unreliable, it is also “double dipping”: the same data are used both to choose the variables and to test them, which distorts statistical inference.
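
To make the “double dipping” point concrete, here is a minimal simulation sketch in Python (numpy only). It is an illustration under my own assumptions, not code from the RMS notes: 10 candidate predictors that are all pure noise, a binary outcome (think mortality) with 30% prevalence, n = 200, logistic regression fit by Newton-Raphson with Wald p-values, and the p < 0.30 screening rule from the question. The function names `logistic_wald_pvalues` and `simulate` are hypothetical. Because every predictor is null, a valid procedure should report p < 0.05 about 5% of the time. For variables that pass the screen, the final-model p-values should fall below 0.05 far more often (something near 0.05 / 0.30 ≈ 17%), because those variables were picked for having small p-values in the first place.

```python
import math

import numpy as np


def logistic_wald_pvalues(X, y, max_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson (IRLS) and return
    two-sided Wald p-values for every column of X (an intercept is added)."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])          # design matrix with intercept
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))    # fitted probabilities
        W = mu * (1.0 - mu)                       # IRLS weights
        info = Z.T @ (Z * W[:, None])             # Fisher information
        step = np.linalg.solve(info, Z.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate for the SEs
    mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
    info = Z.T @ (Z * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return p[1:]                                  # drop the intercept


def simulate(n_sims=200, n=200, k=10, screen_cutoff=0.30, seed=1):
    """Compare the share of p < 0.05 among screened-in variables with the
    share in a prespecified model, when no predictor is truly related to y."""
    rng = np.random.default_rng(seed)
    screened_p, prespecified_p = [], []
    for _ in range(n_sims):
        X = rng.standard_normal((n, k))
        y = rng.binomial(1, 0.3, size=n).astype(float)  # y unrelated to X
        # Prespecified model: all k candidates, no screening
        prespecified_p.extend(logistic_wald_pvalues(X, y))
        # Univariable screening: keep x_j if its one-variable p < cutoff
        keep = [j for j in range(k)
                if logistic_wald_pvalues(X[:, [j]], y)[0] < screen_cutoff]
        if keep:
            # "Final" multivariable model on the screened-in variables only
            screened_p.extend(logistic_wald_pvalues(X[:, keep], y))
    screened_rate = float(np.mean(np.array(screened_p) < 0.05))
    prespecified_rate = float(np.mean(np.array(prespecified_p) < 0.05))
    return screened_rate, prespecified_rate


if __name__ == "__main__":
    screened_rate, prespecified_rate = simulate()
    print(f"P(p < 0.05) among screened-in variables: {screened_rate:.3f}")
    print(f"P(p < 0.05) in the prespecified model:  {prespecified_rate:.3f}")
```

The prespecified model keeps roughly its nominal 5% error rate, while the screened-in variables look “significant” several times as often even though all of them are noise. The same selection effect also pushes their estimated coefficients away from zero.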

Generally speaking, model specification is much more important than model selection.


Data-driven methods for variable selection are generally frowned upon. In my opinion, including a covariate on the basis of its p-value alone should never take precedence over selecting variables on the basis of prior knowledge about their association with, or causal relationship to, the exposure and/or outcome.

See this essay by VanderWeele for a useful, plain-language discussion of principles for confounder selection.