I want to use 20 significant variables to construct a binary logistic regression model. However, I found severe collinearity among the independent variables. I used forward stepwise selection to fit the regression, but the confidence interval for the OR is extremely wide (12.227192 to 5538.296580). I am not strong in statistical analysis. Could anyone tell me whether the conclusion is valid, or what I should do with my data? I would appreciate it. Thank you so much.

Stepwise regression is highly discouraged. I go into detail about collinearity in my RMS course notes. Short answer: don't let competing variables compete. Use things like variable clustering and principal component analysis, and avoid anything related to p-values.
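To make "don't let competing variables compete" a bit more concrete, here is a rough sketch of one way variable clustering could look. It is not the procedure from the course notes. It assumes a simplified rule (group variables whose squared Pearson correlation is at least 0.5, single linkage) and then averages the z-scores within each group into one summary predictor. The function names, the 0.5 threshold, and the data are all made up for illustration.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cluster_variables(data, threshold=0.5):
    """Group variables whose squared correlation is >= threshold (single linkage).

    The threshold is an arbitrary illustrative choice, not a recommendation.
    """
    names = list(data)
    parent = {n: n for n in names}

    def find(n):
        # Walk up to the representative of n's group.
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(data[a], data[b]) ** 2 >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


def cluster_score(data, members):
    """Average the z-scores of a cluster's members into one summary predictor."""
    ref = data[members[0]]
    score = [0.0] * len(ref)
    for name in members:
        x = data[name]
        m, s = mean(x), pstdev(x)
        # Flip variables that run opposite to the first member so they do not cancel.
        sign = 1.0 if pearson(ref, x) >= 0 else -1.0
        for i, v in enumerate(x):
            score[i] += sign * (v - m) / s / len(members)
    return score


# Made-up data: x1 and x2 share a latent signal, x3 is unrelated.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(200)]
data = {
    "x1": [z + 0.3 * rng.gauss(0, 1) for z in latent],
    "x2": [z + 0.3 * rng.gauss(0, 1) for z in latent],
    "x3": [rng.gauss(0, 1) for _ in range(200)],
}

clusters = cluster_variables(data)
scores = {"+".join(c): cluster_score(data, c) for c in clusters}
print(clusters)
```

The summary scores would then enter the logistic model in place of the raw collinear variables. Note that the grouping never looks at the outcome, so no p-values are involved in deciding which variables go together.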


Or exert yourself to think through the data generating process and form a structural causal model of the problem (a DAG) and take an informed position, and look for opportunities to reduce your uncertainties. This is science. The machine cannot divine the causal process for you; but the machine and the data will mislead you.