Hi. I would like to obtain the p-value of an interaction term in a logistic regression model. I have tried two methods to accomplish this.
First, I fitted a logistic model that includes the interaction term and obtained the p-value returned by the model (SELECT_GroupGene12:CDKiY, p-value = 0.994). Here’s the code and summary of the model:
> fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
> summary(fit1)
The output is as follows:
Call:
glm(formula = CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age +
    Histology, family = "binomial", data = tmp_Clin)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1448  -0.9129  -0.7868   1.3027   1.6270

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                0.41881    1.80366   0.232    0.816
SELECT_GroupGene12        18.67230 3761.85360   0.005    0.996
CDKiY                     -0.46924    0.61670  -0.761    0.447
Patient_Dx_Age            -0.01417    0.03301  -0.429    0.668
HistologyILC             -18.03547 6522.63863  -0.003    0.998
HistologyOther           -18.19134 6522.63862  -0.003    0.998
Histologyuk               19.05416 6522.63863   0.003    0.998
SELECT_GroupGene12:CDKiY -36.34491 4978.39200  -0.007    0.994

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.025  on 56  degrees of freedom
Residual deviance: 60.682  on 49  degrees of freedom
AIC: 76.682

Number of Fisher Scoring iterations: 17
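As far as I understand, the Pr(>|z|) column is a Wald test, i.e. the estimate divided by its standard error and compared with a standard normal. A quick check with the numbers copied from the interaction row above (the variable names here are just mine):

```r
# Values copied from the SELECT_GroupGene12:CDKiY row of summary(fit1)
est <- -36.34491
se  <- 4978.39200

# Wald z statistic and two-sided p-value from the standard normal
z_wald <- est / se
p_wald <- 2 * pnorm(-abs(z_wald))

round(z_wald, 3)
round(p_wald, 3)
```

With a standard error this large relative to the estimate, z is close to 0 and the p-value is close to 1.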
Second, I fitted two logistic regression models, one with the interaction term and one without, and compared them with the anova function, which gave a p-value of 0.0153:
> fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
> fit2 <- glm(CDKi_Response3 ~ SELECT_Group + CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
> anova(fit1, fit2, test = 'Chisq')
The output is as follows:
Analysis of Deviance Table

Model 1: CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology
Model 2: CDKi_Response3 ~ SELECT_Group + CDKi + Patient_Dx_Age + Histology
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        49     60.682
2        50     66.564 -1  -5.8811   0.0153 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
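If I read the table correctly, this p-value comes from the drop in residual deviance between the two models, compared with a chi-squared distribution on 1 degree of freedom. A small sketch using the residual deviances copied from the output above (the variable names are mine):

```r
# Residual deviances copied from the anova() output above
dev_with    <- 60.682  # Model 1, with the interaction (49 df)
dev_without <- 66.564  # Model 2, without the interaction (50 df)

# Likelihood-ratio statistic: difference in deviance, on 50 - 49 = 1 df
lr_stat <- dev_without - dev_with

# Upper-tail chi-squared probability
p_lr <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
round(p_lr, 4)
```

This reproduces the 0.0153 reported by anova, so the two p-values really do come from two different tests of the same term.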
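Here is the distribution of the outcome variable CDKi_Response3 stratified by SELECT_Group and CDKi. Judging from these counts, there appears to be a strong interaction between SELECT_Group and CDKi: in the Gene12 group, all 3 patients have CDKi_Response3 = 1 when CDKi = N, whereas all 4 have CDKi_Response3 = 0 when CDKi = Y.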
> table(tmp_Clin$CDKi_Response3, tmp_Clin$SELECT_Group, tmp_Clin$CDKi)
The table output is as follows:
CDKi = N

    Gene1 Gene12
  0    16      0
  1    11      3

CDKi = Y

    Gene1 Gene12
  0    16      4
  1     7      0
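In case it helps anyone reproduce the issue, the counts above can be expanded into a small data frame. This is only a sketch: Patient_Dx_Age and Histology cannot be recovered from the table, so they are left out and the numbers will not match my models above exactly. The names counts, toy, toy_fit1 and toy_fit2 are made up for this example.

```r
# Cell counts copied from the table above (covariates are NOT included)
counts <- data.frame(
  SELECT_Group   = c("Gene1", "Gene1", "Gene12", "Gene12",
                     "Gene1", "Gene1", "Gene12", "Gene12"),
  CDKi           = c("N", "N", "N", "N", "Y", "Y", "Y", "Y"),
  CDKi_Response3 = c(0, 1, 0, 1, 0, 1, 0, 1),
  n              = c(16, 11, 0, 3, 16, 7, 4, 0)
)

# Expand to one row per patient; cells with n = 0 contribute no rows
toy <- counts[rep(seq_len(nrow(counts)), counts$n),
              c("SELECT_Group", "CDKi", "CDKi_Response3")]
rownames(toy) <- NULL

# Same two models as above, minus Patient_Dx_Age and Histology
toy_fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi,
                data = toy, family = "binomial")
toy_fit2 <- glm(CDKi_Response3 ~ SELECT_Group + CDKi,
                data = toy, family = "binomial")

summary(toy_fit1)$coefficients             # Wald tests
anova(toy_fit2, toy_fit1, test = "Chisq")  # likelihood-ratio test
```

Because two of the Gene12 cells are empty, I would expect glm to warn about fitted probabilities of 0 or 1 on this reduced data as well.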
I’m wondering why these two methods give such different p-values for the same interaction term, and which method should be used.
Thank you.