Different p values of interaction terms in a logistics regression model obtained by two methods

Hi. I would like to obtain the p-value of an interaction term in a logistic regression model. I have tried two methods to accomplish this.

First, I fitted a logistic model that includes the interaction term and obtained the p-value returned by the model (SELECT_GroupGene12:CDKiY, p-value = 0.994). Here’s the code and summary of the model:

> fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
> summary(fit1)

The output is as follows:

Call:
glm(formula = CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + 
    Histology, family = "binomial", data = tmp_Clin)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1448  -0.9129  -0.7868   1.3027   1.6270  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 0.41881    1.80366   0.232    0.816
SELECT_GroupGene12         18.67230 3761.85360   0.005    0.996
CDKiY                      -0.46924    0.61670  -0.761    0.447
Patient_Dx_Age             -0.01417    0.03301  -0.429    0.668
HistologyILC              -18.03547 6522.63863  -0.003    0.998
HistologyOther            -18.19134 6522.63862  -0.003    0.998
Histologyuk                19.05416 6522.63863   0.003    0.998
SELECT_GroupGene12:CDKiY  -36.34491 4978.39200  -0.007    0.994

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.025  on 56  degrees of freedom
Residual deviance: 60.682  on 49  degrees of freedom
AIC: 76.682

Number of Fisher Scoring iterations: 17

Another method I tried involves fitting two logistic regression models, one with the interaction term and one without. I obtained the p-value (0.01513) using the anova function:

> fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
> fit2 <- glm(CDKi_Response3 ~ SELECT_Group + CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
> anova(fit1, fit2, test = 'Chisq')

The output is as follows

Analysis of Deviance Table

Model 1: CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology
Model 2: CDKi_Response3 ~ SELECT_Group + CDKi + Patient_Dx_Age + Histology
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1        49     60.682                       
2        50     66.564 -1  -5.8811   0.0153 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here is the distribution of the outcome variable CDKi_Response3 stratified by SELECT_Group and CDKi. Upon reviewing this distribution, it appears that there is a significant interaction between SELECT_Group and CDKi.

> table(tmp_Clin$CDKi_Response3, tmp_Clin$SELECT_Group, tmp_Clin$CDKi)

The table output is as follows:

CDKi  = N


    Gene1 Gene12
  0    16      0
  1    11      3

CDKi  = Y

   
    Gene1 Gene12
  0    16      4
  1     7      0

I’m wondering why different pvalues were obtained by these two methods and which method should be used.
Thank you.

The p-value for the interaction term in your first approach is from a Wald test (Wald test - Wikipedia). The p-value from your second approach is from a likelihood ratio test (Likelihood-ratio test - Wikipedia). With enough data, these two approaches should give roughly equivalent results. In small samples such as yours, though, the results can differ as they do here.

I believe the likelihood ratio test is generally considered more reliable in small samples. However, your data have the additional issue of sparsity. This can be seen in your table output, where the strata associated with Gene12 have observed probabilities of either 0 or 1. This is going to make it very difficult to get reliable estimates of the model parameters, and indeed you can see this in the output from summary. Most of your parameters have ridiculously huge point estimates and even more ridiculous standard errors. This is an indication that you don’t have enough data to fit this model. So I wouldn’t treat the p-value from either method as being reliable. You should try to get more data, or reconsider the model you want to fit.

Edit to add: to be clear, the main issue here is the sparsity, which makes the point estimates and the standard errors of the coefficients really go off the rails. This has a big impact on the Wald test because the test statistic is based directly off of those quantities.

1 Like

I see that the issue here is quite similar to a previous post of yours: Why so big difference in the p values obtained from logistic regression and Fisher's exact test?
I’d suggest revisiting the answers from that post too.

@f2harrell I see that you recommended the LRT in that case. I’m curious if you think the LRT would be reliable in this case too given the size of the model compared to the amount of actual information in the data.

The LR test should be reliable here. The results are showing that the Hauck-Donner complete separation effect is in play, which ruins standard errors and Wald tests.

LR can be obtained automatically now in the R rms package:

require(rms)
f <- lrm(y ~ x1 + x2 + x3*x4, x=TRUE, y=TRUE)
anova(f, test='LR')

But I think that the dependent variable represents a dichotomization of an underlying continuous response, so all of the results are questionable and makes the Hauck-Donner effect much more likely.

4 Likes