I just read this article about the results of Pfizer’s vaccine trial in children aged 12-15. It says the vaccine was found to be 100% effective: 18 events were observed in the placebo group and 0 in the vaccine group. 1,129 participants received placebo and 1,131 received the vaccine.
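For reference, the 100% figure can be reproduced from the counts alone. This is only a minimal sketch: it assumes efficacy is defined as one minus the ratio of attack rates, and it uses raw participant counts as denominators, which may differ from the denominators the trial itself used. The variable names are my own.

```r
# Counts quoted from the press release
n_placebo <- 1129
n_vaccine <- 1131
cases_placebo <- 18
cases_vaccine <- 0

# Attack rate (proportion with an event) in each arm
risk_placebo <- cases_placebo / n_placebo
risk_vaccine <- cases_vaccine / n_vaccine

# Assumed definition: efficacy = 1 - risk ratio
efficacy <- 1 - risk_vaccine / risk_placebo
efficacy
```

With zero events in the vaccine arm, the point estimate is 1 regardless of how many events occur in the placebo arm, which is why an interval estimate matters here.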
Not seeing a confidence interval or p-value in the press release, I did a crude analysis in R with the following code:
    n_placebo <- 1129
    n_vaccine <- 1131
    df <- data.frame(
      treatment = c(rep("placebo", n_placebo), rep("vaccine", n_vaccine)),
      infection = c(rep(1, 18), rep(0, n_placebo - 18), rep(0, n_vaccine))
    )
A logistic regression model says the treatment effect is extremely large but with great uncertainty:
    f <- glm(infection ~ treatment, data = df, family = binomial())
    summary(f)

    Call:
    glm(formula = infection ~ treatment, family = binomial(), data = df)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -0.17929  -0.17929  -0.00003  -0.00003   2.87705

    Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
    (Intercept)        -4.1226     0.2376  -17.35   <2e-16 ***
    treatmentvaccine  -17.4434   869.2281   -0.02    0.984
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 209.84  on 2259  degrees of freedom
    Residual deviance: 184.71  on 2258  degrees of freedom
    AIC: 188.71

    Number of Fisher Scoring iterations: 20
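As a side check, the null and residual deviances from the same fit give a likelihood ratio test that does not depend on the inflated standard error. The sketch below rebuilds the data so it runs on its own; `lr_stat` and `lr_p` are names I made up, and I am assuming the chi-square approximation on 1 degree of freedom is reasonable here.

```r
# Rebuild the data so this block runs on its own
n_placebo <- 1129
n_vaccine <- 1131
df <- data.frame(
  treatment = c(rep("placebo", n_placebo), rep("vaccine", n_vaccine)),
  infection = c(rep(1, 18), rep(0, n_placebo - 18), rep(0, n_vaccine))
)

f <- glm(infection ~ treatment, data = df, family = binomial())

# Likelihood ratio statistic: drop in deviance from the intercept-only model
lr_stat <- f$null.deviance - f$deviance
lr_df   <- f$df.null - f$df.residual

# Upper-tail chi-square p-value
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p.value = lr_p)
```

The statistic is the difference of the two deviances printed above (209.84 - 184.71), so the Wald p-value of 0.984 seems to reflect the breakdown of the standard error rather than a lack of evidence for an effect.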
Both the chi-square test and Fisher’s exact test conclude that the treatment effect is significant (Fisher’s test also provides a confidence interval for the odds ratio):
    chisq.test(table(df$treatment, df$infection))

    	Pearson's Chi-squared test with Yates' continuity correction

    data:  table(df$treatment, df$infection)
    X-squared = 16.215, df = 1, p-value = 5.655e-05
    fisher.test(table(df$treatment, df$infection))

    	Fisher's Exact Test for Count Data

    data:  table(df$treatment, df$infection)
    p-value = 3.506e-06
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
     0.0000000 0.2242264
    sample estimates:
    odds ratio
             0
I’m guessing the large standard error in the logistic regression comes from a separation problem: there are no infections in the vaccine group, so the treatment coefficient has no finite maximum likelihood estimate. What modeling approach would one use to resolve this issue while still using a regression model?
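To back up the separation guess, here is a minimal sketch of why the coefficient runs off toward minus infinity. It holds the intercept at the placebo log-odds (an assumption made for illustration; `b0`, `beta`, and `loglik_vaccine` are my own names) and evaluates the vaccine arm's contribution to the log-likelihood over a grid of increasingly negative treatment coefficients.

```r
# Intercept fixed at the placebo-arm log-odds of infection (about -4.12)
b0 <- qlogis(18 / 1129)

# Candidate treatment coefficients, increasingly negative
beta <- c(-5, -10, -17.44, -25, -35)

# All 1131 vaccine-arm outcomes are 0, so the arm contributes
# 1131 * log(1 - p), with p = plogis(b0 + beta).
# lower.tail = FALSE with log.p = TRUE gives log(1 - p) without rounding to 0.
loglik_vaccine <- 1131 * plogis(b0 + beta, lower.tail = FALSE, log.p = TRUE)

data.frame(beta = beta, loglik_vaccine = loglik_vaccine)
```

The contribution keeps increasing toward 0 as `beta` decreases and never reaches a maximum, so the fitting algorithm just stops at some large negative value (here about -17.44 after 20 iterations) where the likelihood is nearly flat. That flatness is what produces the standard error of roughly 869.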