COVID-19 vaccine outcomes and binary data analysis

I just read this article about the results from Pfizer’s vaccine trial in children aged 12-15. It says the vaccine was found to be 100% effective: 18 events were observed in the placebo group and 0 events in the vaccine group. 1,129 patients received placebo and 1,131 received vaccine.

Not seeing a confidence interval or p-value in the press release, I did a crude analysis in R with the following code:

n_placebo <- 1129
n_vaccine <- 1131

df <- data.frame(
  treatment = c(rep("placebo", n_placebo), rep("vaccine", n_vaccine)),
  infection = c(rep(1, 18), rep(0, n_placebo - 18), rep(0, n_vaccine))
)

A logistic regression model says the treatment effect is extremely large but with great uncertainty:

f <- glm(infection ~ treatment, data = df, family = binomial())
summary(f)

Call:
glm(formula = infection ~ treatment, family = binomial(), data = df)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-0.17929  -0.17929  -0.00003  -0.00003   2.87705  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -4.1226     0.2376  -17.35   <2e-16 ***
treatmentvaccine -17.4434   869.2281   -0.02    0.984    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 209.84  on 2259  degrees of freedom
Residual deviance: 184.71  on 2258  degrees of freedom
AIC: 188.71

Number of Fisher Scoring iterations: 20

Both the chi-squared test and Fisher’s exact test indicate that the treatment effect is statistically significant (Fisher’s test also provides a confidence interval for the odds ratio):

chisq.test(table(df$treatment, df$infection))

	Pearson's Chi-squared test with Yates' continuity correction

data:  table(df$treatment, df$infection)
X-squared = 16.215, df = 1, p-value = 5.655e-05

fisher.test(table(df$treatment, df$infection))

	Fisher's Exact Test for Count Data

data:  table(df$treatment, df$infection)
p-value = 3.506e-06
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.0000000 0.2242264
sample estimates:
odds ratio 

I’m guessing the large standard error in the logistic regression comes from perfect separation (there are no infections in the vaccine group). What modeling approach would resolve this issue while still using a regression model?
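
As a rough check on the separation explanation: the Wald z-test in the coefficient table breaks down when an estimate runs off toward infinity, but the likelihood ratio test does not. A minimal sketch, using only the two deviances printed in the `glm` summary above (the variable names are illustrative):

```r
# Deviances copied from the glm summary output in the post
null_dev  <- 209.84   # intercept-only model, 2259 df
resid_dev <- 184.71   # model with treatment, 2258 df

# Likelihood ratio statistic for the treatment term (1 df)
lr_stat <- null_dev - resid_dev

# Upper-tail chi-squared p-value
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_value = p_value))
```

The statistic is 25.13 on 1 degree of freedom, so the likelihood ratio test agrees with the chi-squared and Fisher tests, unlike the Wald p-value of 0.984.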

If you are interested in confidence intervals for odds ratios, use the profile likelihood method; see, for example, this R package. Profile likelihood intervals are generally more accurate than Wald intervals and are not bothered by complete separation.
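
To make the idea concrete, here is a minimal base-R sketch of a profile likelihood upper limit for the odds ratio, using the counts from the original post. It is hand-rolled for illustration and is not the code of the package mentioned above; the function names `loglik` and `profile_ll` and the search intervals are assumptions chosen for this data set.

```r
# Counts from the original post
events <- c(placebo = 18, vaccine = 0)
n      <- c(placebo = 1129, vaccine = 1131)

# Binomial log-likelihood: alpha is the placebo log odds,
# beta is the log odds ratio (vaccine vs placebo)
loglik <- function(alpha, beta) {
  p <- plogis(alpha + c(0, beta))
  sum(dbinom(events, n, p, log = TRUE))
}

# Profile log-likelihood: maximise over alpha for a fixed beta
# (the interval for alpha is an assumption suited to these counts)
profile_ll <- function(beta) {
  optimize(function(a) loglik(a, beta),
           interval = c(-15, 0), maximum = TRUE)$objective
}

# Supremum of the log-likelihood, approached as beta -> -Inf
ll_max <- sum(dbinom(events, n, events / n, log = TRUE))

# Upper 95% limit: the beta at which the deviance drop hits the cutoff
cutoff <- qchisq(0.95, df = 1)
upper_beta <- uniroot(function(b) 2 * (ll_max - profile_ll(b)) - cutoff,
                      interval = c(-10, 0))$root

or_upper <- exp(upper_beta)
print(or_upper)
```

Because the likelihood keeps increasing as the log odds ratio goes to minus infinity, the lower limit for the odds ratio is 0 and only the upper limit needs to be solved for.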


Stephen Senn wrote a brief blog post about it using exact binomial confidence intervals: “Youthful Enthusiasm”.
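
One common way to apply exact binomial intervals to this kind of trial is to condition on the total number of cases and treat the number falling in the vaccine arm as binomial. The sketch below does that in base R; it may not match the blog post's calculation exactly, and it assumes equal follow-up per participant in both arms, which the press release does not confirm.

```r
# Counts from the original post
cases_vaccine <- 0
cases_total   <- 18
n_placebo     <- 1129
n_vaccine     <- 1131

# Clopper-Pearson interval for theta, the share of cases in the vaccine arm
theta_ci    <- binom.test(cases_vaccine, cases_total)$conf.int
theta_upper <- theta_ci[2]

# theta / (1 - theta) = RR * n_vaccine / n_placebo, so solve for the risk ratio
rr_upper <- theta_upper / (1 - theta_upper) * n_placebo / n_vaccine

# Vaccine efficacy is 1 - RR, so the upper RR limit gives the lower VE limit
ve_lower <- 1 - rr_upper
print(c(theta_upper = theta_upper, rr_upper = rr_upper, ve_lower = ve_lower))
```

With zero cases in the vaccine arm the point estimate of efficacy is 100%, so the informative quantity is the lower limit `ve_lower`.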

Edit: I’ve tried using exact logistic regression in SAS before, but unless the sample size is small it won’t run because of the computational cost.


Confidence intervals for proportions and their differences, ratios and odds ratios are all easily calculated by using the Excel spreadsheets freely downloadable at the website for my book.

All these intervals have good coverage properties and are appropriately boundary-respecting: they still produce sensible intervals even when there is a zero cell frequency - as here.
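
As an illustration of what “boundary-respecting” means, here is a sketch of the Wilson score interval for a single proportion, applied to the 0 out of 1,131 events in the vaccine group. This is only one example of such an interval; the methods actually implemented in the spreadsheets are not shown in this thread, and `wilson_ci` is a helper written for this example.

```r
# Wilson score interval for a binomial proportion x / n
wilson_ci <- function(x, n, conf = 0.95) {
  z     <- qnorm(1 - (1 - conf) / 2)
  p_hat <- x / n
  denom <- 1 + z^2 / n
  centre <- (p_hat + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
  # Clamp to [0, 1] to guard against floating-point overshoot
  c(lower = max(0, centre - half), upper = min(1, centre + half))
}

ci_vaccine <- wilson_ci(0, 1131)
print(ci_vaccine)
```

A Wald interval would collapse to a single point at zero here, whereas the Wilson interval runs from 0 to a small positive upper limit.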


When I say ‘freely downloadable’, I mean that anyone can download them free of charge - absolutely no need to buy my book in order to use the spreadsheets! They are self-explanatory to use.