Sample Size Calculation for a 2x2 Factorial RCT

Experts!

I would appreciate a sanity check on my sample size calculation for an upcoming study.

Study Design: We’re using a 2x2 factorial design to evaluate two interventions for reducing the hemodynamic stress response during craniotomy: Ultrasound-Guided Nerve Block (Factor A) and a Helper Drug (Factor B).

The four study arms will be:

  • Arm A: Standard Nerve Block + Saline (Placebo)
  • Arm B: Standard Nerve Block + Helper Drug
  • Arm C: Ultrasound-Guided Nerve Block + Saline (Placebo)
  • Arm D: Ultrasound-Guided Nerve Block + Helper Drug

Our primary outcome is the change in Blood Pressure from baseline to the peak response (1 min post-pinning), and our analysis plan is a 2x2 ANCOVA with baseline blood pressure as a covariate.

We are powering the study to detect a main effect for each intervention. For our example below, we will use a small-to-medium effect size, corresponding to a Cohen’s d of 0.3.

> library(pwrss)
> 
> cohend = 0.3 ## example Cohen's d from literature
> effect_size_f <- cohend/2         # Cohen's f; for two equal-sized groups, f = d/2
> f2_value <- effect_size_f^2       # f-squared
> # Convert f-squared to eta-squared using: eta2 = f2/(1+f2)
> eta2_value <- f2_value / (1 + f2_value)
> 
> # ANCOVA design parameters
> n_way <- 2
> n_levels <- c(2,2)          
> n_covariates <- 1           # Baseline measurement
> 
> alpha <- 0.05
> desired_power <- 0.80
> 
> result <- pwrss.f.ancova(f2 = f2_value, n.way = n_way,
+                          n.levels = n_levels,
+                          n.covariates = n_covariates,
+                          alpha = alpha,
+                          power = desired_power,
+                          verbose = TRUE)
 Two-way Analysis of Covariance (ANCOVA) 
  H0: 'eta2' or 'f2' = 0 
  HA: 'eta2' or 'f2' > 0 
 --------------------------------------
 Factor A: 2 levels 
 Factor B: 2 levels 
 --------------------------------------
 effect power n.total   ncp df1     df2
      A   0.8     351 7.893   1 345.786
      B   0.8     351 7.893   1 345.786
  A x B   0.8     351 7.893   1 345.786
 --------------------------------------
 Type I error rate: 0.05

# The output suggests a total sample size of 351, rounded up to 352 so that it
# splits evenly into four arms (88 per arm).
# Adjusting for dropout, e.g. 10%: enroll N / 0.9
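
As a rough cross-check of these figures, a main effect in a balanced 2x2 design compares two marginal groups of N/2 participants each, so a two-sample t-test calculation in base R should land close to the same total. The sketch below assumes d = 0.3 is defined on the covariate-adjusted SD and ignores the degrees of freedom used by the covariate and the interaction; the variable names are illustrative, and this is an approximation rather than a replacement for the ANCOVA-based calculation.

```r
# Rough cross-check with base R (stats::power.t.test); approximation only.
d <- 0.3             # standardized main effect (Cohen's d), as in the example
alpha <- 0.05
target_power <- 0.80

# A main effect in a balanced 2x2 design compares two marginal groups
# (e.g. all "Helper Drug" vs all "Saline"), each containing N/2 participants.
tt <- power.t.test(delta = d, sd = 1, sig.level = alpha, power = target_power)
n_per_level <- ceiling(tt$n)       # participants per level of one factor
n_total_check <- 2 * n_per_level   # total across both levels

# Round the total up to a multiple of 4 so the arms are equal in size.
n_arms <- 4
n_total <- ceiling(n_total_check / n_arms) * n_arms
n_per_arm <- n_total / n_arms

# Inflate for an assumed 10% dropout: enroll N / (1 - dropout),
# then round up again so the arms stay equal.
dropout <- 0.10
n_enrolled <- ceiling(n_total / (1 - dropout))
n_enrolled <- ceiling(n_enrolled / n_arms) * n_arms

cat("Total N (t-test approximation):", n_total_check, "\n")
cat("Per arm:", n_per_arm, " Enrolled with dropout:", n_enrolled, "\n")
```

Applied to the 352 participants quoted above, the 10% dropout adjustment gives ceiling(352 / 0.9) = 392 enrolled, i.e. 98 per arm.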

@f2harrell My question is: Does this overall approach seem sound? Are there any common pitfalls with this method that I might be overlooking?

Thanks in advance.

I’m not overly familiar with the pwrss package, but looking at the examples for pwrss.f.ancova, it looks like this function powers for interactions in two+ way designs. Superpower::ANCOVA_analytic might be closer to what you need.

This is why I would never enroll in a study.

Thanks a lot, @stephenrho. I checked the package documentation, and I think we will face the dilemma of having to guess the R-squared if we use ANCOVA_analytic(), as you can see from this function argument:

r2	Coefficient of Determination of the model with only the covariates

Do you have any suggestions or rules of thumb to validate my calculation?
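
One way to make the `r2` argument less of a guess is to estimate it from pilot or published data: it is the R-squared of a model that contains only the covariates, which for a single covariate equals the squared baseline-outcome correlation. The sketch below uses simulated stand-in data; the variable names, the pilot size of 200, and the correlation of roughly 0.3 are all hypothetical and only illustrate the calculation.

```r
set.seed(1)  # for a reproducible illustration

# Hypothetical pilot data: baseline MAP and peak MAP with a modest correlation.
n_pilot <- 200
baseline <- rnorm(n_pilot, mean = 85, sd = 15)
peak <- 85 + 0.3 * (baseline - 85) + rnorm(n_pilot, mean = 0, sd = 14)
pilot <- data.frame(baseline = baseline, peak = peak)

# r2 = R-squared of the covariate-only model (no treatment terms).
fit_cov <- lm(peak ~ baseline, data = pilot)
r2_hat <- summary(fit_cov)$r.squared

# With a single covariate, this equals the squared correlation.
rho_hat <- cor(pilot$baseline, pilot$peak)

# Covariate adjustment shrinks the outcome SD by approximately sqrt(1 - r2).
sd_outcome <- sd(pilot$peak)
sd_adjusted <- sd_outcome * sqrt(1 - r2_hat)

cat("r2 estimate:", round(r2_hat, 3), "\n")
cat("Squared correlation:", round(rho_hat^2, 3), "\n")
cat("Outcome SD:", round(sd_outcome, 2),
    " SD after adjustment:", round(sd_adjusted, 2), "\n")
```

With real data, replacing the simulated `pilot` data frame with observed baseline and peak values would give an evidence-based starting value for `r2`, which can then be varied in a sensitivity analysis.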

Following @stephenrho's suggestion, I used Superpower::ANCOVA_analytic to calculate the sample size. I wrote the following heavily commented script to act as a clear example and step-by-step guide for future queries. However, @f2harrell and @stephenrho, I need your feedback to make sure the script and comments are accurate. I would also appreciate input from the datamethods community.

> # ---
> # R SCRIPT FOR SAMPLE SIZE CALCULATION
> # Study: 2x2 Factorial ANCOVA for US / Helper Drug Trial
> # Date: August 30, 2025
> # ---
> 
> # ---
> # SECTION 1: SETUP
> # ---
> # Load the necessary library for the power analysis.
> library(Superpower)
> ## Library Documentation Ref:
> ## https://cran.r-project.org/web/packages/Superpower/vignettes/intro_to_superpower.html
> 
> # ---
> # SECTION 2: STUDY PARAMETERS & RATIONALE
> # ---
> # This section defines all the key assumptions for the power analysis. Each
> # assumption is explained to provide a clear and defensible rationale.
> 
> # -- 2a. Clinical Rationale: Defining the Minimal Clinically Important Difference (MCID)
> # In line with the DELTA2 guidance for specifying a target difference, we have
> # defined our MCIDs based on a review of the existing evidence base,
> # supplemented by clinical judgment to ensure the targets are both realistic and
> # important. Our primary goal is to power the study to detect the smallest
> # effects that would be clinically meaningful.
> 
> # - Proposed MCID for Ultrasound Guided Nerve Block: A mean arterial pressure (MAP)
> # reduction of 10 mmHg. Rationale: A recent systematic review and meta-analysis
> # by Luo et al. (2023) found that a scalp nerve block (SNB) significantly
> # reduced MAP by a mean difference of -14.00 mmHg (95% CI: -19.71 to -8.28)
> # during skull pin insertion compared to non-SNB controls. Our chosen MCID of 10
> # mmHg is a conservative value that falls well within this evidence range and
> # represents a clinically significant reduction of the hemodynamic response.
> 
> # - Proposed MCID for Helper Drug: A MAP reduction of 5 mmHg.
> # Rationale: A 5 mmHg reduction represents the smallest effect size our clinical
> # team considers important enough to justify the adoption of this adjunctive
> # therapy. Powering for this smaller effect ensures the study can
> # detect a meaningful benefit even if it is not as large as the primary block.
> 
> # -- 2b. Hypothesized Means based on the MCID
> # These means create the additive model we want to test. (No interaction)
> # Group A: Control condition (Standard Nerve Block + Placebo)
> mean_A <- 85.0
> # Group B: Helper Drug effect only (-5 mmHg from control)
> mean_B <- 80.0
> # Group C: Ultrasound effect only (-10 mmHg from control)
> mean_C <- 75.0
> # Group D: Both effects combined, assuming additive effect (-15 mmHg from control)
> mean_D <- 70.0
> 
> # These means are constructed to reflect a purely additive model. The effect of
> # Ultrasound is assumed to be -10 mmHg both in the absence (85 -> 75) and
> # presence (80 -> 70) of the Helper Drug. Likewise, the Helper Drug effect is
> # -5 mmHg regardless of Ultrasound.
> # Therefore, the hypothesized interaction effect is exactly zero.
> 
> # -- 2c. Defining the Within-Group Standard Deviation
> # Based on clinical experience, we estimate the within-group SD of blood
> # pressure in this population (before covariate adjustment) to be around 15.0 mmHg.
> sd_for_power <- 15.0
> 
> # This choice of SD is supported by a review of relevant literature. For
> # example, Arshad et al. (2013) reported standard deviations for MAP in the
> # range of 10-13 mmHg. An SD of 15.0 mmHg is therefore a reasonable and slightly
> # conservative estimate for this population, accounting for potential
> # variability.
> 
> # -- 2d. Defining Covariate Strength (R-squared)
> # This is the expected proportion of variance in the outcome that will be
> # explained by our covariates (baseline blood pressure + age). 
> # We choose a moderate value of 0.1.
> r2_from_covs <- 0.1
> 
> # -- 2e. Defining Alpha and Power
> alpha_level <- 0.05 # Standard Type I error rate
> target_power <- 0.80  # Standard target power (corresponds to beta_level = 0.20)
> beta_level <- 1 - target_power
> 
> # ---
> # SECTION 3: SPECIFYING THE 2x2 FACTORIAL DESIGN
> # ---
> # This section explains how the study design is translated into the format
> # required by the Superpower package.
> 
> # -- 3a. The Design String Convention: "2b*2b"
> # The string is a shorthand for the experimental design:
> # - The NUMBER ('2') is the number of levels in a factor.
> # - The LETTER ('b') indicates a 'b'etween-subjects factor (unpaired groups).
> # - The ASTERISK ('*') separates the factors.
> # So, "2b*2b" means: "Two factors, the first is between-subjects with 2 levels,
> # and the second is also between-subjects with 2 levels."
> 
> # -- 3b. Assigning Factors: Which 'b' is Which?
> # WE, the researchers, decide the order. The output labels 'a' and 'b' will
> # correspond to this order. We must be consistent.
> factor_a_name <- "No_Ultrasound_vs_US" # This will be the FIRST 'b'
> factor_b_name <- "Saline_vs_HelperDrug"  # This will be the SECOND 'b'
> 
> # -- 3c. The `mu` Vector Ordering Rule
> # This is the most critical step. As stated in the package documentation,
> # Superpower follows the convention where the levels of the LAST
> # factor change the fastest.
> #
> # The documentation provides a 3-factor example (a*b*c) where the levels
> # of 'c' cycle fastest, then 'b', then 'a'. For our 2-factor design (a*b),
> # this means the levels of 'b' must change fastest.
> #
> # Let's define the levels for our factors:
> # Factor 'a' (Ultrasound): a1 = No Ultrasound, a2 = Ultrasound
> # Factor 'b' (Helper Drug): b1 = Saline,  b2 = Helper Drug
> #
> # The correct order is therefore: a1b1, a1b2, a2b1, a2b2
> # Notice how the 'b' levels (b1, b2) cycle before 'a' changes from a1 to a2.
> hypothesized_means <- c(
+   mean_A, # Corresponds to a1, b1: No US, Saline
+   mean_B, # Corresponds to a1, b2: No US, Helper Drug
+   mean_C, # Corresponds to a2, b1: US, Saline
+   mean_D  # Corresponds to a2, b2: US, Helper Drug
+ )
> 
> # ---
> # SECTION 4: POWER CALCULATION
> # ---
> # All parameters have been defined and explained. We can now run the analysis.
> power_analysis_results <- Superpower::ANCOVA_analytic(
+   design      = "2b*2b",
+   mu          = hypothesized_means,
+   sd          = sd_for_power,
+   n_cov       = 2, # age + baseline blood pressure
+   r2          = r2_from_covs,
+   alpha_level = alpha_level,
+   beta_level  = beta_level
+ )
Warning message:
In qf(1 - alpha_level, num_df, den_df) : NaNs produced
> 
> # The warning message above is expected because the hypothesized means are
> # purely additive (interaction of exactly zero):
> # In qf(1 - alpha_level, num_df, den_df) : NaNs produced
> 
> # ---
> # SECTION 5: RESULTS & INTERPRETATION
> # ---
> 
> # -- 5a. Printing the Final Results
> print(power_analysis_results)
Power Analysis Results for ANCOVA
    Total N Covariates  r2 Alpha Level Beta Level Power
a        68          2 0.1        0.05     0.1988 80.12
b       260          2 0.1        0.05     0.1984 80.16
a:b       8          2 0.1        0.05     0.9500  5.00

> 
> # -- 5b. Final Interpretation
> # The output shows the total sample size required to detect each main effect
> # with approximately 80% power.
> #
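> # - Main Effect of Ultrasound ('a', 10 mmHg): Requires N = 68
> # - Main Effect of Helper Drug ('b', 5 mmHg): Requires N = 260
> # - Interaction ('a:b'): hypothesized to be zero, so its power stays at alpha
> #   (5%) and the N = 8 reported for it is not a meaningful sample size.
> #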
> # To ensure our study is adequately powered for both main effects, we must be
> # able to detect the smaller, more subtle effect (the 5 mmHg MCID of the Helper
> # Drug). Therefore, we must choose the larger of the required sample sizes.
> #
> # FINAL CONCLUSION: The required total sample size for this study is N = 260
> # participants, which corresponds to n = 65 participants per group. Per the
> # Principal Investigator, no additional adjustments for dropout are required.
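
As a sanity check on the analytic output, the main effects can be approximated with a two-sample t-test calculation in base R. In a balanced 2x2 design each main effect compares two marginal groups of N/2 participants, and the covariates are assumed here to reduce the within-group SD by a factor of sqrt(1 - r2). This sketch ignores the degrees of freedom spent on the covariates and the interaction, so it should land close to the Superpower totals rather than reproduce them exactly; the helper `total_n()` is illustrative and not part of any package.

```r
# Cell means in Superpower order: a1b1, a1b2, a2b1, a2b2
mu <- c(85, 80, 75, 70)
cells <- matrix(mu, nrow = 2, byrow = TRUE,
                dimnames = list(us = c("a1", "a2"), drug = c("b1", "b2")))

# Marginal (main-effect) differences and the interaction contrast
effect_us <- mean(cells["a1", ]) - mean(cells["a2", ])     # ultrasound effect
effect_drug <- mean(cells[, "b1"]) - mean(cells[, "b2"])   # helper drug effect
interaction_contrast <- (cells["a1", "b1"] - cells["a1", "b2"]) -
  (cells["a2", "b1"] - cells["a2", "b2"])                  # 0 under additivity

# Assumption: covariates reduce the within-group SD by sqrt(1 - r2)
sd_adj <- 15 * sqrt(1 - 0.1)

# Total N for a main effect = 2 * (n per marginal group), via power.t.test
total_n <- function(delta) {
  tt <- power.t.test(delta = delta, sd = sd_adj,
                     sig.level = 0.05, power = 0.80)
  2 * ceiling(tt$n)
}
n_us <- total_n(effect_us)
n_drug <- total_n(effect_drug)

cat("Effects (US, drug, interaction):",
    effect_us, effect_drug, interaction_contrast, "\n")
cat("Approximate total N: US =", n_us, ", drug =", n_drug, "\n")
```

If the approximate totals differ from the analytic ones by more than a handful of participants, that would be a prompt to re-check the `mu` ordering and the `sd` and `r2` inputs.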

I don’t know much about analytic sample size estimators, but I wanted to see if I could get the same results using simulation. Below, I simulated values of baseline BP, final BP, and age that on average gave approximately the same group means, covariate-only R^2, and outcome SD. Then I computed the proportion of simulations in which p < .05 for each main effect:

library(dplyr)
N <- 65 # Number of participants per group

# Set up 2x2 structure in dataframe
d <- data.frame(
  nerve_block = factor(
    rep(c('Standard', 'Ultrasound-Guided'), each = N*2)),
  drug = factor(rep(c('Yes', 'No', 'Yes', 'No'), each = N))
)

runs <- 10000 # Number of simulations to perform
# Store results
ps.drug <- ps.nerve <- r2 <- mean.a <- mean.b <- mean.c <- mean.d <- 
  sd.a <- sd.b <- sd.c <- sd.d <- vector('double', runs)
d$y <- NA_real_ # Outcome (blood pressure)

# Run simulations
for (i in 1:runs) {
  d$age <- runif(nrow(d), 50, 80)      # Random age
  d$baseline <- rnorm(N*4, 85, 15)     # Baseline BP
  d$y <- sapply(1:nrow(d), \(x) {      # Outcome BP (y) 
    rnorm(  
      1,  # For each participant...
      # Mean is average of 3 random values and baseline....
      mean(c(rnorm(3, 85, 19), rep(d$baseline[[x]], 1))) + 
        # plus increase in BP across age...
        ((d$age[[x]] - mean(d$age)) * .38) -    
        # Ultrasound guidance reduces BP by 10
        if_else(d$nerve_block[[x]] == 'Ultrasound-Guided', 10, 0) -
        # Drug reduces mean BP by 5
        if_else(d$drug[[x]] == 'Yes', 5, 0),
      11.7) 
    })
  
  # Get means per condition
  mean.a[[i]] <- mean(d[d$nerve_block == 'Standard' & d$drug == 'No', 'y'])
  mean.b[[i]] <- mean(d[d$nerve_block == 'Standard' & d$drug == 'Yes', 'y'])
  mean.c[[i]] <- mean(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'No', 'y'])
  mean.d[[i]] <- mean(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'Yes', 'y'])
  
  # Get sds per condition
  sd.a[[i]] <- sd(d[d$nerve_block == 'Standard' & d$drug == 'No', 'y'])
  sd.b[[i]] <- sd(d[d$nerve_block == 'Standard' & d$drug == 'Yes', 'y'])
  sd.c[[i]] <- sd(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'No', 'y'])
  sd.d[[i]] <- sd(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'Yes', 'y'])
  
  # R-squared from covariate-only model
  s <- summary(lm(data = d, y ~ baseline + age))
  r2[[i]] <- s$r.squared
  
  # p values for main effect in adjusted model
  a <- anova(lm(data = d, y ~ baseline + age + drug * nerve_block))
  ps.drug[[i]] <- a['drug', 'Pr(>F)']
  ps.nerve[[i]] <- a['nerve_block', 'Pr(>F)']
}

# Results from 10,000 simulations with N = 65/group
mean(ps.drug < .05)  # Power for main effect of drug:              0.7972 
mean(ps.nerve < .05) # Power for main effect of nerve block:       1
mean(r2)             # R2 of model with only baseline BP and age:  0.1016719
mean(mean.a)         # No US, Saline:     84.97058
mean(mean.b)         # No US,   Drug:     79.98512                     
mean(mean.c)         # US,    Saline:     74.98818
mean(mean.d)         # US,      Drug:     69.96514
mean(c(sd.a, sd.b, sd.c, sd.d)) # Mean SD within each subgroup: 15.08572

I got the same results as you, with 65 participants/group yielding a power of ~80%. Hope that helps!
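
Because the simulated power is itself an estimate, its Monte Carlo uncertainty is worth checking before comparing it with the 80% target. A minimal sketch, using the normal approximation to the binomial and the figures reported above (0.7972 from 10,000 runs):

```r
p_hat <- 0.7972   # simulated power for the drug main effect (reported above)
runs <- 10000     # number of simulations

# Monte Carlo standard error of a simulated proportion
mc_se <- sqrt(p_hat * (1 - p_hat) / runs)

# Approximate 95% interval (normal approximation)
ci <- p_hat + c(-1, 1) * qnorm(0.975) * mc_se

cat("Monte Carlo SE:", round(mc_se, 4), "\n")
cat("Approx. 95% interval:", round(ci, 3), "\n")
```

The interval is roughly 0.789 to 0.805, so the simulated power is compatible with the 80% analytic target.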

@Louis_Martin Thanks a lot, Sir, this helps a lot. I really appreciate your time and effort.