Sample Size Calculation for a 2x2 Factorial RCT

mksp · August 21, 2025, 10:26pm

Experts!

I would appreciate a sanity check on my sample size calculation for an upcoming study.

Study Design: We’re using a 2x2 factorial design to evaluate two interventions for reducing the hemodynamic stress response during craniotomy: Ultrasound-Guided Nerve Block (Factor A) and a Helper Drug (Factor B).

The four study arms will be:

Arm A: Standard Nerve Block + Saline (Placebo)
Arm B: Standard Nerve Block + Helper Drug
Arm C: Ultrasound-Guided Nerve Block + Saline (Placebo)
Arm D: Ultrasound-Guided Nerve Block + Helper Drug

Our primary outcome is the change in Blood Pressure from baseline to the peak response (1 min post-pinning), and our analysis plan is a 2x2 ANCOVA with baseline blood pressure as a covariate.

We are powering the study to detect a main effect for each intervention. For our example below, we will use a small-to-medium effect size, corresponding to a Cohen’s d of 0.3.
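For readers following the conversion in the script below, here is a short sketch of why f = d/2 is a reasonable shortcut for a main effect in a balanced design. It assumes a factor with two equally sized levels whose marginal means differ by $\delta$, and a common within-group SD $\sigma$. The symbols $\delta$, $\sigma$ and $\sigma_m$ are notation introduced here, not taken from the original post.

$$
\sigma_m = \sqrt{\tfrac{1}{2}\left(\tfrac{\delta}{2}\right)^2 + \tfrac{1}{2}\left(\tfrac{\delta}{2}\right)^2} = \frac{\delta}{2},
\qquad
f = \frac{\sigma_m}{\sigma} = \frac{\delta}{2\sigma} = \frac{d}{2}
$$

$$
d = 0.3 \;\Rightarrow\; f = 0.15, \qquad f^2 = 0.0225, \qquad \eta^2 = \frac{f^2}{1 + f^2} = \frac{0.0225}{1.0225} \approx 0.0220
$$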

> library(pwrss)
> 
> cohend = 0.3 ## example Cohen's d from literature
> effect_size_f <- cohend/2         # Cohen's f; f = d/2 holds for a two-level factor with equal group sizes
> f2_value <- effect_size_f^2       # f-squared
> # Convert f-squared to eta-squared using: eta2 = f2/(1+f2)
> eta2_value <- f2_value / (1 + f2_value)
> 
> # ANCOVA design parameters
> n_way <- 2
> n_levels <- c(2,2)          
> n_covariates <- 1           # Baseline measurement
> 
> alpha <- 0.05
> desired_power <- 0.80
> 
> result <- pwrss.f.ancova(f2 =f2_value ,n.way = n_way,
+                          n.levels = n_levels,
+                          n.covariates = n_covariates,
+                          alpha = alpha,
+                          power = desired_power,
+                          verbose = TRUE)
 Two-way Analysis of Covariance (ANCOVA) 
  H0: 'eta2' or 'f2' = 0 
  HA: 'eta2' or 'f2' > 0 
 --------------------------------------
 Factor A: 2 levels 
 Factor B: 2 levels 
 --------------------------------------
 effect power n.total   ncp df1     df2
      A   0.8     351 7.893   1 345.786
      B   0.8     351 7.893   1 345.786
  A x B   0.8     351 7.893   1 345.786
 --------------------------------------
 Type I error rate: 0.05

# The output suggests a total sample size of 351, rounded up to 352 participants so that all four arms are equal (88 per arm).
# Adjusting for dropout, e.g. 10%: N / 0.9
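
As a rough cross-check of the figure above: a main effect in a balanced 2x2 design compares two halves of the sample, so a plain two-sample t-test calculation with d = 0.3 should land close to the same total. The sketch below uses base R's `power.t.test` and ignores both the covariate and the degrees of freedom spent on the other model terms. The variable names and the 10% dropout rate are illustrative assumptions.

```r
# Cross-check: main effect of a 2-level factor treated as a two-sample comparison
d <- 0.3
tt <- power.t.test(delta = d, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")

n_per_level <- ceiling(tt$n)                 # participants per level of one factor
n_total     <- 2 * n_per_level               # each level pools two of the four arms
n_total     <- 4 * ceiling(n_total / 4)      # round up so the four arms are equal

# Inflate for an assumed 10% dropout, again rounding up to a multiple of 4
dropout <- 0.10
n_enrol <- 4 * ceiling(n_total / (1 - dropout) / 4)

c(per_level = n_per_level, total = n_total, enrol = n_enrol)
```

If the two approaches disagreed by more than a handful of participants, that would be a hint that the effect size conversion or the design specification needs another look.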

@f2harrell My question is: Does this overall approach seem sound? Are there any common pitfalls with this method that I might be overlooking?

Thanks in advance.

stephenrho · August 22, 2025, 2:00pm

I’m not overly familiar with the pwrss package, but looking at the examples for pwrss.f.ancova, it looks like this function powers for interactions in two+ way designs. Superpower::ANCOVA_analytic might be closer to what you need.

davidcnorrismd · August 23, 2025, 12:42am

This is why I would never enroll in a study.

mksp · August 23, 2025, 1:44am

Thanks a lot, @stephenrho. I checked the package documentation, and I think we will face the dilemma of having to guess the R-squared if we use ANCOVA_analytic(), as the function argument shows:

r2: Coefficient of Determination of the model with only the covariates

Do you have any suggestions or a rule of thumb to validate my calculation?
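
One way to approach the r2 question is a sensitivity analysis rather than a single guess. With a single covariate, r2 is the squared correlation between the covariate and the outcome, and a common approximation is that adjustment shrinks the residual SD by a factor of sqrt(1 - r2). The sketch below applies that approximation with base R's `power.t.test`. The 5 mmHg difference, the SD of 15 mmHg, and the grid of r2 values are placeholder assumptions, and the output is a rough guide, not a replacement for ANCOVA_analytic().

```r
# Sensitivity of total N to the assumed covariate R-squared (approximation)
delta   <- 5      # assumed main-effect difference in mmHg (placeholder)
sd_raw  <- 15     # assumed outcome SD before covariate adjustment (placeholder)
r2_grid <- c(0, 0.1, 0.25, 0.5)

total_n <- sapply(r2_grid, function(r2) {
  sd_resid <- sd_raw * sqrt(1 - r2)          # residual SD after adjustment
  n_level  <- power.t.test(delta = delta, sd = sd_resid,
                           sig.level = 0.05, power = 0.80)$n
  2 * ceiling(n_level)                       # two levels per factor
})

data.frame(r2 = r2_grid, total_n = total_n)
```

A cautious choice is the smallest r2 that pilot data or earlier studies can support, since underestimating r2 only makes the study somewhat larger than strictly needed.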

mksp · August 29, 2025, 8:59pm

Following @stephenrho's suggestion, I used Superpower::ANCOVA_analytic to calculate the sample size. I wrote the following heavily commented script to act as a clear example and step-by-step guide for future queries. However, @f2harrell and @stephenrho, I need your feedback to make sure the script and comments are accurate. I would also appreciate input from the datamethods community.

> # ---
> # R SCRIPT FOR SAMPLE SIZE CALCULATION
> # Study: 2x2 Factorial ANCOVA for US / Helper Drug Trial
> # Date: August 30, 2025
> # ---
> 
> # ---
> # SECTION 1: SETUP
> # ---
> # Load the necessary library for the power analysis.
> library(Superpower)
> ## Library Documentation Ref:
> ## https://cran.r-project.org/web/packages/Superpower/vignettes/intro_to_superpower.html
> 
> # ---
> # SECTION 2: STUDY PARAMETERS & RATIONALE
> # ---
> # This section defines all the key assumptions for the power analysis. Each
> # assumption is explained to provide a clear and defensible rationale.
> 
> # -- 2a. Clinical Rationale: Defining the Minimal Clinically Important Difference (MCID)
> # In line with the DELTA2 guidance for specifying a target difference, we have
> # defined our MCIDs based on a review of the existing evidence base,
> # supplemented by clinical judgment to ensure the targets are both realistic and
> # important. Our primary goal is to power the study to detect the smallest
> # effects that would be clinically meaningful.
> 
> # - Proposed MCID for Ultrasound Guided Nerve Block: A mean arterial pressure (MAP)
> # reduction of 10 mmHg. Rationale: A recent systematic review and meta-analysis
> # by Luo et al. (2023) found that a scalp nerve block (SNB) significantly
> # reduced MAP by a mean difference of -14.00 mmHg (95% CI: -19.71 to -8.28)
> # during skull pin insertion compared to non-SNB controls. Our chosen MCID of 10
> # mmHg is a conservative value that falls well within this evidence range and
> # represents a clinically significant reduction of the hemodynamic response.
> 
> # - Proposed MCID for Helper Drug: A MAP reduction of 5 mmHg.
> # Rationale: A 5 mmHg reduction represents the smallest effect size our clinical
> # team considers important enough to justify the adoption of this adjunctive
> # therapy. Powering for this smaller effect ensures the study can
> # detect a meaningful benefit even if it is not as large as the primary block.
> 
> # -- 2b. Hypothesized Means based on the MCID
> # These means create the additive model we want to test. (No interaction)
> # Group A: Control condition (Standard Nerve Block + Placebo)
> mean_A <- 85.0
> # Group B: Helper Drug effect only (-5 mmHg from control)
> mean_B <- 80.0
> # Group C: Ultrasound effect only (-10 mmHg from control)
> mean_C <- 75.0
> # Group D: Both effects combined, assuming additive effect (-15 mmHg from control)
> mean_D <- 70.0
> 
> # These means are constructed to reflect a purely additive model. The effect of
> # Ultrasound is assumed to be -10 mmHg both in the absence (85 -> 75) and
> # presence (80 -> 70) of the Helper Drug. Likewise, the Helper Drug effect is
> # -5 mmHg regardless of Ultrasound.
> # Therefore, the hypothesized interaction effect is exactly zero.
> 
> # -- 2c. Defining Residual Standard Deviation
> # Based on clinical experience, we estimate the SD of blood
> # pressure in this population to be around 15.0 mmHg.
> sd_for_power <- 15.0
> 
> # This choice of SD is supported by a review of relevant literature. For
> # example, Arshad et al. (2013) reported standard deviations for MAP in the
> # range of 10-13 mmHg. An SD of 15.0 mmHg is therefore a reasonable and slightly
> # conservative estimate for this population, accounting for potential
> # variability.
> 
> # -- 2d. Defining Covariate Strength (R-squared)
> # This is the expected proportion of variance in the outcome that will be
> # explained by our covariates (baseline blood pressure + age). 
> # We choose a moderate value of 0.1.
> r2_from_covs <- 0.1
> 
> # -- 2e. Defining Alpha and Power
> alpha_level <- 0.05 # Standard Type I error rate
> target_power <- 0.80  # Standard target power (corresponds to beta_level = 0.20)
> beta_level <- 1 - target_power
> 
> # ---
> # SECTION 3: SPECIFYING THE 2x2 FACTORIAL DESIGN
> # ---
> # This section explains how the study design is translated into the format
> # required by the Superpower package.
> 
> # -- 3a. The Design String Convention: "2b*2b"
> # The string is a shorthand for the experimental design:
> # - The NUMBER ('2') is the number of levels in a factor.
> # - The LETTER ('b') indicates a 'b'etween-subjects factor (unpaired groups).
> # - The ASTERISK ('*') separates the factors.
> # So, "2b*2b" means: "Two factors, the first is between-subjects with 2 levels,
> # and the second is also between-subjects with 2 levels."
> 
> # -- 3b. Assigning Factors: Which 'b' is Which?
> # WE, the researchers, decide the order. The output labels 'a' and 'b' will
> # correspond to this order. We must be consistent.
> factor_a_name <- "No_Ultrasound_vs_US" # This will be the FIRST 'b'
> factor_b_name <- "Saline_vs_HelperDrug"  # This will be the SECOND 'b'
> 
> # -- 3c. The `mu` Vector Ordering Rule
> # This is the most critical step. As stated in the package documentation,
> # Superpower follows the convention where the levels of the LAST
> # factor change the fastest.
> #
> # The documentation provides a 3-factor example (a*b*c) where the levels
> # of 'c' cycle fastest, then 'b', then 'a'. For our 2-factor design (a*b),
> # this means the levels of 'b' must change fastest.
> #
> # Let's define the levels for our factors:
> # Factor 'a' (Ultrasound): a1 = No Ultrasound, a2 = Ultrasound
> # Factor 'b' (Helper Drug): b1 = Saline,  b2 = Helper Drug
> #
> # The correct order is therefore: a1b1, a1b2, a2b1, a2b2
> # Notice how the 'b' levels (b1, b2) cycle before 'a' changes from a1 to a2.
> hypothesized_means <- c(
+   mean_A, # Corresponds to a1, b1: No US, Saline
+   mean_B, # Corresponds to a1, b2: No US, Helper Drug
+   mean_C, # Corresponds to a2, b1: US, Saline
+   mean_D  # Corresponds to a2, b2: US, Helper Drug
+ )
> 
> # ---
> # SECTION 4: POWER CALCULATION
> # ---
> # All parameters have been defined and explained. We can now run the analysis.
> power_analysis_results <- Superpower::ANCOVA_analytic(
+   design      = "2b*2b",
+   mu          = hypothesized_means,
+   sd          = sd_for_power,
+   n_cov       = 2, # age + baseline blood pressure
+   r2          = r2_from_covs,
+   alpha_level = alpha_level,
+   beta_level  = beta_level
+ )
Warning message:
In qf(1 - alpha_level, num_df, den_df) : NaNs produced
> 
> # The warning is expected here. It most likely arises because the hypothesized
> # interaction is exactly zero, so there is no a:b effect to power for and the
> # qf() call inside the sample-size search returns NaN for that term.
> 
> # ---
> # SECTION 5: RESULTS & INTERPRETATION
> # ---
> 
> # -- 5a. Printing the Final Results
> print(power_analysis_results)
Power Analysis Results for ANCOVA
    Total N Covariates  r2 Alpha Level Beta Level Power
a        68          2 0.1        0.05     0.1988 80.12
b       260          2 0.1        0.05     0.1984 80.16
a:b       8          2 0.1        0.05     0.9500  5.00

> 
> # -- 5b. Final Interpretation
> # The output shows the total sample size required to detect each main effect
> # with approximately 80% power.
> #
> # - Main Effect of Ultrasound ('a', 10 mmHg): Requires N = 68
> # - Main Effect of Helper Drug ('b', 5 mmHg): Requires N = 260
> #
> # To ensure our study is adequately powered for both main effects, we must be
> # able to detect the smaller, more subtle effect (the 5 mmHg MCID of the Helper
> # Drug). Therefore, we must choose the larger of the required sample sizes.
> #
> # FINAL CONCLUSION: The required total sample size for this study is N = 260
> # participants, which corresponds to n = 65 participants per group. Per the
> # Principal Investigator, no additional adjustments for dropout are required.
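
The additive structure and the reported sample size can be sanity-checked in base R. The sketch below computes the marginal main effects and the interaction contrast from the four hypothesized means, then evaluates power for the Helper Drug main effect at N = 260 using a noncentral F distribution. It assumes a noncentrality parameter of N * f^2 with the residual SD shrunk by sqrt(1 - r2), which is a simpler approximation than whatever ANCOVA_analytic() does internally, so small differences from its output are to be expected. All object names are illustrative.

```r
# Hypothesized cell means in the order a1b1, a1b2, a2b1, a2b2
mu <- c(85, 80, 75, 70)
cells <- matrix(mu, nrow = 2, byrow = TRUE,
                dimnames = list(us = c("no", "yes"),
                                drug = c("saline", "helper")))

effect_us   <- unname(diff(rowMeans(cells)))   # US minus no US
effect_drug <- unname(diff(colMeans(cells)))   # helper drug minus saline
interaction <- cells[1, 1] - cells[1, 2] - cells[2, 1] + cells[2, 2]

# Approximate power for the drug main effect at N = 260
n_total  <- 260
n_cov    <- 2
sd_resid <- 15 * sqrt(1 - 0.1)                 # residual SD after adjustment
f_drug   <- (abs(effect_drug) / 2) / sd_resid  # Cohen's f for a 2-level factor
df2      <- n_total - 4 - n_cov                # N minus cells minus covariates
ncp      <- n_total * f_drug^2                 # assumed noncentrality parameter
f_crit   <- qf(0.95, df1 = 1, df2 = df2)
power_drug <- pf(f_crit, df1 = 1, df2 = df2, ncp = ncp, lower.tail = FALSE)

c(effect_us = effect_us, effect_drug = effect_drug,
  interaction = interaction, power_drug = power_drug)
```

Under these assumptions the power for the 5 mmHg effect at N = 260 should come out close to the 80% target, and the interaction contrast should be exactly zero.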

Louis_Martin · September 2, 2025, 5:34pm

I don’t know much about analytic sample size estimators, but I wanted to see if I could get the same results using simulation. Below, I simulated values of baseline BP, final BP, and age that on average gave approximately the same group means, covariate-only R^2, and outcome SD. Then I assessed the proportion of simulations in which p < .05 for each main effect:

library(dplyr)
N <- 65 # Number of participants per group

# Set up 2x2 structure in dataframe
d <- data.frame(
  nerve_block = factor(
    rep(c('Standard', 'Ultrasound-Guided'), each = N*2)),
  drug = factor(rep(c('Yes', 'No', 'Yes', 'No'), each = N))
)

runs <- 10000 # Number of simulations to perform
# Store results
ps.drug <- ps.nerve <- r2 <- mean.a <- mean.b <- mean.c <- mean.d <- 
  sd.a <- sd.b <- sd.c <- sd.d <- vector('double', runs)
d$y <- NA_real_ # Outcome (blood pressure)

# Run simulations
for (i in 1:runs) {
  d$age <- runif(nrow(d), 50, 80)      # Random age
  d$baseline <- rnorm(N*4, 85, 15)     # Baseline BP
  d$y <- sapply(1:nrow(d), \(x) {      # Outcome BP (y) 
    rnorm(  
      1,  # For each participant...
      # Mean is average of 3 random values and baseline....
      mean(c(rnorm(3, 85, 19), rep(d$baseline[[x]], 1))) + 
        # plus increase in BP across age...
        ((d$age[[x]] - mean(d$age)) * .38) -    
        # Ultrasound guidance reduces BP by 10
        if_else(d$nerve_block[[x]] == 'Ultrasound-Guided', 10, 0) -
        # Drug reduces mean BP by 5
        if_else(d$drug[[x]] == 'Yes', 5, 0),
      11.7) 
    })
  
  # Get means per condition
  mean.a[[i]] <- mean(d[d$nerve_block == 'Standard' & d$drug == 'No', 'y'])
  mean.b[[i]] <- mean(d[d$nerve_block == 'Standard' & d$drug == 'Yes', 'y'])
  mean.c[[i]] <- mean(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'No', 'y'])
  mean.d[[i]] <- mean(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'Yes', 'y'])
  
  # Get sds per condition
  sd.a[[i]] <- sd(d[d$nerve_block == 'Standard' & d$drug == 'No', 'y'])
  sd.b[[i]] <- sd(d[d$nerve_block == 'Standard' & d$drug == 'Yes', 'y'])
  sd.c[[i]] <- sd(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'No', 'y'])
  sd.d[[i]] <- sd(d[d$nerve_block == 'Ultrasound-Guided' & d$drug == 'Yes', 'y'])
  
  # R-squared from covariate-only model
  s <- summary(lm(data = d, y ~ baseline + age))
  r2[[i]] <- s$r.squared
  
  # p values for main effect in adjusted model
  a <- anova(lm(data = d, y ~ baseline + age + drug * nerve_block))
  ps.drug[[i]] <- a['drug', 'Pr(>F)']
  ps.nerve[[i]] <- a['nerve_block', 'Pr(>F)']
}

# Results from 10,000 simulations with N = 65/group
mean(ps.drug < .05)  # Power for main effect of drug:              0.7972 
mean(ps.nerve < .05) # Power for main effect of nerve block:       1
mean(r2)             # R2 (unadjusted) with only baseline BP and age: 0.1016719
mean(mean.a)         # No US, Saline:     84.97058
mean(mean.b)         # No US,   Drug:     79.98512                     
mean(mean.c)         # US,    Saline:     74.98818
mean(mean.d)         # US,      Drug:     69.96514
mean(c(sd.a, sd.b, sd.c, sd.d)) # Mean SD within each subgroup: 15.08572

I got the same results as you, with 65 participants/group yielding a power of ~80%. Hope that helps!
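
For anyone who wants a quicker version of this check, here is a compact, vectorized sketch of the same idea. It is a simplification and not a reproduction of the code above: it uses a single standardized covariate, assumes that covariate explains 10% of the within-arm variance (total within-arm SD of 15 mmHg), and runs only 500 replicates, so the estimate carries a Monte Carlo error of roughly two percentage points. Object names are illustrative.

```r
set.seed(2025)
n_per_arm <- 65
runs      <- 500
sd_total  <- 15     # assumed within-arm SD of the outcome
r2        <- 0.10   # assumed share of within-arm variance explained by the covariate

# Balanced 2x2 layout: 0/1 indicators for ultrasound guidance and helper drug
arms   <- expand.grid(us = c(0, 1), drug = c(0, 1))
layout <- arms[rep(1:4, each = n_per_arm), ]
n      <- nrow(layout)

p_drug <- replicate(runs, {
  dat <- layout
  dat$baseline <- rnorm(n)                         # standardized covariate
  dat$y <- 85 - 10 * dat$us - 5 * dat$drug +       # additive treatment effects
    sd_total * sqrt(r2) * dat$baseline +           # covariate signal
    rnorm(n, sd = sd_total * sqrt(1 - r2))         # residual noise
  fit <- lm(y ~ baseline + us * drug, data = dat)
  anova(fit)["drug", "Pr(>F)"]                     # p value, drug main effect
})

power_drug <- mean(p_drug < 0.05)
power_drug
```

Increasing `runs` tightens the estimate, and changing `r2` or `sd_total` shows how sensitive the 65-per-arm figure is to those two assumptions.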

mksp · September 3, 2025, 7:25am

@Louis_Martin Thanks a lot, Sir, this helps a great deal. I really appreciate your time and effort.