Here are the calculations, in case they help explain my concerns. The estimated minimum required sample size for each of the four steps is shown at the bottom. For steps 3 and 4, the estimated minimum sample size is more than an order of magnitude smaller (roughly 32-fold and 15-fold, respectively) when the proportion experiencing any of the three events is used than the largest estimate produced by looking at any single outcome:
Step 1: What sample size will produce a precise estimate of the overall outcome risk or mean outcome value?
Assumed proportion experiencing an event
p_any <- 0.07
p_outp <- 0.06
p_hosp <- 0.03
p_icu <- 0.002
Targeted margin of error for estimating the overall risk of the outcome
target_margin_error_any <- 0.02
target_margin_error_outp <- 0.02
target_margin_error_hosp <- 0.01 # lower target as the proportion is low
target_margin_error_icu <- 0.005 # lower target as the proportion is very low
For any outcome
n1_any <- (1.96/target_margin_error_any)^2 * p_any * (1 - p_any)
For outpatient visit or worse
n1_outp <- (1.96/target_margin_error_outp)^2 * p_outp * (1 - p_outp)
For hospital admission or worse
n1_hosp <- (1.96/target_margin_error_hosp)^2 * p_hosp * (1 - p_hosp)
For ICU admission or death
n1_icu <- (1.96/target_margin_error_icu)^2 * p_icu * (1 - p_icu)
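The four Step 1 lines all apply the same formula, n = (1.96 / margin)^2 * p * (1 - p), so they can be wrapped in one vectorised helper. This is only a sketch: the function name `n_margin` is my own and does not come from the cited papers; the inputs are the values assumed above.

```r
# Hypothetical helper (illustrative name): sample size at which the 95%
# confidence interval for a proportion p has half-width `margin`.
n_margin <- function(p, margin, z = 1.96) {
  (z / margin)^2 * p * (1 - p)
}

# Assumed proportions and targeted margins of error, as in Step 1
p      <- c(any = 0.07, outp = 0.06, hosp = 0.03, icu = 0.002)
margin <- c(any = 0.02, outp = 0.02, hosp = 0.01, icu = 0.005)

n1 <- n_margin(p, margin)
round(n1, 1)            # any 625.2, outp 541.7, hosp 1117.9, icu 306.7
names(which.max(n1))    # "hosp"
```

The hospital admission outcome drives this step, which agrees with the 1117.906 reported in the results.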
Step 2: What sample size will produce predicted values that have a small mean error across all individuals?
Number of candidate predictor parameters
cand_pred <- 22
Targeted Mean Absolute Prediction Error (MAPE)
targeted_MAPE_any <- 0.03
targeted_MAPE_outp <- 0.03
targeted_MAPE_hosp <- 0.02 # Lower as outcome rarer and more important
targeted_MAPE_icu <- 0.01 # Lower as outcome rarer and considerably more important
For any outcome
n2_any <- exp((-0.508 + 0.259 * log(p_any) + 0.504 * log(cand_pred) - log(targeted_MAPE_any))/0.544)
For outpatient visit or worse
n2_outp <- exp((-0.508 + 0.259 * log(p_outp) + 0.504 * log(cand_pred) - log(targeted_MAPE_outp))/0.544)
For hospital admission or worse
n2_hosp <- exp((-0.508 + 0.259 * log(p_hosp) + 0.504 * log(cand_pred) - log(targeted_MAPE_hosp))/0.544)
For ICU admission or death
n2_icu <- exp((-0.508 + 0.259 * log(p_icu) + 0.504 * log(cand_pred) - log(targeted_MAPE_icu))/0.544)
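The Step 2 expression can be written once as a function of the outcome proportion, the number of candidate parameters, and the targeted MAPE. A minimal sketch using the same coefficients as above; the helper name `n_mape` is hypothetical.

```r
# Hypothetical helper (illustrative name): sample size for a targeted mean
# absolute prediction error, using the coefficients from the Step 2 lines.
n_mape <- function(p, n_params, mape) {
  exp((-0.508 + 0.259 * log(p) + 0.504 * log(n_params) - log(mape)) / 0.544)
}

# Assumed proportions and targeted MAPE values, as in Step 2
p    <- c(any = 0.07, outp = 0.06, hosp = 0.03, icu = 0.002)
mape <- c(any = 0.03, outp = 0.03, hosp = 0.02, icu = 0.01)

n2 <- n_mape(p, n_params = 22, mape = mape)
round(n2)
names(which.max(n2))    # "hosp"
```

With these inputs the hospital admission outcome again gives the largest value, about 1723, in line with the reported maximum for this step.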
Step 3: What sample size will produce a small required shrinkage of predictor effects?
For this step one must specify the desired amount of shrinkage (1 - expected uniform shrinkage factor)
expected_uniform_shrinkage_factor <- 0.95
and choose a conservative value of anticipated model performance (Cox-Snell R2). I will use the method described in https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8806 to estimate a conservative Cox-Snell R2 from a C-statistic, as I find this more intuitive.
library(rms) # provides lrm()
approximate_R2 <- function(auc, prev, n = 1000000){
  # define mu as a function of the C statistic
  mu <- sqrt(2) * qnorm(auc)
  # simulate a large-sample linear predictor from two normals:
  # non-events ~ N(0, 1) and events ~ N(mu, 1), with events making up a proportion prev
  n_events <- round(prev * n) # round so the group sizes are whole numbers that sum to n
  n_nonevents <- n - n_events
  LP <- c(rnorm(n_nonevents, mean = 0, sd = 1), rnorm(n_events, mean = mu, sd = 1))
  y <- c(rep(0, n_nonevents), rep(1, n_events))
  # Fit a logistic regression with LP as covariate; this is essentially a calibration model, and the intercept and slope estimate will ensure the outcome proportion is accounted for, without changing C statistic
  fit <- lrm(y ~ LP)
  # maximum possible Cox-Snell R2 for this outcome proportion
  max_R2 <- function(prev){
    1 - (prev^prev * (1 - prev)^(1 - prev))^2
  }
  return(list(
    R2.nagelkerke = as.numeric(fit$stats["R2"]),
    R2.coxsnell = as.numeric(fit$stats["R2"]) * max_R2(prev)))
}
anticipated_R2_any <- approximate_R2(auc = 0.8, prev = p_any)$R2.coxsnell
anticipated_R2_outp <- approximate_R2(auc = 0.8, prev = p_outp)$R2.coxsnell
anticipated_R2_hosp <- approximate_R2(auc = 0.8, prev = p_hosp)$R2.coxsnell
anticipated_R2_icu <- approximate_R2(auc = 0.8, prev = p_icu)$R2.coxsnell
For any outcome
n3_any <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_any/expected_uniform_shrinkage_factor)))
For outpatient visit or worse
n3_outp <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_outp/expected_uniform_shrinkage_factor)))
For hospital admission or worse
n3_hosp <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_hosp/expected_uniform_shrinkage_factor)))
For ICU admission or death
n3_icu <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_icu/expected_uniform_shrinkage_factor)))
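To see why the rarer outcomes dominate Step 3, it may help to treat the anticipated Cox-Snell R2 as an input and watch how the required n responds. The sketch below uses the same formula as the lines above; the helper name `n_shrinkage` and the R2 values are illustrative assumptions, not output from approximate_R2().

```r
# Hypothetical helper (illustrative name): sample size for a target uniform
# shrinkage factor S, given the number of candidate parameters and an
# anticipated Cox-Snell R2.
n_shrinkage <- function(n_params, r2_cs, S = 0.95) {
  n_params / ((S - 1) * log(1 - r2_cs / S))
}

# Illustrative R2 values only (assumed, not simulated)
r2_values <- c(0.08, 0.02, 0.005)

n3 <- n_shrinkage(n_params = 22, r2_cs = r2_values)
round(n3)
```

For small R2, log(1 - R2/S) is close to -R2/S, so the required n grows roughly in inverse proportion to the anticipated R2. A rare outcome has a low maximum Cox-Snell R2, and so a low anticipated R2, which is what pushes the single-outcome estimates so far above the estimate for any outcome.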
Step 4: What sample size will produce a small optimism in apparent model fit?
First one must calculate the maximum Cox Snell R2 given the estimated proportion experiencing the outcome as shown in the supplement of https://www.bmj.com/content/368/bmj.m441
n <- 100
ln_Lnull_any <- p_any * n * log((p_any * n)/n) + (n - p_any * n) * log(1 - ((p_any * n)/n))
ln_Lnull_outp <- p_outp * n * log((p_outp * n)/n) + (n - p_outp * n) * log(1 - ((p_outp * n)/n))
ln_Lnull_hosp <- p_hosp * n * log((p_hosp * n)/n) + (n - p_hosp * n) * log(1 - ((p_hosp * n)/n))
ln_Lnull_icu <- p_icu * n * log((p_icu * n)/n) + (n - p_icu * n) * log(1 - ((p_icu * n)/n))
maxR2cs_any <- 1 - exp((2 * ln_Lnull_any)/n)
maxR2cs_outp <- 1 - exp((2 * ln_Lnull_outp)/n)
maxR2cs_hosp <- 1 - exp((2 * ln_Lnull_hosp)/n)
maxR2cs_icu <- 1 - exp((2 * ln_Lnull_icu)/n)
delta <- 0.05 # expected optimism
and then calculate the estimated optimism corrected shrinkage
est_optimism_corrected_shrinkage_any <- anticipated_R2_any / (anticipated_R2_any + delta * maxR2cs_any)
est_optimism_corrected_shrinkage_outp <- anticipated_R2_outp / (anticipated_R2_outp + delta * maxR2cs_outp)
est_optimism_corrected_shrinkage_hosp <- anticipated_R2_hosp / (anticipated_R2_hosp + delta * maxR2cs_hosp)
est_optimism_corrected_shrinkage_icu <- anticipated_R2_icu / (anticipated_R2_icu + delta * maxR2cs_icu)
For any outcome
n4_any <- cand_pred / ((est_optimism_corrected_shrinkage_any - 1) * log(1 - (anticipated_R2_any/est_optimism_corrected_shrinkage_any)))
For outpatient visit or worse
n4_outp <- cand_pred / ((est_optimism_corrected_shrinkage_outp - 1) * log(1 - (anticipated_R2_outp/est_optimism_corrected_shrinkage_outp)))
For hospital admission or worse
n4_hosp <- cand_pred / ((est_optimism_corrected_shrinkage_hosp - 1) * log(1 - (anticipated_R2_hosp/est_optimism_corrected_shrinkage_hosp)))
For ICU admission or death
n4_icu <- cand_pred / ((est_optimism_corrected_shrinkage_icu - 1) * log(1 - (anticipated_R2_icu/est_optimism_corrected_shrinkage_icu)))
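One detail of Step 4 worth noting: the arbitrary n used for the null log-likelihood cancels, because ln L_null = n * (p * log(p) + (1 - p) * log(1 - p)). The maximum Cox-Snell R2 therefore reduces to 1 - (p^p * (1 - p)^(1 - p))^2, the same expression used inside approximate_R2(). The sketch below checks this numerically and wraps Step 4 in one function. The helper names and the R2 value of 0.08 are assumptions for illustration only.

```r
# Maximum Cox-Snell R2 for outcome proportion p (closed form, no n needed)
max_r2_cs <- function(p) {
  1 - (p^p * (1 - p)^(1 - p))^2
}

# Same quantity via the null log-likelihood, for comparison
max_r2_cs_loglik <- function(p, n) {
  ln_L_null <- p * n * log(p) + (n - p * n) * log(1 - p)
  1 - exp(2 * ln_L_null / n)
}

# Hypothetical helper (illustrative name): Step 4 sample size for an
# expected optimism delta, given an anticipated Cox-Snell R2.
n_optimism <- function(n_params, r2_cs, p, delta = 0.05) {
  S <- r2_cs / (r2_cs + delta * max_r2_cs(p))
  n_params / ((S - 1) * log(1 - r2_cs / S))
}

p <- c(any = 0.07, outp = 0.06, hosp = 0.03, icu = 0.002)

# The choice of n does not matter: both calls agree with the closed form
same_100 <- isTRUE(all.equal(max_r2_cs_loglik(p, n = 100), max_r2_cs(p)))
same_1e5 <- isTRUE(all.equal(max_r2_cs_loglik(p, n = 1e5), max_r2_cs(p)))
c(same_100, same_1e5)

# r2_cs = 0.08 is an assumed value, not a simulated one
n4_example <- n_optimism(n_params = 22, r2_cs = 0.08, p = 0.07)
n4_example
```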
Results
max(n1_any, n1_outp, n1_hosp, n1_icu)
[1] 1117.906
max(n2_any, n2_outp, n2_hosp, n2_icu)
[1] 1722.742
max(n3_any, n3_outp, n3_hosp, n3_icu)
[1] 151392.2
n3_any
[1] 4770.586
max(n4_any, n4_outp, n4_hosp, n4_icu)
[1] 15437.67
n4_any
[1] 1047.569
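The size of the gap can be read straight off these numbers. The snippet below just redoes the arithmetic on the values printed above. The last two lines assume, as I understand the approach in the BMJ paper linked earlier, that the final requirement is the largest value across the four steps.

```r
# Values copied from the results above
step3 <- c(max_single = 151392.2, any = 4770.586)
step4 <- c(max_single = 15437.67, any = 1047.569)

ratio3 <- step3[["max_single"]] / step3[["any"]]
ratio4 <- step4[["max_single"]] / step4[["any"]]
round(c(step3 = ratio3, step4 = ratio4), 1)   # about 31.7 and 14.7

# Largest requirement across the four steps, using the per-step maxima
per_step_max <- c(step1 = 1117.906, step2 = 1722.742,
                  step3 = 151392.2, step4 = 15437.67)
ceiling(max(per_step_max))      # 151393
names(which.max(per_step_max))  # "step3"
```

So the single-outcome approach is driven entirely by Step 3, and the choice between the two approaches changes the required sample size by a factor of about 30.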