Sample size determination: confirmatory prognostic factor study

Julia_zhan · April 24, 2026, 2:03pm

Dear experts

I would be grateful for your guidance on a methodological issue regarding the design of a confirmatory prognostic factor study. The aim of this study is to quantify the magnitude of association of a candidate variable after adjustment for established prognostic factors. In this context, I would appreciate your advice on how best to approach sample size calculation to ensure the study is adequately powered to detect such an association.

In particular:

Prof. Doug Altman has emphasised that sample sizes for prognostic factor studies should be sufficiently large to account for potential biases such as multiple testing and missing data[1]. While I am aware that standard sample size formulae exist for Cox proportional hazards, linear, and logistic regression models[2,3], these typically do not explicitly incorporate such issues. Do you think these sources of bias should be formally accounted for when deriving sample size for prognostic factor studies, and if so, do you have any recommendations on how this can be practically approached?
In the book Principles and Practice of Clinical Trials (by Steven Piantadosi, Curtis L. Meinert), it is noted that covariate adjustment may increase or decrease statistical power depending on the model used. The course notes (section 4.12.2) state that some degree of overfitting may be acceptable when adjusting for confounders. If the primary aim is estimation of association rather than quantifying the predictive ability of a variable, could you advise on what level of overfitting might be acceptable in practice, and how this might influence sample size considerations across different modelling approaches? I would also be very grateful for any relevant references or practical insights you could share based on your experience.
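For concreteness, here is a minimal Python sketch of the kind of standard calculation mentioned above: a Schoenfeld-type formula for the number of events in a Cox model, inflated by 1/(1 - R²), where R² is the squared multiple correlation between the candidate factor and the adjustment covariates. The multiple-testing and missing-data adjustments shown (splitting alpha across tests and dividing by the expected complete-case fraction) are crude illustrative assumptions, not recommendations taken from the cited papers, and all function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, sd_x, r2_with_covariates=0.0,
                    alpha=0.05, power=0.80, n_tests=1):
    """Approximate number of events needed to detect `hazard_ratio`
    per one-unit change in the candidate factor (two-sided Wald test).

    sd_x               : standard deviation of the candidate factor
                         (sqrt(p * (1 - p)) for a binary factor)
    r2_with_covariates : R^2 from regressing the factor on the
                         adjustment covariates (variance inflation)
    n_tests            : ASSUMPTION: crude Bonferroni-style split of alpha
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - (alpha / n_tests) / 2)
    z_beta = z(power)
    log_hr = math.log(hazard_ratio)
    return (z_alpha + z_beta) ** 2 / (
        sd_x ** 2 * log_hr ** 2 * (1 - r2_with_covariates)
    )


def required_sample_size(events, event_prob, missing_fraction=0.0):
    """Convert events to participants.

    ASSUMPTION: missing data are handled by simple inflation, i.e.
    dividing by the expected fraction of complete cases.
    """
    return math.ceil(events / event_prob / (1 - missing_fraction))


# Hypothetical inputs: binary factor with 30% prevalence, HR = 1.5,
# R^2 = 0.2 with the established factors, 40% of patients have an event,
# 10% of records expected to be incomplete.
events = required_events(hazard_ratio=1.5, sd_x=math.sqrt(0.3 * 0.7),
                         r2_with_covariates=0.2)
n_total = required_sample_size(events, event_prob=0.4, missing_fraction=0.1)
print(f"events needed: {events:.1f}, participants needed: {n_total}")
```

The sketch only makes explicit which inputs drive the answer. It does not address whether simple inflation is an adequate treatment of multiple testing or missing data, which is the question being asked.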

Many thanks in advance

References:

[1]Altman, D.G. (2006). Studies Investigating Prognostic Factors: Conduct and Evaluation. In TNM Online (eds L.H. Sobin, M.K. Gospodarowicz, B. O’Sullivan, L.H. Sobin, D.E. Henson and R.V.P. Hutter). https://doi.org/10.1002/0471463736.tnmp04.pub2

[2]Hsieh, F.Y., Bloch, D.A. and Larsen, M.D. (1998), A simple method of sample size calculation for linear and logistic regression. Statist. Med., 17: 1623-1634. https://doi.org/10.1002/(SICI)1097-0258(19980730)17:14<1623::AID-SIM871>3.0.CO;2-S

[3]Schmoor, C., Sauerbrei, W. and Schumacher, M. (2000), Sample size considerations for the evaluation of prognostic factors in survival analysis. Statist. Med., 19: 441-452. https://doi.org/10.1002/(SICI)1097-0258(20000229)19:4<441::AID-SIM349>3.0.CO;2-N

f2harrell · April 24, 2026, 2:24pm

Covariate adjustment, as long as the covariates have some impact on Y, always increases power. For nonlinear models it seems to decrease power, but this is a mirage. The standard errors can increase with covariate adjustment in, say, logistic or Cox models, but $\hat{\beta}$ increases proportionally more.
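A small simulation can make this concrete. The Python sketch below is an illustration only, under assumed settings: a binary factor that is independent of one strong continuous covariate, a true log odds ratio of 0.5 for the factor and 2.0 for the covariate, and 400 subjects per data set. The logistic fit is a plain Newton-Raphson routine written for this example, and all names are hypothetical. Under these assumptions one would expect the adjusted model to show a larger average coefficient, a larger average standard error, and a higher rejection rate than the unadjusted model.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns the coefficients and their model-based standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])          # observed information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def simulate(n=400, reps=1000, b_factor=0.5, b_covariate=2.0, seed=1):
    """Compare unadjusted and covariate-adjusted estimates of the factor."""
    rng = np.random.default_rng(seed)
    rows = {"unadjusted": [], "adjusted": []}
    for _ in range(reps):
        x = rng.integers(0, 2, n).astype(float)   # candidate factor
        z = rng.normal(size=n)                    # strong covariate
        eta = -0.5 * b_factor + b_factor * x + b_covariate * z
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
        ones = np.ones(n)
        designs = {
            "unadjusted": np.column_stack([ones, x]),
            "adjusted": np.column_stack([ones, x, z]),
        }
        for name, X in designs.items():
            beta, se = fit_logistic(X, y)
            rows[name].append((beta[1], se[1]))   # coefficient of x
    summary = {}
    for name, values in rows.items():
        est = np.array(values)
        wald = est[:, 0] / est[:, 1]
        summary[name] = {
            "mean_beta": est[:, 0].mean(),
            "mean_se": est[:, 1].mean(),
            "power": np.mean(np.abs(wald) > 1.96),
        }
    return summary


results = simulate()
for name, stats in results.items():
    print(name, {k: round(float(v), 3) for k, v in stats.items()})
```

The unadjusted and adjusted coefficients estimate different quantities (a marginal versus a conditional log odds ratio), so the comparison of interest is the ratio of estimate to standard error rather than the standard error alone.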

Julia_zhan · April 24, 2026, 3:04pm

Thank you for your explanation. It is very interesting to learn that the apparent loss of power in nonlinear models may be a “mirage.” Could you point me to any references where this phenomenon is discussed in more detail, especially the idea that in logistic or Cox models the standard errors may increase with covariate adjustment but the effect estimate increases by more?

f2harrell · April 24, 2026, 5:27pm

See this and its references.

arthur_albuquerque · April 25, 2026, 8:43pm

This one is also helpful: https://onlinelibrary.wiley.com/doi/full/10.1002/bimj.201900297