Dear experts
I would be grateful for your guidance on a methodological issue in the design of a confirmatory prognostic factor study. The aim of the study is to quantify the magnitude of the association between a candidate variable and the outcome after adjustment for established prognostic factors. I would therefore appreciate your advice on how best to approach the sample size calculation so that the study is adequately powered to detect such an association.
In particular:
- Prof. Doug Altman emphasised that sample sizes for prognostic factor studies should be large enough to account for potential biases such as multiple testing and missing data [1]. While I am aware that standard sample size formulae exist for Cox proportional hazards, linear, and logistic regression models [2,3], these typically do not explicitly incorporate such issues. Should these sources of bias be formally accounted for when deriving the sample size for a prognostic factor study, and if so, do you have any recommendations on how to do this in practice?
- The book Principles and Practice of Clinical Trials (by Steven Piantadosi and Curtis L. Meinert) notes that covariate adjustment may increase or decrease statistical power depending on the model used. The course notes (section 4.12.2) state that some degree of overfitting may be acceptable when adjusting for confounders. If the primary aim is to estimate an association rather than to assess the predictive ability of a variable, what level of overfitting might be acceptable in practice, and how might this influence sample size considerations across different modelling approaches? I would also be very grateful for any relevant references or practical insights you could share from your experience.
Many thanks in advance
References:
[1] Altman, D.G. (2006). Studies Investigating Prognostic Factors: Conduct and Evaluation. In TNM Online (eds L.H. Sobin, M.K. Gospodarowicz, B. O'Sullivan, D.E. Henson and R.V.P. Hutter). https://doi.org/10.1002/0471463736.tnmp04.pub2
[2] Hsieh, F.Y., Bloch, D.A. and Larsen, M.D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17(14): 1623-1634. https://doi.org/10.1002/(SICI)1097-0258(19980730)17:14<1623::AID-SIM871>3.0.CO;2-S
[3] Schmoor, C., Sauerbrei, W. and Schumacher, M. (2000). Sample size considerations for the evaluation of prognostic factors in survival analysis. Statistics in Medicine, 19(4): 441-452. https://doi.org/10.1002/(SICI)1097-0258(20000229)19:4<441::AID-SIM349>3.0.CO;2-N