Dear all,
I have read some literature proposing sample size calculations for studies of a new prognostic factor, e.g. a new biomarker and its association with 1-year all-cause mortality.
Example: Schmoor, C., Sauerbrei, W. and Schumacher, M. (2000). Sample size considerations for the evaluation of prognostic factors in survival analysis. Statist. Med., 19: 441-452.
As I understand it, this and similar formulas are meant to be used before data collection to determine how many patients to include. I am wondering about two things:
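To make this kind of calculation concrete, here is a minimal sketch. It assumes a binary prognostic factor with prevalence p, a Cox model, a two-sided test at level alpha, and a Schoenfeld-type event count inflated by 1/(1 - ρ²), where ρ is the correlation between the new factor and the adjusting covariates. That is my reading of the approach in Schmoor et al. (2000), so please check it against the paper. The function names `required_events` and `required_patients` are hypothetical, and converting events to patients assumes a known overall probability of observing the event (here, death within 1 year).

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hr, prevalence, rho=0.0, alpha=0.05, power=0.8):
    """Number of events needed to detect hazard ratio `hr` for a binary
    factor with the given prevalence. The Schoenfeld-type formula is divided
    by (1 - rho**2), i.e. inflated for correlation with adjusting covariates
    (assumed form, see lead-in)."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    numerator = (z(1 - alpha / 2) + z(power)) ** 2
    denominator = log(hr) ** 2 * prevalence * (1 - prevalence) * (1 - rho ** 2)
    return ceil(numerator / denominator)


def required_patients(events, event_prob):
    """Convert a required number of events into patients, assuming a known
    overall probability of observing the event during follow-up."""
    return ceil(events / event_prob)


# Example: HR 1.5, 50% biomarker-positive, alpha 0.05, power 80%
d0 = required_events(1.5, 0.5)             # no correlation with covariates
d1 = required_events(1.5, 0.5, rho=0.3)    # moderate correlation inflates D
print(d0, d1, required_patients(d0, 0.25))
```

Running this prints `191 210 764`, which shows how correlation with the adjusting covariates increases the number of events required.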
- If the data have already been collected and we want to investigate a new prognostic factor (as is frequently done with biobanks), is there any method that avoids the problems of a post hoc sample size calculation?
- These are usually explanatory models in which we choose the adjustment variables based on subject-matter knowledge. What if I am still worried about overfitting and want to use fewer degrees of freedom? (All the sample size formulas I know of are for prediction models.)
Thanks in advance!
Koray