I have a question regarding the external validity of semi-parametric models in prediction of time-to-event outcomes based on baseline predictor measurements only (no time-dependent covariates).
In clinical prediction modelling, it is key that a model performs well outside the derivation dataset. To this end, prediction models are validated in data that were not part of the derivation modelling process. For validation of a Cox proportional hazards model, absolute risk predictions are obtained by combining the t-year baseline hazard (or, equivalently, the baseline survival) estimated in the derivation dataset, the coefficients estimated in the derivation dataset, and covariate values from the validation dataset.
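To make explicit what I mean, this is the standard way a Cox model yields an absolute risk at time t. The notation is my own shorthand: x is the covariate vector of a new individual, x-bar the derivation-sample covariate means used for centering, beta-hat the estimated coefficients, and H_0 and S_0 the baseline cumulative hazard and baseline survival.

$$
\hat{P}(T \le t \mid x) \;=\; 1 - \hat{S}_0(t)^{\exp\left(\hat{\beta}^{\top}(x - \bar{x})\right)},
\qquad
\hat{S}_0(t) = \exp\left(-\hat{H}_0(t)\right)
$$

Only the baseline term depends on t, and it is estimated entirely from the derivation data.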
For example, if I wanted to obtain the 10-year cardiovascular disease risk prediction for a man in the general population, I could consult a Framingham prediction model at
https://framinghamheartstudy.org/fhs-risk-functions/cardiovascular-disease-10-year-risk/
The reported 10-year baseline survival (0.88936) could be raised to the power of the exponentiated centered linear predictor, computed from the reported coefficients and some covariate input, and the result subtracted from one to obtain a risk prediction.
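As a minimal sketch of that calculation in Python: only the value 0.88936 comes from the page above. The coefficient names, coefficient values, covariate means, and the example individual are hypothetical placeholders for illustration, not the published Framingham values.

```python
import math


def absolute_risk(baseline_survival, coefficients, covariates, covariate_means):
    """Absolute t-year risk from a Cox model: 1 - S0(t) ** exp(centered LP)."""
    # Centered linear predictor: sum of beta * (x - mean of x in derivation data)
    centered_lp = sum(
        coefficients[name] * (covariates[name] - covariate_means[name])
        for name in coefficients
    )
    return 1.0 - baseline_survival ** math.exp(centered_lp)


S0_10 = 0.88936  # 10-year baseline value reported on the Framingham page

# Hypothetical coefficients and derivation-sample means (illustration only)
coefficients = {"log_age": 3.0, "smoker": 0.65}
covariate_means = {"log_age": 3.9, "smoker": 0.35}

# Hypothetical new individual: 55 years old, current smoker
new_person = {"log_age": math.log(55), "smoker": 1.0}

risk = absolute_risk(S0_10, coefficients, new_person, covariate_means)

# An individual sitting exactly at the covariate means gets 1 - S0(10)
reference_risk = absolute_risk(S0_10, coefficients, covariate_means, covariate_means)

print(f"10-year risk, new person: {risk:.3f}")
print(f"10-year risk, at covariate means: {reference_risk:.3f}")
```

Every prediction is a transformation of the single number S0(10), so any error in that estimate carries over to all new individuals.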
I’m wondering: how valid is the reported 10-year baseline quantity for new individuals?
Alternatively, the baseline hazard could have been modelled parametrically, which of course entails the risk of misspecification. For prediction modelling, however, I am curious whether parametric baseline hazards could reduce the likelihood of overfitting with respect to derivation data.
Do you have any thoughts on this or suggestions for further reading? Many thanks in advance!