Hi there,
I have a question regarding the use of the partial likelihood ratio statistic to assess variable importance in a Cox Proportional Hazards model.
Specifically, is it appropriate to apply the procedure to the full model—including all predictors and spline terms—even though the full model may be prone to overfitting?
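To make the question concrete, here is a minimal sketch of the per-variable partial likelihood ratio statistic on simulated data. It is not any particular package's implementation: it uses a bare-bones Cox partial likelihood (Breslow form, assuming untied event times) maximized with `scipy.optimize.minimize`, and compares the full model against each drop-one model with a 1-df chi-square.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def cox_loglik(X, time, event):
    """Maximized Cox log partial likelihood (Breslow form, assumes untied times)."""
    order = np.argsort(time)
    Xs, es = X[order], event[order]
    def nll(beta):
        eta = Xs @ beta
        # log of the risk-set sums: reverse cumulative log-sum-exp
        rev = np.logaddexp.accumulate(eta[::-1])[::-1]
        return -np.sum(es * (eta - rev))
    return -minimize(nll, np.zeros(X.shape[1]), method="BFGS").fun

# simulated data: x0 has a strong effect, x1 a weak one, x2 none
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
lam = np.exp(1.0 * X[:, 0] + 0.3 * X[:, 1])
time = rng.exponential(1.0 / lam)
event = np.ones(n, dtype=bool)          # no censoring, for simplicity

ll_full = cox_loglik(X, time, event)
lr_stats = []
for j in range(X.shape[1]):
    ll_red = cox_loglik(np.delete(X, j, axis=1), time, event)
    lr = 2.0 * (ll_full - ll_red)       # 1-df partial LR chi-square for x_j
    lr_stats.append(lr)
    print(f"x{j}: LR chi2 = {lr:.1f}, p = {chi2.sf(lr, df=1):.3g}")
```

Because the reduced models are nested in the full model, each statistic is nonnegative (up to optimizer tolerance); the worry in the question is whether these statistics remain meaningful when the full model overfits.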
In my case, the final model I intend to use is a more reduced version, with transformed variables (using AVAS, additivity and variance stabilization) and sparse PCA applied. In that scenario, however, assessing the importance of individual variables becomes much more complex.
Any insights or recommended practices on how to approach this would be greatly appreciated.
Importance measures will be biased if you remove parameters observed to be unimportant from the model. $R^{2}_\text{adj}$ accounts for overfitting in an in-sample way.
When model building involves other steps such as principal components, variable importance may need to be assessed more manually. For example, you can delete one variable at a time from the entire analysis (PCs and all) and see how much the deviance suffers.
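A sketch of that drop-one-from-the-whole-pipeline idea, under assumed simplifications: ordinary PCA via SVD stands in for the actual reduction step, and a minimal Cox partial likelihood fitter (Breslow form, untied times) stands in for the real model. For each variable, the entire pipeline (standardize, PCA, Cox fit on component scores) is rerun without it, and the resulting deviance penalty is recorded. Since the rebuilt models are not nested in the original one, the penalty is a descriptive importance measure, not a formal chi-square, and can even be slightly negative.

```python
import numpy as np
from scipy.optimize import minimize

def cox_loglik(X, time, event):
    """Maximized Cox log partial likelihood (Breslow form, assumes untied times)."""
    order = np.argsort(time)
    Xs, es = X[order], event[order]
    def nll(beta):
        eta = Xs @ beta
        rev = np.logaddexp.accumulate(eta[::-1])[::-1]
        return -np.sum(es * (eta - rev))
    return -minimize(nll, np.zeros(X.shape[1]), method="BFGS").fun

def pca_scores(X, k):
    """Scores on the first k principal components of the standardized matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

# simulated data: x1 is a noisy copy of x0; only x0 drives the hazard
rng = np.random.default_rng(1)
n, p, k = 300, 5, 2
X = rng.normal(size=(n, p))
X[:, 1] = 0.8 * X[:, 0] + 0.6 * rng.normal(size=n)
lam = np.exp(0.9 * X[:, 0])
time = rng.exponential(1.0 / lam)
event = np.ones(n, dtype=bool)

ll_full = cox_loglik(pca_scores(X, k), time, event)
drop = {}
for j in range(p):
    # rerun the whole pipeline (PCA included) without variable j
    ll_j = cox_loglik(pca_scores(np.delete(X, j, axis=1), k), time, event)
    drop[j] = 2.0 * (ll_full - ll_j)   # descriptive deviance penalty; can be < 0
print(drop)
```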
I wonder if some of the ideas from this paper would be helpful for you? Specifically, how they build a reference model using a PCA-style reduction procedure (different from yours) but then do projection predictive feature selection/variable importance on the original features?
That is a key reference. A P.S. to my earlier response: if you use sparse PCA, you can compute the importance of clusters of variables. That solves the problem of individual collinear variables competing with each other.
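The cluster-level idea can be illustrated without the sparse PCA machinery itself: once a cluster of collinear variables is identified (here, hard-coded rather than extracted from sparse loadings), its importance is a single joint likelihood ratio test on the whole block. This sketch reuses the same minimal Cox fitter as above and simulates two variables that are noisy readings of one latent signal, so that drop-one tests dilute each other while the 2-df cluster test does not.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def cox_loglik(X, time, event):
    """Maximized Cox log partial likelihood (Breslow form, assumes untied times)."""
    order = np.argsort(time)
    Xs, es = X[order], event[order]
    def nll(beta):
        eta = Xs @ beta
        rev = np.logaddexp.accumulate(eta[::-1])[::-1]
        return -np.sum(es * (eta - rev))
    return -minimize(nll, np.zeros(X.shape[1]), method="BFGS").fun

# x0 and x1 are two noisy readings of the same latent signal z; x2 is noise
rng = np.random.default_rng(2)
n = 400
z = rng.normal(size=n)
X = np.column_stack([z + 0.3 * rng.normal(size=n),
                     z + 0.3 * rng.normal(size=n),
                     rng.normal(size=n)])
time = rng.exponential(1.0 / np.exp(0.7 * z))
event = np.ones(n, dtype=bool)

ll_full = cox_loglik(X, time, event)
# drop-one tests: x0 and x1 proxy for each other, so each 1-df LR is diluted
lr_each = [2.0 * (ll_full - cox_loglik(np.delete(X, j, axis=1), time, event))
           for j in (0, 1)]
# cluster test: remove both members at once and use a single 2-df chi-square
lr_cluster = 2.0 * (ll_full - cox_loglik(X[:, [2]], time, event))
p_cluster = chi2.sf(lr_cluster, df=2)
print(lr_each, lr_cluster, p_cluster)
```

The cluster statistic exceeds either individual statistic because the two collinear variables can no longer substitute for each other once both are removed.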
Thank you for your kind responses. Both suggestions are useful for tackling the problem. Assessing variable importance is always challenging, and it becomes even more difficult (at least with my limited knowledge) when multiple transformations are involved. I will try both approaches, to keep potential collinearities from biasing the results.