Dear community, dear Professor, thank you for your help.
I might be posting in the wrong topic.
I am focusing on a Cox regression model with a time to event outcome (relapse) and a set of predictors.
My question is how to retrieve the baseline hazard after multiple imputation via aregImpute (I’ve followed the case study in Chapter 20 plus the validation and calibration of Chapter 12).
This is a big problem with multiple imputation. Once you’ve done traditional multiple imputation, I’d use stacking of all the filled-in datasets for a final analysis from which you derive the coefficients and the underlying survival curve, since that analysis, though it has the wrong denominator, will be efficient and will not see any missing values.
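A minimal sketch of the stacking idea, assuming Hmisc’s aregImpute has already been set up. The data frame `d`, the predictors `age` and `sex`, and the outcome variables `futime`/`relapse` are hypothetical placeholders:

```r
# Sketch only: stack all completed datasets from multiple imputation,
# then fit one Cox model to obtain coefficients and a baseline survival
# curve. Assumes Hmisc and rms are installed.
library(Hmisc)
library(rms)

a <- aregImpute(~ age + sex + futime + relapse, data = d, n.impute = 10)

# Build one tall "stacked" data frame of all completed datasets
stacked <- do.call(rbind, lapply(seq_len(a$n.impute), function(i)
  as.data.frame(impute.transcan(a, imputation = i, data = d,
                                list.out = TRUE, pr = FALSE, check = FALSE))))

# A single Cox fit on the stacked data; note that naive standard errors
# are too small because the sample size is inflated n.impute-fold
f <- cph(Surv(futime, relapse) ~ age + sex, data = stacked,
         surv = TRUE, x = TRUE, y = TRUE)
survest(f, newdata = stacked[1, ], times = 365)  # survival estimate for one covariate pattern
```

The point of the stack is that the distinct failure times (and hence the baseline hazard grid) are identical across imputations, so one coherent underlying survival curve falls out of a single fit.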
Thank you Professor for the input. In case this strategy goes beyond my limited R skills, would an approximation with transcan, freezing one dataset, be terribly wrong for the sake of the baseline hazard? I would still get confidence intervals via multiple imputation.
Not clear about ‘freezing one dataset’. The underlying challenge is keeping the vector of distinct failure times the same for all imputations. If there are no missing failure times or event indicators, maybe that’s not a problem.
Sorry for the poor wording; I was referring to the procedure of the case study on Cox regression where we use transcan and only one imputation. Amazingly, I have no missing values in the time-to-event variable or the event indicator!
Let’s put a question about variable importance in Cox models under this category.
Hi there,
I want to assess variable importance for a dataset that will be fitted with a Cox PH model.
The RMS book (Section 5.4) shows a way to do that by bootstrapping the ranks of predictors with respect to the partial χ² minus its degrees of freedom.
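A rough sketch in the spirit of that RMS Section 5.4 approach: bootstrap the rank of (partial χ² − d.f.) for each predictor across refits. The data frame `d` and the predictors `age`, `sex`, `bp` are hypothetical:

```r
# Sketch only: bootstrap distribution of predictor importance ranks
# based on partial chi-square minus degrees of freedom (rms::anova)
library(rms)

f <- cph(Surv(futime, relapse) ~ rcs(age, 4) + sex + rcs(bp, 4),
         data = d, x = TRUE, y = TRUE)
pnames <- c('age', 'sex', 'bp')   # anova() row names for each predictor
B <- 300
ranks <- matrix(NA, B, length(pnames), dimnames = list(NULL, pnames))
for (i in 1:B) {
  j  <- sample(nrow(d), replace = TRUE)   # bootstrap resample
  g  <- update(f, data = d[j, ])          # refit on the resample
  an <- anova(g)
  ranks[i, ] <- rank(an[pnames, 'Chi-Square'] - an[pnames, 'd.f.'])
}
# Bootstrap percentile intervals for each predictor's importance rank
# (higher rank = more important)
t(apply(ranks, 2, quantile, probs = c(.025, .5, .975)))
```

Wide rank intervals are the useful message here: they show how unstable apparent "importance" orderings are under resampling.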
When later fitting the Cox models, it is possible that some variables will not properly satisfy the Cox assumptions (e.g. linearity, proportionality), so we might later remove a variable from the model or stratify on it.
It could happen that a variable is important yet does not meet the assumptions of the Cox model. One question is whether the importance assessment we have done is still valid for such variables. That is, can we say that a variable could be important even though it does not meet the Cox assumptions?
The second question is how to decide whether to incorporate an ‘important’ variable into the model if it does not meet the Cox assumptions. In other words, can we balance including a variable that does not exactly meet the assumptions against its high relative contribution?
Thank you!
Professor Harrell’s answer:
This raises a lot of interesting issues. I wish this topic had been put under the existing Cox model page so that others would more easily find it.
Dropping a variable or stratifying on it can be a worse solution than not satisfying PH in some situations. A coherent solution with more linear thinking is to fit a Bayesian proportional hazards model with non-PH for most of the variables, with non-PH handled by interacting covariates with time. Prior distributions would restrict how much non-PH you want to allow especially for small samples. Variable importance measures would combine the importance of PH and non-PH components for a covariate.
Thank you for your quick reply.
In my specific case, I have a very important variable (SDT), which I parameterised with log to relax linearity. The residuals plot is like this:
cph_cox_model <- cph(Surv(MTBR, STATUS) ~ log(SDT + 1), data = g,
                     time.inc = 1500, surv = TRUE, x = TRUE, y = TRUE,
                     tol = 1e-13, ties = 'efron', id = id)
cox.zph(cph_cox_model)
             chisq df      p
log(SDT + 1)  82.5  1 <2e-16
GLOBAL        82.5  1 <2e-16
Do you have suggestions about which transformation could help me relax the negative slope? I tried different ones (sqrt, 1/log(), ^2, and restricted cubic splines) and did not improve it.
If it’s not possible, do you think the Bayesian approach would be a choice? I guess there would be a way to assess how this assumption violation could affect the performance of the model, either using only this variable or using all the others together with it, but I’m still looking for it.
β(t) vs. t is a plot of the changing effect of one β over time. This may inform you about how to transform t in a time-dependent covariate but is largely unrelated to the transformation of X on the right-hand side of the model.
My log parameterisation allowed me to maximise the relative χ² in the model:
increasing what I got with the linear approach:
This is the hazard ratio vs. time that I get with different values of SDT:
The model fitted with the log approximation yields good calibration and overfitting results, but I have a concern with the shape of the residuals and the possible time-dependency of the hazard ratio for the variable SDT.
Stratify SDT into 5-10 equal-sized groups depending on the overall N. Compute stratified Kaplan-Meier curves. Make plots of g(S(t)) vs. t for g = log-log, logit, inverse normal. See if a link other than log-log (assumed by PH) works better, i.e., see if an accelerated failure time model provides a better fit. More here.
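The recipe above might be sketched as follows, reusing the variable names from the earlier cph call (`d` as the data frame name is an assumption):

```r
# Sketch only: stratified K-M curves on transformed scales to compare
# candidate link functions. Parallel curves on a given scale suggest
# that link (log-log ~ PH; logit, inverse normal ~ AFT-type models).
library(rms)

d$SDTg <- cut2(d$SDT, g = 5)                 # 5 equal-sized groups
km <- npsurv(Surv(MTBR, STATUS) ~ SDTg, data = d)

survplot(km, fun = function(S) log(-log(S)), logt = TRUE)  # log-log (PH)
survplot(km, fun = qlogis, logt = TRUE)                    # logit link
survplot(km, fun = qnorm,  logt = TRUE)                    # inverse normal
```

If the log-log curves fan out or cross while, say, the inverse-normal curves stay roughly parallel, that points toward a log-normal AFT model rather than PH.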
Hi,
with regards: “A coherent solution with more linear thinking is to fit a Bayesian proportional hazards model with non-PH for most of the variables, with non-PH handled by interacting covariates with time. Prior distributions would restrict how much non-PH you want to allow especially for small samples. Variable importance measures would combine the importance of PH and non-PH components for a covariate.”
→ Which RStudio package(s) would you recommend to fit a Bayesian proportional hazards model with RStudio to address the above-mentioned issues?
RStudio is a graphical user interface and integrated development platform for R. So I assume you meant to say R instead of RStudio.
For the types of survival models I’m alluding to, the best bet is flexible parametric proportional hazards models. These are beautifully implemented in the rstanarm survival system in R: https://arxiv.org/pdf/2002.09633 . This is not available from CRAN but there are installation instructions in the paper.
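Following that paper, a minimal sketch of what such a fit could look like, reusing the variable names from earlier in the thread (`d` as the data frame name is an assumption, and the interface may differ across versions of the survival branch):

```r
# Sketch only: Bayesian flexible parametric PH model via the rstanarm
# survival branch (stan_surv), as described in the linked paper
library(rstanarm)

# Proportional-hazards fit with an M-spline baseline hazard
fit <- stan_surv(Surv(MTBR, STATUS) ~ log(SDT + 1),
                 data = d, basehaz = "ms",
                 chains = 4, iter = 2000)
print(fit)

# Relax PH for SDT via a time-varying effect, per the paper's tve()
fit_tve <- stan_surv(Surv(MTBR, STATUS) ~ tve(log(SDT + 1)),
                     data = d, basehaz = "ms")
```

The tve() wrapper is what corresponds to the earlier advice of interacting a covariate with time, with the priors limiting how much non-PH is allowed.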