Cox model with time-dependent variables

Kamil_Polok · December 20, 2020, 2:48pm

Hello everyone,

I am trying to perform a survival analysis and build a model using Cox regression. Unfortunately, I'm much more of a physician than a statistician, and I would be very grateful for your help.

The dependent variable is 1-year mortality (~220 events in a sample of 2600). The independent variables will include 7 baseline variables (e.g. age, history of recent high-risk CAD, stroke, urgent/emergent surgery), type of surgery (5 types), and occurrence of 4 complications within 30 days (myocardial injury, bleeding, AKI, sepsis). There is no serious problem with missing data.

Unfortunately, I am facing the problem of non-proportional hazards for 3 of the 4 complications included in the model. I've searched everywhere for viable solutions to this problem and finally created a Cox model with time-dependent variables, using the following code, based on https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf :

Legend:
vascular - main dataset
AKI30 - AKI status at 30 days (0/1)
AKIday - day of AKI occurrence // censoring day
sepsis30 - sepsis status at 30 days (0/1)
SSCday - day of sepsis occurrence // censoring day
bleed30 - major bleeding status at 30 days (0/1)
bleedday - day of bleeding occurrence // censoring day
dday - day of death // censoring day
death1 - occurrence of death

CODE:
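library(survival)  # provides tmerge(), coxph() and cox.zph()

#Preparation of event_day variables for the tmerge function (NA = complication did not occur)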
AKIday <- ifelse(vascular$AKI30==1,vascular$AKI30day,NA)
SSCday <- ifelse(vascular$sepsis30==1,vascular$sepsis30day,NA)
bleedday <- ifelse(vascular$bleed30==1,vascular$bleed30day,NA)

#generation of dataset with time-dependent variables
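#SSCday (NA when no sepsis) is used below rather than sepsis30day, which per the legend also holds the censoring day
data2 <- tmerge(vascular, vascular, id=studyid,
                dstat = event(dday, death1),
                AKI30 = tdc(AKIday),
                SEPSA = tdc(SSCday),
                BLEED = tdc(bleedday))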

#generation of Cox model with time-dependent variables
tdcox <- coxph(Surv(tstart, tstop, dstat) ~ age + COPD + RHRCAD + PeripheralVascularDisease + Urgent_Emergent + HXCVE + CNCR + SurgeryType + SEPSA + AKI30 + BLEED + MINS, data = data2)

summary(tdcox)
cox.zph(tdcox)
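
To illustrate what tmerge does to the rows, here is a minimal sketch on made-up data. The three patients and all their values are hypothetical and are not taken from the vascular dataset; only the pattern of the tmerge call mirrors the code above.

```r
library(survival)

# Hypothetical toy data: three patients followed for up to 365 days.
toy <- data.frame(
  id     = 1:3,
  dday   = c(365, 120, 200),  # day of death or censoring
  death1 = c(0, 1, 1),        # 1 = died, 0 = censored
  AKIday = c(NA, 10, NA),     # day of AKI, NA if no AKI occurred
  SSCday = c(NA, 25, 5)       # day of sepsis, NA if no sepsis occurred
)

# event() sets the follow-up window and the outcome; each tdc() adds a
# 0/1 covariate that switches from 0 to 1 on the given day.
toy2 <- tmerge(toy, toy, id = id,
               dstat = event(dday, death1),
               AKI   = tdc(AKIday),
               SEPSA = tdc(SSCday))

# One row per patient per interval with constant covariate values.
print(toy2[, c("id", "tstart", "tstop", "dstat", "AKI", "SEPSA")])
```

In this toy case the three patients become six rows, but the number of patients and the number of deaths are unchanged; the extra rows only mark the days on which a complication flag switches on.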

I receive output which makes clinical sense - age, cancer, peripheral vascular surgery and all 30-day complications are significantly associated with 1-year mortality.

However, I have several crucial questions:

1. Compared to the initial model (without time-dependent variables), the CIs are wider. Is this related to the fact that, after using tmerge, the data2 dataset contains 800 more observations? Can and should I do something about it?
2. Is it an error that only some of the variables are recoded by tmerge while others (age, comorbidities, etc.) are not?
3. Do I have to assess the proportional hazards assumption for the tdcox model? Are there any other assumptions I should assess?
4. Is it necessary to account somehow for the fact that some of the patients had 2 or 3 of the evaluated complications?
5. How should I present the results of this analysis? Do I simply report the HR (95% CI) from the tdcox output?
6. And most of all, is this approach correct, and if not, why not? I would like to stick to the Cox model and, if possible, avoid accelerated failure time models.

Thank you in advance for your help. Greetings!

salil09 · December 22, 2020, 8:51pm

Hi,
I think I can answer some of your queries.

1. Yes, the CIs are wider because the data are split according to when each intermediate event occurred.
2. Variables like age remain constant over follow-up, so there is no change to record and they are not recoded.
3. In my understanding, a time-dependent Cox model is a way to bypass the PH assumption, so I believe you will not need to assess it. You may want to check the book by Terry Therneau and Patricia Grambsch, Modeling Survival Data: Extending the Cox Model. I would appreciate it if others could chime in on this point.
4. If you are really keen to model these time points separately, then you are actually looking at multi-state models. Two packages, mstate and etm, can do multi-state modelling. Of these, mstate is well supported, with many vignettes that use the BMT data to describe how to do the modelling. It also has its own function, msprep, which like tmerge converts the data into the required format.
5. I think the results can be presented as HRs. Again, I would be interested in what others think.
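
On the PH question and on reporting HRs, one thing that can be checked directly is that cox.zph does accept a model fitted on (start, stop] data, so the check is at least available; a time-dependent covariate still has a single coefficient that is assumed constant over follow-up. A minimal sketch, assuming the `heart` dataset that ships with the survival package as a stand-in (it is not the data from this thread, and the choice of covariates is only for illustration):

```r
library(survival)

# 'heart' is already in counting-process form: one row per interval,
# with 'transplant' as a time-dependent covariate.
fit <- coxph(Surv(start, stop, event) ~ age + surgery + transplant,
             data = heart)

# Test of proportional hazards per term, plus a global test.
zp <- cox.zph(fit)
print(zp)

# Hazard ratios with 95% confidence limits, taken from the model summary.
hr <- summary(fit)$conf.int[, c("exp(coef)", "lower .95", "upper .95"),
                            drop = FALSE]
print(round(hr, 2))
```

The `hr` matrix holds the HR and 95% CI per term, which is the form usually reported; whether the cox.zph output is reassuring is a judgment to make on the real data.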
All the best,
Salil