Hello everyone,
I am trying to perform a survival analysis and build a model using Cox regression. Unfortunately, I'm much more of a physician than a statistician, and I would be very grateful for your help.
The dependent variable is 1-year mortality (~220 events in a sample of 2600). The independent variables are 7 baseline variables (e.g. age, history of recent high-risk CAD, stroke, urgent/emergent surgery), type of surgery (5 types), and the occurrence of 4 complications within 30 days (myocardial injury, bleeding, AKI, sepsis). Missing data are not a serious problem.
Unfortunately, the proportional hazards assumption is violated for 3 of the 4 complications included in the model. I've searched everywhere for viable solutions to this problem and finally created a Cox model with time-dependent variables, using the following code, based on https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf :
Legend:
vascular - main dataset
AKI30 - AKI status at 30 days (0/1)
AKIday - day of AKI occurrence // censoring day
sepsis30 - sepsis status at 30 days (0/1)
SSCday - day of sepsis occurrence // censoring day
bleed30 - major bleeding status at 30 days (0/1)
bleedday - day of bleeding occurrence // censoring day
dday - day of death // censoring day
death1 - occurrence of death
CODE:
library(survival)
# Prepare the event-day variables for tmerge: NA for patients without the complication
AKIday <- ifelse(vascular$AKI30 == 1, vascular$AKI30day, NA)
SSCday <- ifelse(vascular$sepsis30 == 1, vascular$sepsis30day, NA)
bleedday <- ifelse(vascular$bleed30 == 1, vascular$bleed30day, NA)
# Generate the dataset with time-dependent variables
# (SEPSA uses SSCday, which is NA for patients without sepsis, rather than the raw sepsis30day)
data2 <- tmerge(vascular, vascular, id = studyid, dstat = event(dday, death1), AKI30 = tdc(AKIday), SEPSA = tdc(SSCday), BLEED = tdc(bleedday))
# Fit the Cox model with time-dependent variables
tdcox <- coxph(Surv(tstart, tstop, dstat) ~ age + COPD + RHRCAD + PeripheralVascularDisease + Urgent_Emergent + HXCVE + CNCR + SurgeryType + SEPSA + AKI30 + BLEED + MINS, data = data2)
summary(tdcox)
cox.zph(tdcox)
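To illustrate what I understand tmerge to be doing, here is a minimal sketch on made-up data (the toy data frame, its values, and the names long and AKI are hypothetical and not from my dataset). A patient whose complication occurs on day 10 is split into two rows, one before and one after the complication, which would explain the extra rows in data2 while the number of deaths stays the same.

```r
library(survival)

# Hypothetical toy data: 3 patients, follow-up in days
toy <- data.frame(
  id     = 1:3,
  dday   = c(365, 200, 120),  # day of death or censoring
  death1 = c(0, 1, 1),        # 1 = died, 0 = censored
  AKIday = c(NA, 10, NA),     # day of AKI, NA if no AKI occurred
  age    = c(60, 72, 81)
)

# event() sets each patient's follow-up window and the death indicator;
# tdc() creates a 0/1 covariate that switches to 1 after AKIday
long <- tmerge(toy[, c("id", "age")], toy, id = id,
               dstat = event(dday, death1),
               AKI   = tdc(AKIday))

print(as.data.frame(long))

# The row count grows, but the number of deaths is unchanged
nrow(long)        # 4 rows for 3 patients
sum(long$dstat)   # 2 deaths, same as sum(toy$death1)
```

Patients without the complication (AKIday = NA) keep a single row with AKI = 0, and baseline variables such as age are simply copied onto every row of the same patient.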
The output makes clinical sense: age, cancer, peripheral vascular surgery and all 30-day complications are significantly associated with 1-year mortality.
However, I have several crucial questions:
- Compared with the initial model (without time-dependent variables), the CIs are wider. Is this because, after using tmerge, the data2 dataset contains 800 more observations? Can and should I do something about it?
- Is it a mistake that only some of the variables are recoded using tmerge while the others (age, comorbidities etc.) are not?
- Do I have to assess the proportional hazards assumption for the tdcox model? Are there any other assumptions I should assess?
- Is it necessary to account somehow for the fact that some of the patients had 2 or 3 of the evaluated complications?
- How should I present the results of this analysis? Do I simply report the HRs (95% CI) from the tdcox output?
- And most of all, is this approach correct, and if not, why not? I would like to stick to the Cox model and, if possible, avoid accelerated failure time models.
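For the question about checking proportional hazards, this is the kind of check I have in mind. It is only a sketch on the pbc and pbcseq data that ship with the survival package, following the vignette's setup rather than my own data; the choice of age and log(bili) as covariates is an assumption made purely for illustration.

```r
library(survival)

# Baseline data for the 312 randomized patients, as in the vignette
temp <- subset(pbc, id <= 312, select = c(id:sex, stage))

# First call sets the follow-up window and the event; second adds a
# time-dependent lab value (bilirubin) from the repeated-measures data
pbc2 <- tmerge(temp, temp, id = id, death = event(time, status))
pbc2 <- tmerge(pbc2, pbcseq, id = id, bili = tdc(day, bili))

# Cox model on (tstart, tstop] intervals; status 2 codes death in pbc
fit <- coxph(Surv(tstart, tstop, death == 2) ~ age + log(bili), data = pbc2)

# Schoenfeld-residual test, run on the time-dependent model itself
zp <- cox.zph(fit)
print(zp)
```

Here cox.zph is applied to the model fitted on the tmerge output, which is what I did with tdcox above.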
Thank you in advance for your help. Greetings!