Time dependent variable with time dependent coefficient (at the same time)

albertoca · November 12, 2018, 10:54am

In the real world, the adverse prognostic effect of a variable often dilutes over time if the patient survives long enough. Since the proportional hazards assumption is then not met, special analysis methods are required. The coxph function in the R survival package accepts a tt argument that defines time-varying coefficients (tt = function(x, t, ...) x * log(t) or similar). Likewise, other variables are not present at the beginning of the study but arise after a variable length of follow-up; these are time-dependent variables. They are usually handled by restructuring the database (for example with the tmerge function in the survival package), and the Cox regression is then fitted on intervals: coxph(Surv(time1, time2, status) ~ tdX…). However, it is not clear to me how to handle time-dependent variables that also have time-varying coefficients.
For example, after applying tmerge and obtaining the tstart and tstop indicators of the time intervals, I do not know how to modify the argument tt = function(x, t, ...) x * log(t) accordingly, and I’m not sure that the approach is mathematically correct.
By the way, I’m a doctor, not a professional statistician. Can someone help me with maths and code?
Alberto Carmona Bayonas
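
The combination asked about above can be sketched as follows. This is a minimal illustration on simulated data, not a vetted analysis: the names base, futime, death, thromb_time and thromb are hypothetical, and follow-up is assumed to be in months. tmerge builds the (tstart, tstop] intervals, and the time-dependent indicator is then passed through tt(). To the best of my understanding, coxph evaluates the tt function at each event time using the covariate value of the interval that covers that time, so the hazard ratio for thrombosis at time t is exp(b1 + b2 * log(t)). Note that t here is time since study entry, not time since the thrombosis.

```r
library(survival)

# Simulated, purely illustrative data (all names are hypothetical)
set.seed(1)
n <- 200
base <- data.frame(
  id     = seq_len(n),
  age    = round(rnorm(n, 62, 10)),
  futime = round(rexp(n, rate = 0.05), 2) + 0.1,  # follow-up, months
  death  = rbinom(n, 1, 0.8)                      # 1 = died, 0 = censored
)
# Roughly 40% of patients get a thrombosis before the end of follow-up
has_thromb <- runif(n) < 0.4
base$thromb_time <- ifelse(has_thromb,
                           round(runif(n, 0.1, 0.9) * base$futime, 3), NA)

# Step 1: counting-process data, one row per (tstart, tstop] interval
td <- tmerge(base[, c("id", "age")], base, id = id, tstop = futime)
td <- tmerge(td, base, id = id, death = event(futime, death))
td <- tmerge(td, subset(base, !is.na(thromb_time)), id = id,
             thromb = tdc(thromb_time))   # 0 before thrombosis, 1 after

# Step 2: time-dependent covariate plus a time-varying coefficient for it
fit <- coxph(Surv(tstart, tstop, death) ~ age + thromb + tt(thromb),
             data = td,
             tt = function(x, t, ...) x * log(t))

# Hazard ratio for thrombosis at selected times (months since study entry)
b <- coef(fit)
hr_at <- function(t) unname(exp(b["thromb"] + b["tt(thromb)"] * log(t)))
hr_at(c(1, 6, 12, 24))
```

Because the data are simulated with no real thrombosis effect, the fitted coefficients carry no clinical meaning; the point is only the shape of the calls.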

f2harrell · November 12, 2018, 4:35pm

Here are some papers I’ve noted that are useful for time-dependent covariables.

For the case in which the predictors are non-varying baseline variables, it’s a good idea to question whether proportional hazard models are the best starting point. When hazard ratios converge to one over time for all predictors, accelerated failure time models such as log-normal and log-logistic may be called for.
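
A minimal sketch of what fitting such models might look like in R, using survreg from the survival package. The lung dataset bundled with that package is used purely for illustration (it is not the data discussed in this thread), and age and sex are arbitrary example predictors.

```r
library(survival)

# Log-normal and log-logistic AFT models on the bundled lung data
fit_ln <- survreg(Surv(time, status) ~ age + sex, data = lung,
                  dist = "lognormal")
fit_ll <- survreg(Surv(time, status) ~ age + sex, data = lung,
                  dist = "loglogistic")

# Compare overall fit; lower AIC is better
aic <- c(lognormal = AIC(fit_ln), loglogistic = AIC(fit_ll))
aic

# Exponentiated coefficients are time ratios: multiplicative effects on
# survival time (values above 1 mean longer survival)
exp(coef(fit_ln))
```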

albertoca · November 12, 2018, 9:31pm

Possibly the best idea.
Frank, thank you for providing methodological support to the entire planet, instantly and every time you are asked: “If on the other hand one were studying acutely ill patients whose risk factors wane in importance as the patients survive longer, a model such as the log-normal or log-logistic regression model would be more appropriate.” RMS

albertoca · November 15, 2018, 11:07am

A parametric model is possibly the most mathematically correct choice. But I worry about giving up the concept of the hazard ratio, and that readers will not understand the alternative. I had a new idea for this. Using the msprep function in the mstate package, the time split could be coded as a transition of a Markov multistate model, with onset six months after the thrombotic event. Then, using a transition-specific covariate that represents the thrombosis status, the hazard ratio may be obtained by interval: in the first 6 months after the thrombosis, and from then onwards. I can then use an arrival extended Markov model, adding a continuous variable that codes the time from the beginning of the study to the thrombosis. In this way I can obtain the measure of effect for early and late thrombosis, while controlling for the point in the natural history of the cancer at which the thrombosis occurs. It seems correct to me… Any suggestions?
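
For comparison, the same early/late split can be sketched without a multistate model. This uses tmerge and two time-dependent indicators, so it is a swapped-in, simpler technique and not the msprep approach described above. The data are simulated, all names (base, thr, thromb_time, late_time, early, late) are hypothetical, and time is assumed to be in months.

```r
library(survival)

# Simulated, purely illustrative data (all names are hypothetical)
set.seed(2)
n <- 300
base <- data.frame(
  id     = seq_len(n),
  age    = round(rnorm(n, 62, 10)),
  futime = round(rexp(n, rate = 0.04), 2) + 0.5,  # follow-up, months
  death  = rbinom(n, 1, 0.8)                      # 1 = died, 0 = censored
)
has_thromb <- runif(n) < 0.5
base$thromb_time <- ifelse(has_thromb,
                           round(runif(n, 0.1, 0.9) * base$futime, 3), NA)

# Patients with a thrombosis, plus the time at which it becomes "late"
thr <- subset(base, !is.na(thromb_time))
thr$late_time <- thr$thromb_time + 6

td <- tmerge(base[, c("id", "age")], base, id = id, tstop = futime)
td <- tmerge(td, base, id = id, death = event(futime, death))
td <- tmerge(td, thr, id = id,
             thromb = tdc(thromb_time),  # 1 from the thrombosis onwards
             late   = tdc(late_time))    # 1 from 6 months after it onwards

# early = within the first 6 months after the thrombosis
td$early <- as.integer(td$thromb == 1 & td$late == 0)

# Separate hazard ratios for the early and late periods (vs. no thrombosis)
fit <- coxph(Surv(tstart, tstop, death) ~ age + early + late, data = td)
exp(coef(fit)[c("early", "late")])
```

The time from study entry to thrombosis could be added as a further covariate, but how best to code it for patients who never have a thrombosis is a modelling decision this sketch leaves open.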

f2harrell · November 15, 2018, 12:46pm

I have published papers in the medical literature using non-proportional hazards models with no problems. The idea of time ratios is quite natural, e.g., in a log-normal accelerated failure time model (written on the log survival time scale, so that positive coefficients lengthen survival) a regression coefficient of a binary predictor equal to -log(2) means that the median survival time is half as long if you have that risk factor than if you don’t. If using a continuous survival model, the main point is to choose a model with the best overall fit. My general guidance on that in medical problems is that the Cox PH model tends to work best for chronic diseases and AFT models for acute diseases.
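
The time-ratio interpretation can be written out in a short derivation. It assumes the usual log-time parameterization of the log-normal AFT model (the one used by survreg in R), with x a binary risk factor; the symbols are introduced here for illustration only.

$$\log T = \beta_0 + \beta_1 x + \sigma\varepsilon, \qquad \varepsilon \sim N(0, 1)$$

Since the median of $\varepsilon$ is 0,

$$\operatorname{median}(T \mid x) = \exp(\beta_0 + \beta_1 x), \qquad \frac{\operatorname{median}(T \mid x = 1)}{\operatorname{median}(T \mid x = 0)} = e^{\beta_1}$$

So $\beta_1 = \log 2$ doubles the median survival time and $\beta_1 = -\log 2$ halves it, at every follow-up time and without reference to a hazard ratio.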