Shape of HR after an intermediate event based on time to that event, multi-state model

Hello everyone!

This is my first post. I am a clinical oncologist and have basic knowledge in statistics and a lot of motivation to learn more. So please respond accordingly. I have searched this forum for an answer to my question but could not find it.

I am currently working in epidemiology research. I have collected data for roughly 500 lymphoma patients over a 20-year span. The data include gender, date of diagnosis, age at diagnosis, date of relapse (if any), date of last follow-up and status at last follow-up.

I am interested in exploring how the shape of the hazard ratio after an intermediate event (in this case relapse) depends on the time to occurrence of this event and/or the sojourn time in this state. Obviously not all patients experience relapse, and the time to relapse ranges from e.g. 1 month to 10 years after initial diagnosis. So from my understanding this is a multi-state model with 3 states.

I could only find one paper addressing this issue:
As a clinician with basic programming skills in R, I could only understand the concept but I have no idea how to do this in R.

Question 1: I want to find background theory on this, e.g. a book or paper.
Question 2: Can someone here open the subject and explain the pros and cons of different methods in plain, semi-easy statistical language?
Question 3: How can this be done in R? Example code, or at least how to start, would help.

Thank you all in advance, looking forward to a productive discussion.


Hi, I’m an oncologist too. I guess you’ll get better advice here than my opinion but we did something similar last year with the flexsurv R package. Look at the plot, I think it’s just what you want.


Wow, you guys work fast here. Thank you so much.

For some reason I wasn't able to access the full text:

Not Found

The requested URL /HTML/sso/ejournals/login.htm&hook_url= was not found on this server.

Not sure what the problem is, I should have full access through my University.

Look also here


Thanks, very nice paper.

Just to clarify: in the paper, under “Dynamic effect of VTE”: “In the log-logistic AFT models, the development of CAT shortened PFS and OS with adjusted TR of 0.72 (95% CI, 0.49–1.06) and 0.56 (95% CI, 0.43–0.74), respectively.” Does this mean that VTE increased the hazard of mortality by 44%?

Also, I tried the R code provided in the supplement and got stuck at “Transitions-specific covariates” with the code line: $VTE_td[$trans == %] <- &

Error: unexpected input in $event_td[$trans==%] <- &

Figure 1 in the paper is the one I would like to re-create.

I obviously used my own dataset.

No, it is an AFT model, whose exponentiated coefficients should be interpreted as time ratios, not hazard ratios. It is well explained in the article. In this particular case, a time ratio of 0.56 means that survival time is shortened by 44% after a thrombotic event, taking into account the structure of a multi-state model (avoiding immortal time issues, etc). In the link there is a very beautiful vignette written by Chris Jackson, who explains how to extract time-varying hazard ratios from this AFT model, and that is what we did.
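
To make the distinction concrete, here is a sketch of the algebra, assuming the usual AFT parametrization with a single binary covariate and ignoring, for simplicity, that VTE enters as a time-dependent state in the multi-state model. $S_0$ and $h_0$ denote the baseline survival and hazard functions, and $S_1$, $h_1$ those after the event (notation introduced here, not taken from the paper):

$$
S_1(t) = S_0\!\left(\frac{t}{\mathrm{TR}}\right), \qquad
h_1(t) = \frac{1}{\mathrm{TR}}\, h_0\!\left(\frac{t}{\mathrm{TR}}\right), \qquad
\mathrm{HR}(t) = \frac{h_1(t)}{h_0(t)} = \frac{h_0(t/\mathrm{TR})}{\mathrm{TR}\, h_0(t)}
$$

With TR = 0.56, every quantile of survival time, including the median, is multiplied by 0.56, i.e. shortened by $1 - 0.56 = 0.44$. The hazard ratio, by contrast, generally changes with $t$ under a log-logistic baseline, so there is no single "44% increase in hazard".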

Regarding the R code, I recommend that you read and become familiar with the documentation of the mstate package, so that you can convert your data to long format and obtain the structure of a multi-state model before fitting any model. Best.
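
As a starting point, here is a minimal sketch of that long-format step for an illness-death model (diagnosis, relapse, death). The toy data frame, its column names and the `t_rel` covariate are made up for illustration; only `trans.illdeath()` and `msprep()` come from mstate. It also assumes that patients without relapse carry their last follow-up time in the relapse-time column.

```r
library(mstate)

# Hypothetical toy data: times in years from diagnosis.
# For patients without relapse, t_relapse is the last follow-up time.
d <- data.frame(
  id        = 1:4,
  age       = c(54, 67, 71, 45),
  t_relapse = c(2.0, 5.5, 8.0, 1.0),
  relapse   = c(1, 0, 0, 1),
  t_death   = c(4.5, 5.5, 8.0, 9.0),
  death     = c(1, 1, 0, 0)
)

# Transition matrix: 1 = diagnosis -> relapse, 2 = diagnosis -> death,
# 3 = relapse -> death
tmat <- trans.illdeath(names = c("diagnosis", "relapse", "death"))

# One row per patient and per transition the patient is at risk for
long <- msprep(
  time   = c(NA, "t_relapse", "t_death"),
  status = c(NA, "relapse", "death"),
  data   = d, trans = tmat, id = "id", keep = "age"
)

# Hypothetical transition-specific covariate: time to relapse,
# defined only on the relapse -> death transition (0 elsewhere)
long$t_rel <- ifelse(long$trans == 3, long$Tstart, 0)

print(long)
```

Rows with `trans == 3` start at the relapse time, so `Tstart` on those rows is the time to relapse. A transition-specific model can then be fitted on `long`, using `Tstart`, `Tstop` and `status` as the survival outcome.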


Great, this is clear now. I am reading this tutorial right now. Thank you for taking the time to answer, and thank you for your patience. Best.

This is the book I read for this: