The problems with using “prophetic” variables in survival analysis are well known. To fix ideas, let me quote Terry Therneau:
This particular incorrect analysis is re-discovered every few years in oncology. (Or I should say republished). Like the mythical hydra, it never seems to go away. Group people, at baseline, according to whether they eventually had a complete or partial response to therapy (shrinkage of tumor), and then draw the survival curves. Surprise – responders always do better! Why?
He then goes on to explain that, for a response to be determined at all, the patient has to live long enough to reach the assessment. Finally, he gives the well-known recommendation:
Some time-dependent covariates are not predictors of an event as much as they are markers of a failure-in-progress. […] Basic rule: At any time point, the covariates can be anything that you want, as long as they use only information that would have been available on that day, had analysis been done then.
Now, consider the following example (Antonia et al. Lancet Oncol 2019):
At first glance, my impression was “ah, again what Terry was talking about, grouping people according to response”. However, if you look more carefully, you’ll see that the horizontal axis measures time not from the original index time but from 6 months, the time at which response was assessed. So in these survival plots response is not a prophetic variable: it was determined for everyone and was known at Time 0 of the plot, and those who had no response assessment were excluded (as described in the footnote).
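To make the distinction concrete, here is a minimal simulation sketch. It is my own toy setup, not the analysis from the paper: survival and time to response are assumed to be independent exponentials (means of 12 and 6 months, chosen arbitrarily), there is no censoring, and the function name `simulate` and all parameters are hypothetical. Because response has no effect on survival by construction, any difference between the groups is an artifact of how they are defined.

```python
import random
from statistics import mean


def simulate(n=50_000, mean_survival=12.0, mean_time_to_response=6.0,
             landmark=6.0, seed=1):
    """Toy model: response and death times are independent (assumption)."""
    rng = random.Random(seed)
    naive = {True: [], False: []}       # grouped by "ever responded"
    landmarked = {True: [], False: []}  # grouped by status at the landmark

    for _ in range(n):
        t_death = rng.expovariate(1.0 / mean_survival)
        t_resp = rng.expovariate(1.0 / mean_time_to_response)

        # Naive ("prophetic") grouping: a patient counts as a responder
        # only if the response happened before death; time runs from baseline.
        naive[t_resp < t_death].append(t_death)

        # Landmark grouping: keep only patients alive at the landmark,
        # classify by what is known at that moment, restart the clock there.
        if t_death > landmark:
            landmarked[t_resp <= landmark].append(t_death - landmark)

    return {
        "naive_responders": mean(naive[True]),
        "naive_nonresponders": mean(naive[False]),
        "landmark_responders": mean(landmarked[True]),
        "landmark_nonresponders": mean(landmarked[False]),
    }


if __name__ == "__main__":
    for group, value in simulate().items():
        print(f"{group}: mean survival {value:.1f} months")
```

Under these assumed parameters the naive grouping should show responders living far longer than non-responders (roughly 16 versus 4 months in theory), while the two landmark groups should both have a mean residual survival close to 12 months. So the landmark construction does remove this particular bias, at least in this simplified setting.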
For some reason I still have an uneasy feeling about this whole plot, so I thought it worth asking whether you see any problem with this analysis.