Removing survivorship bias in a survival model


I am conducting a retrospective cohort study to determine the association between receiving medicine X and death during the first 100 days of therapy for patients with leukemia treated from 2010 to 2020. Starting in 2015, all patients received medicine X on Day 10 (t=10) of therapy to prevent infection, whereas nobody received it before then. So I have two cohorts to compare: those who received medicine X and those who didn’t.

Assuming era-dependent confounders are controlled for (e.g., quality of supportive care in each era, etc.), I’d like to determine the effect of medicine X on survival using a Cox PH model, but I am faced with the problem that many patients (10% of total cohort, 20% of events) die before day 10 and therefore die before they can receive medicine X (the exposure). As expected, this survivorship bias contributes to a large association between survival to 100 days and medicine X.

What strategy do you recommend for handling survivorship bias here, allowing for the basic limitations of retrospective, non-randomized studies? Given that patients aren’t at risk of the event (death given medicine X status) until they actually receive medicine X, my first instinct is to simply left-truncate the data and describe the effect of medicine X as the hazard of death given survival to Day 10 (when they receive medicine X), assuming that confounders between those receiving medicine X and those surviving to Day 10 without it are controlled for. Are there any other approaches I should consider? I considered a time-varying exposure, but I think it is inappropriate given the specifics of medicine X and this disease: the effect of medicine X is expected to be quite different in days 1-10 than in days >10, and I’m not interested in that question right now.


I think your approach is good. Though not perfect, an analysis that is explicitly conditional on survival to Day 10 is a reasonable choice.
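
To make the "conditional on survival to Day 10" idea concrete, here is a minimal sketch of the data preparation step in plain Python. It is only an illustration under assumptions: the `Patient` fields, the toy records, and the crude deaths-per-person-day summary are all made up for the example, and a real analysis would pass the resulting rows to a Cox model with the era-dependent confounders included.

```python
from dataclasses import dataclass

LANDMARK_DAY = 10  # day medicine X is given in the later era (from the question)
END_DAY = 100      # end of the follow-up window of interest (from the question)


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    era_with_x: bool  # True if treated in the era where X is given on Day 10
    last_day: float   # day of death or last follow-up, counted from start of therapy
    died: bool        # True if last_day is a death


def landmark_rows(patients, landmark=LANDMARK_DAY, end=END_DAY):
    """Keep patients still under follow-up after the landmark and restart the clock there."""
    rows = []
    for p in patients:
        if p.last_day <= landmark:
            # Died or left follow-up by the landmark: excluded in BOTH eras.
            # (Treating a death exactly on the landmark day as "early" is a convention.)
            continue
        stop = min(p.last_day, end)
        event = p.died and p.last_day <= end
        rows.append({"x": p.era_with_x, "time": stop - landmark, "event": event})
    return rows


def crude_rate(rows, x):
    """Deaths per person-day after the landmark for one exposure group."""
    group = [r for r in rows if r["x"] == x]
    person_days = sum(r["time"] for r in group)
    deaths = sum(r["event"] for r in group)
    return deaths / person_days if person_days else float("nan")


# Toy data, not real patients.
patients = [
    Patient(era_with_x=False, last_day=5, died=True),     # early death, excluded
    Patient(era_with_x=False, last_day=40, died=True),
    Patient(era_with_x=False, last_day=100, died=False),
    Patient(era_with_x=True, last_day=8, died=True),      # early death, never got X, excluded
    Patient(era_with_x=True, last_day=60, died=True),
    Patient(era_with_x=True, last_day=100, died=False),
]

rows = landmark_rows(patients)
rate_without_x = crude_rate(rows, x=False)
rate_with_x = crude_rate(rows, x=True)
```

The key point is that the same exclusion rule is applied to both eras, so early deaths no longer count only against the untreated group. The estimate then describes patients who survive to Day 10, not all patients starting therapy.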

I’m not 100% sure, but this sounds quite similar to the concept of “grace periods”, where participants, once eligible, can receive a treatment within a certain time window. This creates a conflict in how to classify events happening within that window (to the treatment or the control group) without inducing immortal-time bias.

There are two solutions I know of for handling such cases:

  1. Randomly assign the treatment group for events occurring within that time window.
  2. Clone-and-censor: create two clones of each participant, one for each treatment assignment, and artificially censor each clone when it goes off protocol (e.g., if participant A survived through Day 10 and received the treatment, the clone assigned to the control group is censored at Day 10).
    Intuitively, this cancels out the person-time shared by both clones, so you end up measuring the correct follow-up period while forcing time zero to be the time of eligibility.
    You can then further adjust for informative censoring.
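
The cloning step in option 2 can be sketched roughly as follows, in plain Python. This is a simplified illustration, not a full implementation: the dictionary keys (`x_day`, `last_day`, `died`), the arm names, and the 10-day grace window are assumptions for the example, and the adjustment for informative censoring is deliberately left out.

```python
GRACE_END = 10  # assumed end of the grace period, matching Day 10 in the question


def clone_and_censor(patient, grace_end=GRACE_END):
    """Return two clone rows (one per strategy) for a single patient.

    patient is a hypothetical dict with:
      x_day    - day treatment was received, or None if never received
      last_day - day of death or last follow-up
      died     - True if last_day is a death
    """
    x_day = patient["x_day"]
    last_day = patient["last_day"]
    died = patient["died"]

    # Strategy "start treatment by the end of the grace period".
    if x_day is not None and x_day <= grace_end:
        treated = {"arm": "treated", "time": last_day, "event": died}
    elif last_day <= grace_end:
        # Follow-up ended inside the window: still compatible with this strategy.
        treated = {"arm": "treated", "time": last_day, "event": died}
    else:
        # Reached the end of the window untreated: off protocol, censor there.
        treated = {"arm": "treated", "time": grace_end, "event": False}

    # Strategy "never start treatment".
    if x_day is not None:
        # Went off protocol when treatment started: censor at that day.
        control = {"arm": "control", "time": x_day, "event": False}
    else:
        control = {"arm": "control", "time": last_day, "event": died}

    return [treated, control]


# Toy patients, not real data.
patient_a = {"x_day": 10, "last_day": 60, "died": True}       # treated on Day 10
patient_b = {"x_day": None, "last_day": 5, "died": True}      # died before Day 10
patient_c = {"x_day": None, "last_day": 100, "died": False}   # never treated

clones_a = clone_and_censor(patient_a)
clones_b = clone_and_censor(patient_b)
clones_c = clone_and_censor(patient_c)
```

Note how the death of `patient_b` on Day 5 is counted in both arms, since at that point the patient was still compatible with both strategies. That is what removes the immortal time from the treated arm.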

This is a very brief explanation. If that’s indeed relevant, you can read more in
