How to handle MNAR from a covariate due to witnessing the survival outcome?

Naj · September 9, 2021, 5:06am

Hello everyone,
I was told that handling data that are missing not at random (MNAR) is complicated. I’m tempted to impute with several different methods (chosen blindly) and present the results as a sensitivity analysis comparing these methods against a complete-case analysis. However, I’m writing here to ask whether there is a specific concept or method that could help with my particular problem.
The registry started collecting data for the covariate of interest in a certain year (e.g. 2005), but the data were mainly missing for patients who died or were censored before that point. After 2005, only 100 of 100,000 patients are missing the data. Furthermore, most patients who survived beyond 2005 recalled the data and it was available in the registry (30,000 vs 5000 [missing data] patients). Using a log-rank test, there was a statistically significant survival difference between patients in the registry before 2005 who are missing that covariate and those who have it.
I was advised to use left truncation, but I already have all outcomes data.
I’d appreciate any insight or advice on the best way to model this covariate.

Thank you

f2harrell · September 9, 2021, 11:53am

There may be some aspect of the design that I’m not understanding, but my first reaction is that for a covariate that is measured at a certain time, where deaths may occur before that time and death prevents making the measurement of interest, it may be necessary to do a conditional analysis (with no imputation). I.e., among those who survived to time t, what happens to them after t? Time t would have to be well defined, e.g., time of first visit to a health system, time of first symptom, time of first diagnosis, etc.
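A minimal sketch of what such a conditional (landmark) analysis could look like, written in plain Python so that it runs without any survival package. The `Subject` record, the function names, and the toy data are hypothetical and not taken from the registry; the sketch assumes follow-up time is measured from a common, well-defined origin and that the landmark time t is on the same scale.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subject:
    time: float   # follow-up time from the time origin (hypothetical units, e.g. years)
    event: bool   # True if death was observed, False if censored


def landmark_cohort(subjects: List[Subject], t: float) -> List[Subject]:
    """Keep only subjects still under follow-up after t; restart their clock at t."""
    return [Subject(s.time - t, s.event) for s in subjects if s.time > t]


def kaplan_meier(subjects: List[Subject]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (time, survival probability) at each distinct death time."""
    at_risk = len(subjects)
    surv = 1.0
    curve = []
    for time in sorted({s.time for s in subjects}):
        deaths = sum(1 for s in subjects if s.time == time and s.event)
        leaving = sum(1 for s in subjects if s.time == time)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((time, surv))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


# Toy data (made up): two subjects leave follow-up at or before t = 2 and are excluded.
subjects = [
    Subject(1, True), Subject(2, False),
    Subject(3, True), Subject(5, True), Subject(6, False),
]
cohort = landmark_cohort(subjects, t=2)
curve = kaplan_meier(cohort)
```

The point of the sketch is only that the estimate describes survival after t among those who reached t; nothing is imputed for subjects who died or were censored earlier.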

pmbrown · September 10, 2021, 8:15am

If that’s something you can point to, i.e. it’s inherent, then I’d consider ‘study start’ as 2005 and analyse from there, adjusting for age at 2005, i.e. include anyone who is alive beyond Jan 2005. I realise this seems wasteful with data, but I feel it can be justified under the circumstances: it’s often the case that data collection has commenced in earnest at some timepoint, yet registries provide birth and death data prior to this timepoint. I assume you have a surplus of power, and a clean estimate will make the write-up and discussion less burdened by bias caused by differential data collection. That’s just my feeling.
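As a rough illustration of re-anchoring the analysis at 2005, something like the following could build the analysis records. The exact start date (1 January 2005), the function name `reanchor`, and the field names are assumptions for the example, not details from the registry. The sketch also assumes every patient was already in the registry at the start date; anyone enrolling later would need their own entry time handled separately.

```python
from datetime import date
from typing import Optional

STUDY_START = date(2005, 1, 1)   # assumed start of reliable data collection
DAYS_PER_YEAR = 365.25


def reanchor(birth: date, end: date, died: bool,
             start: date = STUDY_START) -> Optional[dict]:
    """Re-express one patient's follow-up relative to the study start.

    `end` is the date of death or censoring. Patients whose follow-up
    ended on or before `start` are excluded (None is returned).
    """
    if end <= start:
        return None
    return {
        "age_at_start": (start - birth).days / DAYS_PER_YEAR,    # adjustment covariate
        "followup_years": (end - start).days / DAYS_PER_YEAR,    # time since study start
        "event": died,
    }


# Made-up patients: the first died before 2005 and drops out of the analysis.
excluded = reanchor(date(1950, 1, 1), date(2004, 6, 1), died=True)
included = reanchor(date(1940, 1, 1), date(2010, 1, 1), died=True)
```

The resulting `age_at_start` and `followup_years` would then go into whatever survival model is used, with the covariate of interest taken as recorded from 2005 onward.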