I’d be grateful for your thoughts on designing logistic regression models with incomplete data - but when the data is ‘appropriately’ missing.
My team and I are modelling modifications to cancer treatment that occurred during the COVID pandemic - not as a result of direct COVID infection, but from attempts to reduce the vulnerability of cancer patients by deferring care thought to increase susceptibility to COVID, e.g. chemotherapy or immunotherapy.
The dataset covers a patient cohort with varying tumour types and multiple treatment modalities (e.g. chemotherapy, immunotherapy, radiotherapy); however, not all modalities are applicable or relevant to each patient. For a given patient, each modality is coded as:
- no.mod: treatment not modified = continued as planned
- mod: treatment modified in some capacity (e.g. delayed, dose reduced or omitted entirely)
- NA: not applicable = that modality was not clinically relevant for that individual patient, so it can be neither modified nor not modified
For a given patient, each modality may or may not be modified independently, e.g. their chemotherapy was modified while their radiotherapy proceeded as planned.
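To make the coding concrete, here is a minimal sketch in R using a handful of invented rows (not study data). The column names `chemo`, `radio` and `immuno` are assumptions for illustration; the real dataset may name or store the modalities differently.

```r
# Hypothetical rows, invented purely to illustrate the coding scheme
toy <- data.frame(
  uid    = 1:4,
  chemo  = c("mod", "no.mod", NA, "mod"),
  radio  = c("no.mod", NA, "no.mod", NA),
  immuno = c(NA, NA, "mod", "no.mod")
)

modalities <- c("chemo", "radio", "immuno")  # assumed column names

# Store each modality as a two-level factor; NA stays NA ("not applicable")
toy[modalities] <- lapply(toy[modalities], factor, levels = c("no.mod", "mod"))

# Patients for whom each modality was applicable (i.e. not NA)
n_applicable <- sapply(toy[modalities], function(x) sum(!is.na(x)))

# Patients whose treatment was modified, among those applicable
n_modified <- sapply(toy[modalities], function(x) sum(x == "mod", na.rm = TRUE))

print(rbind(n_applicable, n_modified))
```

The point of the sketch is that the denominator differs by modality: each modality is only "at risk" of modification in the patients for whom it was applicable.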
Goal: a measure of how likely each modality is to be modified (or not), after adjusting for other clinical variables (e.g. age, performance status, tumour type) - ideally an odds ratio with 95% CI for each modality.
I had planned to use a binomial logistic regression model with outcome modified vs not modified; however, the (appropriately) missing data precludes fitting a single such GLM across the whole cohort in its current one-row-per-patient form.
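A minimal sketch of one possible workaround, offered as an assumption rather than an established solution: stack the data into one row per patient-modality pair, drop the rows where the modality was not applicable, and include modality as a covariate. All data here are simulated at random, and the column names (`chemo`, `radio`, `immuno`), tumour labels and probabilities are invented for illustration.

```r
set.seed(1)
n <- 300

# Simulated cohort; values carry no clinical meaning
sim <- data.frame(
  uid    = seq_len(n),
  age    = round(rnorm(n, mean = 65, sd = 10)),
  ps     = factor(sample(0:2, n, replace = TRUE)),
  tumour = factor(sample(c("breast", "lung", "colorectal"), n, replace = TRUE))
)

# Helper: random status for one modality, NA where not applicable
sim_modality <- function(p_applicable, p_mod) {
  status <- ifelse(runif(n) < p_mod, "mod", "no.mod")
  status[runif(n) > p_applicable] <- NA
  factor(status, levels = c("no.mod", "mod"))
}
sim$chemo  <- sim_modality(0.7, 0.5)
sim$radio  <- sim_modality(0.5, 0.2)
sim$immuno <- sim_modality(0.3, 0.4)

modalities <- c("chemo", "radio", "immuno")

# Wide -> long: one row per patient-modality pair
long <- do.call(rbind, lapply(modalities, function(m) {
  data.frame(uid = sim$uid, age = sim$age, ps = sim$ps, tumour = sim$tumour,
             modality = m, status = sim[[m]])
}))

# Keep only pairs where the modality was applicable
long <- long[!is.na(long$status), ]
long$modality <- factor(long$modality, levels = modalities)

# Logistic regression for P(status == "mod"); "no.mod" is the reference level
fit <- glm(status ~ modality + age + ps + tumour, family = binomial, data = long)

# Odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))
```

In this layout the modality odds ratios are relative to the reference modality (here `chemo`). One caveat: a patient can contribute several rows, so the rows are not independent and the plain `glm` standard errors may be too narrow. A mixed model with a patient-level random intercept (e.g. `lme4::glmer` with `(1 | uid)`) or a GEE might be more appropriate, but I have not tested that here.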
Any thoughts on how to proceed would be much appreciated!
Representative toy data below:
uid: patient unique identifier
age: patient age
ps: Performance status (PS) - validated measure in cancer of fitness or physical reserve
tumour: tumour type
(Will be working in R)