Conceptual question about survival analysis vs logistic regression: when is one more appropriate than the other?

Hi,

I have been thinking about this question, and have some questions/comments of my own:

  1. What is the distribution of length of stay for these patients? Is it typically on the order of days, weeks or months, and are there high outliers? The basis of my question is that if the length of stay is typically short, say less than 30 days, it might be reasonable to use logistic regression, with the outcome being a dichotomous discharge status (alive or dead), since little information is lost by ignoring exactly when the event occurred within such a short window.

  2. As Frank notes, length of stay is an outcome variable, not an a priori covariate measured at admission. That is, length of stay cannot be known at admission, only once the patient is discharged. Thus, if you use logistic regression, length of stay should not be a covariate, but if you use a Cox model, it would be your time-to-event variable.

  3. Keep in mind that if you should want to generate predictions from the model, the Cox model will give you probabilities of discharge alive (or death) as a function of time, whereas a logistic model will give you the probability of the outcome at discharge as a dichotomous event, irrespective of time. The underlying question that you are trying to answer might influence the approach that you take.

  4. Since you have 18 years' worth of data, and some patients may have multiple observations, presuming each observation meets your inclusion criterion of having an active Salmonella infection at admission, do you need to consider time-varying covariates in your model for patients where some baseline characteristic changed from one admission to the next?

  5. As an extension of my prior point, you need to account for multiple observations per patient in an appropriate modeling framework. The discussion has focused on Cox and logistic models, but you may need to think about mixed effects variations of both, a robust covariance matrix adjustment, or possibly a GEE model, depending upon the approach you elect to take. Multiple observations on the same patient are correlated, so treating them as independent overstates your effective sample size, resulting in standard errors that are too small. I am not an expert in mixed effects models, but from what I have seen, there are general rules of thumb suggesting a minimum cluster size of 3, which I suspect your observations do not meet. Thus, a robust covariance adjustment, such as using Frank’s robcov() function in his rms package for R, might make more sense here.

  6. Given the 18-year time frame for the observations, what changes in the practice of medicine may have occurred over that period? For example, the probability of death may be higher in the early patients than in the later ones because improvements in treatment have reduced the risk of death over time.
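To put a rough number on point 5, here is a minimal sketch of the usual design effect approximation, 1 + (m - 1) * ICC, where m is the average number of admissions per patient and ICC is the within-patient correlation. It assumes roughly equal cluster sizes and is only a back-of-the-envelope guide, not a substitute for a robust covariance or mixed effects fit. The counts and the ICC below are made-up illustrative values, not figures from your data, and the function names are my own.

```python
import math


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Approximate variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_obs: int, n_clusters: int, icc: float) -> float:
    """Number of independent observations the clustered data are 'worth'."""
    mean_cluster_size = n_obs / n_clusters
    return n_obs / design_effect(mean_cluster_size, icc)


# Hypothetical values for illustration only (assumptions, not from the data):
n_admissions = 1200  # total admissions (rows in the analysis file)
n_patients = 1000    # distinct patients
icc = 0.5            # assumed within-patient correlation of the outcome

deff = design_effect(n_admissions / n_patients, icc)
n_eff = effective_sample_size(n_admissions, n_patients, icc)

print(f"design effect: {deff:.3f}")
print(f"effective sample size: {n_eff:.0f} of {n_admissions} admissions")
# Naive standard errors are too small by roughly this factor:
print(f"approximate SE inflation factor: {math.sqrt(deff):.3f}")
```

With few repeat admissions per patient the correction is modest, but it always works in the same direction: the naive analysis claims more information than the data contain.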

Food for thought…
