Model to assess length of stay (LOS)

Length of stay is very often positively skewed (regardless of sample sizes) and hence, we cannot fit a linear model to it.

I understand from here and here that LOS is often best analysed with a Cox PH model, where the event of interest is successful discharge and other events, e.g. death, discharge against advice/AOR, or absconding (assuming patients were not medically stable), are censored.

How should we analyse LOS when information on death and discharge disposition is not available? Would quantile regression or gamma regression with log-link suffice?

Quantile regression requires a completely continuous Y. It doesn’t work for the excessive ties you’ll see in LOS unless LOS is measured in hours. I wouldn’t think gamma regression would fit very well.

I understand not knowing D/C against medical advice but not how you would not know about deaths.

Unfortunately, the data I received does not have the death date; it only has an indicator for inpatient mortality. The proportion is small though, at approximately 3-4%. Would Cox PH still work best? What is a workaround for when I really can’t get the death date?

What about other options, such as a robust linear model?

We only need inpatient mortality. Yes, Cox PH is probably preferred.

Length of hospital stay is a very difficult outcome when there is a non-trivial risk of in-hospital death because hospital stay may come to an end because of either recovery (good) or death (bad) and the difference matters if you’re trying to interpret it as a standalone outcome.

Censoring in-hospital deaths on the date of death implies that discharge is an event which may occur with further follow-up, which it isn’t. Death is a competing event. And in this case it is diametrically opposed to the event of interest, so it can’t be analysed as a combined event as for, say, progression-free survival. The simplest solution, which happily solves your data problem, is to treat the deaths as permanent hospitalisations. That is, censor them at the end of follow-up.

This post has some more discussion and references, including an example of the suggested approach to censoring deaths from the RECOVERY trial:

We used Kaplan-Meier survival curves to display cumulative mortality over the 28-day period. We used similar methods to analyse time to hospital discharge and successful cessation of invasive mechanical ventilation, with patients who died in hospital right-censored on day 29.

Josie, it’s not that we expect discharge to occur after death; we are just saying that discharge hasn’t happened yet, so positive credit should not be given. I’m not clear why death has to be formally a competing event if you define the outcome as time until successful discharge.

It is a competing event because it makes the event of interest impossible. And if we censor them anyway, it is informative censoring; they’ve been lost to follow-up because they died. If you drop them from the denominator at the time of death, the rate of discharge would look higher than it really is because you’re ignoring some people who will never be discharged.

Thanks for keeping this discussion going as I’m trying to nail down my thinking about this.

We really don’t want to drop deaths from the denominator. I can see where it may be informative censoring, which is I think the same as saying that the censoring process (death) is not independent of the event process (discharge).

Competing risk analysis will provide a cumulative incidence function that estimates the risk of a “discharge that precedes death”. I’m not sure that is a risk that is of central interest. So perhaps a multistate transition model is the way to go. With full data the states would be updated daily and would consist of in-hospital, discharged home, discharged to rehab facility, discharged to total care nursing facility, possible discharge against medical advice, or death. Then anything of interest can be estimated, such as the probability of being alive and in the hospital on day 3, or the expected number of days alive and in the hospital, or the probability of being dead on day 5.
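To make those three quantities concrete, here is a minimal sketch that computes them empirically from daily states. It is not a fitted transition model: it assumes complete follow-up, collapses the discharge destinations into one state, and uses a made-up cohort of ten patients. The names `PATIENTS`, `state_on_day` and `occupancy`, and the convention that a patient occupies the exit state from the exit day onwards, are all assumptions for illustration.

```python
# Hypothetical cohort: (day the stay ended, how it ended). Not real data.
PATIENTS = [
    (2, "discharged"), (3, "dead"), (4, "discharged"), (5, "dead"),
    (6, "discharged"), (7, "discharged"), (8, "dead"), (10, "discharged"),
    (12, "discharged"), (14, "discharged"),
]
HORIZON = 28  # assumed days of follow-up


def state_on_day(exit_day, exit_state, day):
    """State occupied on `day`; the exit state is absorbing from exit_day on."""
    return "in_hospital" if day < exit_day else exit_state


def occupancy(day, state):
    """Empirical probability of occupying `state` on `day`."""
    hits = sum(1 for t, s in PATIENTS if state_on_day(t, s, day) == state)
    return hits / len(PATIENTS)


p_in_hospital_day3 = occupancy(3, "in_hospital")
p_dead_day5 = occupancy(5, "dead")
# Expected days alive and in hospital = sum of daily occupancy probabilities.
expected_days_in_hospital = sum(
    occupancy(day, "in_hospital") for day in range(1, HORIZON + 1)
)

print(p_in_hospital_day3, p_dead_day5, round(expected_days_in_hospital, 2))
```

With covariates or incomplete follow-up the same summaries would come from the model's estimated transition probabilities rather than from raw counts.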


Surely the LOS is different things depending on what the process is that you are measuring. If you want to look at LOS as a measure of successful treatment then the difficulty is how you manage deaths. The distribution of deaths is often different from non-deaths depending on the population - intensive care units vs surgical vs medical.
An ordinal outcome captures this best (and can be collapsed to recovery or death if that is useful) if you want to more fully understand the process generating the discharge. Surely giving the deaths a uniform value (greater than the study duration) is problematic.

Yes, as in many situations, an ordinal longitudinal state transition model has the most promise IMHO. No hiding of deaths, and it can easily handle hospital readmission. To use it for resource utilization studies (as opposed to quality outcome studies as we have been discussing) you can compute the expected number of days in hospital and alive, which is what is correlated with hospital charges.

Censoring deaths at the end of follow-up (ie treating them as permanent hospitalisations) gives a meaningful time-to-discharge estimate for those discharged alive, and the estimate of the proportion discharged by time t will be accurate. If you censor deaths on the date of death the proportion discharged will be inflated for every time point after the first death, and increasingly inflated with every death that occurs; median (or other quantile) time-to-discharge will similarly be underestimated. When comparing two arms, if the risk of in-hospital death differs (by chance or in reality) then censoring deaths at the time of death will bias the comparison.

It effectively just puts a floor under the survival curve equal to the proportion of people who die in hospital (and thus will never be discharged). Dropping the deaths from the analysis at the point where the deaths occur guarantees that the proportion discharged alive will be at or near 100%* regardless of how many die in hospital. It doesn’t tell you anything useful.
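A small worked example of the two censoring choices, as a sketch only. The cohort is invented (ten patients, three in-hospital deaths, no other censoring), the Kaplan-Meier estimator is hand-rolled to keep the example self-contained, and the names `PATIENTS` and `km_proportion_discharged` are hypothetical.

```python
# Hypothetical cohort: (day the stay ended, how it ended). Not real data.
PATIENTS = [
    (2, "discharge"), (3, "death"), (4, "discharge"), (5, "death"),
    (6, "discharge"), (7, "discharge"), (8, "death"), (10, "discharge"),
    (12, "discharge"), (14, "discharge"),
]
FOLLOW_UP = 28  # days of follow-up, so day 29 is "end of follow-up"


def km_proportion_discharged(times, discharged, horizon):
    """Kaplan-Meier estimate of the proportion discharged by `horizon`."""
    still_in = 1.0  # estimated probability of not yet being discharged
    for day in range(1, horizon + 1):
        at_risk = sum(1 for t in times if t >= day)
        events = sum(1 for t, d in zip(times, discharged) if t == day and d)
        if at_risk:
            still_in *= 1 - events / at_risk
    return 1 - still_in


discharged = [outcome == "discharge" for _, outcome in PATIENTS]

# Option A: censor deaths on the day of death.
times_a = [t for t, _ in PATIENTS]
# Option B: treat deaths as still in hospital, censored on day 29.
times_b = [t if o == "discharge" else FOLLOW_UP + 1 for t, o in PATIENTS]

p_censor_at_death = km_proportion_discharged(times_a, discharged, FOLLOW_UP)
p_censor_at_end = km_proportion_discharged(times_b, discharged, FOLLOW_UP)

print(f"censor at death:         {p_censor_at_death:.2f}")
print(f"censor at end of follow-up: {p_censor_at_end:.2f}")
```

With these made-up numbers, censoring at death gives 100% discharged by day 28 even though 3 of 10 died, whereas censoring on day 29 gives 70%, which is the 7 of 10 who actually left hospital alive.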

If all you want to know is length of hospital stay (eg for resource use), you can use the combined event “death or discharge” but you can’t interpret this as a standalone outcome because it will be shorter for treatments which speed recovery and also for those which hasten death. It would be appropriate for a well-specified decision-analytic model which was considering both costs and benefits because the reduced hospitalisation cost of an early death would be massively outweighed by the life years (or QALYs) lost. But you’d still need to be very careful about how the data were modelled, presented and interpreted.

*in an orderly fashion, not just as an obvious artefact of the Kaplan-Meier curve