Hello. I am about to start modelling dialysis appointment attendance. I think it makes sense to model attendance as a yearly rate. Patients average 156 appointments a year, and most patients do not miss any, so the absenteeism counts will contain many zeros. Most papers in this area have used Poisson and negative binomial regression models to model patient absenteeism. I recall reading an online comment by Frank Harrell in which he recommended an ordinal regression model over a Poisson model because it can handle many zeros better (hopefully my recollection is close). I am wondering whether an ordinal model would be appropriate for modelling the rate of absenteeism here, as I am interested in trying something new.
Thanks for any direction!
Ignoring for the moment the assumption that any regression model makes for how covariates change the distribution of Y, semiparametric ordinal models have a major advantage of not assuming any particular distribution of Y for a specific covariate setting. As you mentioned, this allows for extreme clumping at zero. It also allows for bimodality, clumping in the middle, skips, and other departures from the usual distributional assumptions. So I think it’s worth a try. You might try both the proportional hazards (log-log link) and the proportional odds model.
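To make the "no particular distribution of Y" point concrete, here is a minimal sketch of how a proportional odds model turns intercepts and a linear predictor into category probabilities. It is an illustration under stated assumptions, not a fitted model: the baseline distribution, the coefficient `hypothetical_beta`, and the function names are all invented for the example. The intercepts are simply the logits of P(Y >= j) at the reference covariate setting, so any amount of clumping at zero is reproduced exactly.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def po_intercepts(baseline_probs):
    """Intercepts alpha_j = logit(P(Y >= j)), j = 1..k-1, at the reference setting."""
    alphas = []
    tail = 1.0
    for p in baseline_probs[:-1]:
        tail -= p                 # tail is now P(Y >= next category)
        alphas.append(logit(tail))
    return alphas

def po_cell_probs(alphas, lp):
    """P(Y = j) for each category when the linear predictor equals lp."""
    exceed = [1.0] + [expit(a + lp) for a in alphas] + [0.0]
    return [exceed[j] - exceed[j + 1] for j in range(len(exceed) - 1)]

# Hypothetical baseline distribution of missed visits per year: 0, 1, 2, 3+
baseline = [0.80, 0.10, 0.06, 0.04]
alphas = po_intercepts(baseline)

# Assumed log odds ratio for some risk factor (not estimated from any data)
hypothetical_beta = 0.7
shifted = po_cell_probs(alphas, hypothetical_beta)

for label, p0, p1 in zip(["0", "1", "2", "3+"], baseline, shifted):
    print(f"missed={label:>2}  reference={p0:.3f}  with risk factor={p1:.3f}")
```

The covariate moves every exceedance probability P(Y >= j) by the same odds ratio, while the shape of the baseline distribution is left entirely to the intercepts. In practice there would be one intercept per distinct observed value of Y minus one, estimated from the data rather than supplied by hand.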
Occasionally you need a Heckman two-stage model, e.g., a binary logistic model for P(Y > 0) and a more continuous model for the conditional distribution of Y | Y > 0. That would be especially needed if covariates act much differently on predicting the number of missed visits given any missed visits than they do on predicting any missed visits.
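A rough sketch of the two-part split described above, showing how the pieces combine through E[Y] = P(Y > 0) × E[Y | Y > 0]. Everything numeric here is hypothetical: the coefficients in `PART1` and `PART2` are invented, and the log-linear form for the conditional mean is just one convenient assumption standing in for whatever model is chosen for Y | Y > 0. The coefficients are deliberately given opposite signs to mimic a covariate that acts differently on the two parts.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, invented for illustration only.
PART1 = {"intercept": -1.5, "x": 0.9}   # logit of P(Y > 0)
PART2 = {"intercept": 0.2, "x": -0.1}   # log of E[Y - 1 | Y > 0]

def prob_any_missed(x):
    """Part 1: probability of missing at least one visit."""
    return expit(PART1["intercept"] + PART1["x"] * x)

def mean_missed_given_any(x):
    """Part 2: mean number of missed visits among patients who missed any.

    Y is at least 1 whenever Y > 0, so the excess over 1 is modelled on
    the log scale (an assumption made for this sketch).
    """
    return 1.0 + math.exp(PART2["intercept"] + PART2["x"] * x)

def expected_missed(x):
    """Overall mean: P(Y > 0) * E[Y | Y > 0]."""
    return prob_any_missed(x) * mean_missed_given_any(x)

for x in (0, 1):
    print(
        f"x={x}: P(any)={prob_any_missed(x):.3f}, "
        f"E[Y | any]={mean_missed_given_any(x):.3f}, "
        f"E[Y]={expected_missed(x):.3f}"
    )
```

With these made-up numbers the covariate raises the chance of missing any visit but slightly lowers the count among those who do miss, which is the kind of pattern a single model with one set of coefficients would struggle to represent.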