Modeling cost using negative binomial regression


I was wondering whether the cost of, say, a certain surgical procedure can be modeled with a negative binomial distribution. To my understanding, cost is a continuous variable, not a count.

Searching for the phrase “cost model negative binomial” in Google returns several papers on the topic where comparisons have been made among several models including NB.

Any thoughts on this would be appreciated.


I’m interested.

In class and elsewhere, I have seen the negative binomial distribution used to model the number of trials until a failure.

So fitting a negative binomial distribution to existing cost data may not give you the results you want, since it is a distribution over counts and cost is continuous.

But what are you interested in? Maybe there’s another distribution one could use as the likelihood for modeling the cost of a surgical procedure.

No experience here, but it might suffice to use a symmetric real-valued distribution such as the normal or Student’s t.

What existing literature have you found that models the cost of surgical procedures, and what likelihood functions does it use?

Specifically, my question is whether the negative binomial is valid for modeling cost. I suspect it is not, but I’ve found a few examples in the literature of cost data being modeled with NB.

For example

A more sensible approach would be to use a Gamma distribution within the GLM framework, since cost data are right-skewed.
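To make the Gamma GLM suggestion concrete, here is a minimal sketch in plain Python (standard library only). It assumes a log link and a single covariate; the helper names, the covariate (a hypothetical severity score), and the simulated data are all made up for illustration and do not come from any of the papers mentioned. It uses the fact that, for a Gamma GLM with a log link, the iteratively reweighted least squares (IRLS) weights are all equal to 1, so each step is an ordinary least-squares fit to a working response.

```python
import math
import random


def fit_gamma_glm_log_link(x, y, n_iter=25):
    """Fit E[y] = exp(b0 + b1 * x) by IRLS for a Gamma GLM with log link."""
    n = len(y)
    # Start from the intercept-only fit: log of the sample mean, zero slope.
    b0, b1 = math.log(sum(y) / n), 0.0
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response; with Gamma + log link the IRLS weights are all 1.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Ordinary least squares of z on (1, x).
        zbar = sum(z) / n
        sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
        b1 = sxz / sxx
        b0 = zbar - b1 * xbar
    return b0, b1


def simulate_costs(n, b0, b1, shape, seed=0):
    """Simulate right-skewed costs with mean exp(b0 + b1 * x) (made-up data)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]  # hypothetical severity score
    # gammavariate(shape, scale) has mean shape * scale.
    y = [rng.gammavariate(shape, math.exp(b0 + b1 * xi) / shape) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_costs(5000, b0=8.0, b1=0.5, shape=2.0)
    b0_hat, b1_hat = fit_gamma_glm_log_link(x, y)
    print(f"intercept ~ {b0_hat:.3f}, slope ~ {b1_hat:.3f}")
```

With a log link, exp(slope) reads as a multiplicative change in expected cost per unit of the covariate. In practice one would use an established GLM routine rather than hand-rolled IRLS, and check the fit as discussed elsewhere in this thread.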

No, it’s generally used to model the number of failures before a given number of successes.

Search the statistics and mathematics literature instead, and see what the original motivation for the NB distribution was. Don’t search the clinical literature.

Sure, seems possible. But as with any modeling, you would need to assess model fit with whatever tools you have. No model should be taken for granted in any application. Even for the same application, different data may call for adjusting the model.


I thought of an example where a non-Gaussian likelihood is used.

In actuarial science, insurance claims data are zero-inflated and overdispersed (lots of 0’s, high variance, and sometimes multimodality), so people use the “Tweedie” distribution as the likelihood function. For the range of its power parameter used for claims, it is a compound Poisson-gamma distribution: a Poisson number of claims, each with a gamma-distributed size. It wouldn’t make sense to model such data with a Gaussian likelihood.

Any actuarial textbook will have examples. There’s a bunch of interesting slides on the Casualty Actuarial Society’s website, too. It could possibly have clinical applications, if that kind of data arises.
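To illustrate the Poisson-gamma construction mentioned above, here is a small simulation sketch in plain Python. All parameter values (claim rate, gamma shape and scale) and function names are made up for illustration. Each total is the sum of a Poisson number of gamma-distributed amounts, which gives a point mass at exactly zero plus a right-skewed continuous part.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_compound_poisson_gamma(rng, lam, shape, scale):
    """Total cost: a Poisson number of claims, each with a gamma-distributed size."""
    n_claims = sample_poisson(rng, lam)
    return sum(rng.gammavariate(shape, scale) for _ in range(n_claims))


def simulate_totals(n, lam=0.3, shape=2.0, scale=500.0, seed=1):
    """Simulate n totals under made-up parameter values."""
    rng = random.Random(seed)
    return [sample_compound_poisson_gamma(rng, lam, shape, scale) for _ in range(n)]


if __name__ == "__main__":
    totals = simulate_totals(20000)
    zero_share = sum(t == 0 for t in totals) / len(totals)
    mean_total = sum(totals) / len(totals)
    # Theory: P(total = 0) = exp(-lam), E[total] = lam * shape * scale.
    print(f"share of zeros: {zero_share:.3f} (theory {math.exp(-0.3):.3f})")
    print(f"mean total:     {mean_total:.1f} (theory {0.3 * 2.0 * 500.0:.1f})")
```

A histogram of the simulated totals would show the spike at zero and the long right tail, which a Gaussian likelihood cannot represent.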