PC Austin, WA Ghali and JV Tu,
*Statistics in Medicine*, 15 Sep 2003

Investigators in clinical research are often interested in determining the association between patient characteristics and the cost of medical or surgical treatment. However, there is no uniformly agreed-upon regression model with which to analyse cost data. The objective of the current study was to compare the performance of linear regression, linear regression with log-transformed cost, generalized linear models with Poisson, negative binomial and gamma distributions, median regression, and proportional hazards models for analysing costs in a cohort of patients undergoing coronary artery bypass graft (CABG) surgery. The study was performed on data comprising 1959 patients who underwent CABG surgery in Calgary, Alberta, between June 1994 and March 1998. Ten of 21 patient characteristics were significantly associated with cost of surgery in all seven models. Eight variables were not significantly associated with cost of surgery in any of the seven models. Using mean squared prediction error as a loss function, proportional hazards regression and the three generalized linear models were best able to predict cost in independent validation data. Using mean absolute error, linear regression with log-transformed cost, proportional hazards regression, and median regression (used to predict median cost) were best able to predict cost in independent validation data. Since the models demonstrated good consistency in identifying factors associated with increased cost of CABG surgery, any of the seven models can be used for that purpose. However, the magnitude and the interpretation of the coefficients vary across models. Researchers are encouraged to consider a variety of candidate models, including those better known in the econometrics literature, rather than begin data analysis with one regression model selected a priori. The final choice of regression model should be made after careful consideration of how best to assess predictive ability and should be tailored to the particular data in question.
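
The dependence of model ranking on the loss function can be illustrated with a minimal sketch. It is not taken from the study: the cost values are hypothetical, and the two "models" are simply constant predictions (the sample mean and the sample median) rather than fitted regressions. On right-skewed data such as costs, the mean-based prediction tends to do better under mean squared error, while the median-based prediction tends to do better under mean absolute error.

```python
from statistics import mean, median


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def mean_absolute_error(actual, predicted):
    """Average of absolute differences between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical right-skewed costs (not data from the study):
# most patients are inexpensive, one is very expensive.
costs = [10_000, 11_000, 12_000, 13_000, 54_000]

# Two constant predictors: one targets the mean, the other the median.
mean_prediction = [mean(costs)] * len(costs)
median_prediction = [median(costs)] * len(costs)

mse_mean = mean_squared_error(costs, mean_prediction)
mse_median = mean_squared_error(costs, median_prediction)
mae_mean = mean_absolute_error(costs, mean_prediction)
mae_median = mean_absolute_error(costs, median_prediction)

print(f"MSE  mean-based: {mse_mean:,.0f}  median-based: {mse_median:,.0f}")
print(f"MAE  mean-based: {mae_mean:,.0f}  median-based: {mae_median:,.0f}")
```

In this toy case the ranking of the two predictors reverses when the loss function changes, which is consistent with the abstract's point that the choice of how to assess predictive ability should precede the choice of model.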