ML Approaches with censored data

tspeidel · April 10, 2024, 1:31am

A colleague shared this blog post from AWS, which uses machine learning approaches to predict HVAC failures. The problem is framed as classification. The post does not say how many failures their large sample contains. It seems they may be taking moving 60-day windows and predicting whether a failure occurs within each window.

I’ve seen many of these approaches applied to problems that scream censoring, and my first reaction is frustration that 70 years of rich survival literature is being ignored.

My background is in stat and this surely biases my views. I wonder what people in this forum think of these approaches, specifically when applied to survival data.

I know @f2harrell spoke and wrote a lot about the problems of classification, but I’m more interested in applications like the ones in the AWS link: are they badly biased? Do they end up underestimating survival? What are the drawbacks and are classical survival approaches the better answer?
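To make the bias question concrete, here is a minimal sketch on hypothetical toy data (not taken from the AWS post). It compares two naive ways of building a "failed within 60 days" label against a Kaplan-Meier estimate. The arrays `time` and `event`, the horizon, and the helper `km_survival` are all made up for illustration.

```python
import numpy as np

# Hypothetical toy data: follow-up in days, event = 1 if failure observed, 0 if censored.
time = np.array([10, 20, 30, 40, 50, 70, 80, 90])
event = np.array([1, 0, 1, 0, 1, 0, 0, 0])
horizon = 60  # classify "failure within 60 days"

def km_survival(time, event, horizon):
    """Kaplan-Meier estimate of the survival probability at `horizon`."""
    surv = 1.0
    # Loop over distinct observed failure times up to the horizon.
    for t in np.unique(time[(event == 1) & (time <= horizon)]):
        at_risk = np.sum(time >= t)                   # still under observation at t
        failures = np.sum((time == t) & (event == 1))
        surv *= 1.0 - failures / at_risk
    return surv

failed = (event == 1) & (time <= horizon)

# Naive option A: units censored before the horizon are labeled "no failure".
naive_keep = 1.0 - failed.mean()

# Naive option B: units censored before the horizon are dropped.
known = failed | (time > horizon)
naive_drop = 1.0 - failed[known].mean()

km = km_survival(time, event, horizon)
print(f"drop censored: {naive_drop:.3f}  KM: {km:.3f}  censored as survivors: {naive_keep:.3f}")
```

In this toy example, dropping the censored units gives a lower 60-day survival than Kaplan-Meier, and counting them as survivors gives a higher one, so the direction of the bias depends on how the classification labels were constructed.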

f2harrell · April 10, 2024, 1:03pm

A key reference is https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780140108 - note the 1995 date! There are many recent papers about efficient implementation of Cox models in neural nets.

tspeidel · April 10, 2024, 2:03pm

Thanks for the reference, @f2harrell. What puzzles me is that the linked post is not using survival methods (at least none that I can see).

f2harrell · April 11, 2024, 11:44am

Yes, the partial likelihood function they are generalizing to neural nets is the Cox partial likelihood, which explicitly allows for right censoring.
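As a rough sketch of how the Cox partial likelihood accommodates right censoring, the function below computes the negative log partial likelihood from a per-subject risk score. It assumes no tied event times, and the name `risk_score` is hypothetical: it could be a linear predictor or the output of a neural net. This is an illustration only, not code from the referenced paper.

```python
import numpy as np

def neg_log_partial_likelihood(risk_score, time, event):
    """Negative Cox log partial likelihood, assuming no tied event times.

    risk_score : model output per subject (higher means higher hazard)
    time       : follow-up time per subject
    event      : 1 if the failure was observed, 0 if right-censored
    """
    risk_score = np.asarray(risk_score, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    total = 0.0
    # Only observed failures contribute a term of their own ...
    for i in np.flatnonzero(event):
        # ... but censored subjects still count in every risk set
        # for as long as they were under observation.
        in_risk_set = time >= time[i]
        log_denom = np.log(np.sum(np.exp(risk_score[in_risk_set])))
        total += risk_score[i] - log_denom
    return -total

# Toy usage: three subjects, the second one censored at time 2.
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 0, 1])
nll_zero = neg_log_partial_likelihood(np.zeros(3), time, event)
print(nll_zero)
```

The censored subject never appears in a numerator, yet it enters the denominator for the failure at time 1. That is the sense in which this loss uses the partial information from censored observations instead of discarding or mislabeling them.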