Reading list for an article about how modern "fraud detection" methods are deeply flawed

Dr. Harrell and others have written at length about the problems with sensitivity, specificity, and the confusion matrix; about complex non-linear algorithms that produce poorly calibrated probability estimates; and about improper scoring rules and unnecessary up/down sampling techniques. Much of that critique centers on epidemiology. I am not an epidemiologist, nor in the health research industry at all. I’m a “data scientist” who works for tech companies.

Tech companies need to manage the risk of fraudulent online transactions. Unfortunately, they are also chock full of machine learning engineers who turn risk prediction problems into classification problems. I know there is a lot of material on Dr. Harrell’s blog, in Regression Modeling Strategies, and elsewhere about the dangers of discontinuous improper scoring rules, and about the potential dangers of applying complex non-linear algorithms that falsely claim to make zero assumptions. But I wonder if anyone knows of literature that levels a similar critique at fraud risk management. I want to write an article aimed at the tech industry about how the way we do fraud risk management is all wrong, and I am looking for a reading list to round out my perspective. Thanks.
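
To make the complaint about thresholded classification concrete, here is a minimal sketch in Python. The outcomes and both sets of predicted probabilities are made-up toy numbers, and the function names are only illustrative. It shows two forecasts that get identical accuracy at a 0.5 cutoff, even though one is far more informative, while the Brier score and the log score (both proper scoring rules) tell them apart.

```python
import math

# Hypothetical toy data: 1 = fraudulent transaction, 0 = legitimate.
outcomes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Two made-up sets of predicted fraud probabilities.
# "sharp" separates the cases clearly; "vague" barely separates them.
sharp = [0.05] * 8 + [0.90] * 2
vague = [0.45] * 8 + [0.55] * 2

def accuracy(probs, y, threshold=0.5):
    """Proportion classified correctly after forcing a yes/no at a cutoff."""
    return sum((p >= threshold) == bool(t) for p, t in zip(probs, y)) / len(y)

def brier(probs, y):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for p, t in zip(probs, y)) / len(y)

def log_score(probs, y):
    """Mean negative log-likelihood of the outcomes (lower is better)."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for p, t in zip(probs, y)
    ) / len(y)

for name, probs in [("sharp", sharp), ("vague", vague)]:
    print(
        name,
        "accuracy:", accuracy(probs, outcomes),
        "Brier:", round(brier(probs, outcomes), 4),
        "log score:", round(log_score(probs, outcomes), 4),
    )
```

Both forecasts classify every transaction "correctly" at the cutoff, so accuracy cannot distinguish them; the Brier and log scores are much lower (better) for the sharp forecast. This is only an illustration of the general point, not a claim about any particular fraud model.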


Great post. I’ve been looking for an excuse to post this citation, which you might find helpful. I’m not sure whether it has been cited here in other postings; if it has, I could not find it.

Hand, David J. Classifier Technology and the Illusion of Progress. Statist. Sci. 21 (2006), no. 1, 1–14. doi:10.1214/088342306000000060.

Some further links can be found in the following blog posts: