Comparing prediction models without likelihoods

noambard · July 26, 2021, 6:10pm

Hi everyone.

This feels like a basic question, but we have not found a clear answer.

We are studying the utility of using a novel predictor within a prediction model.
The baseline and augmented models are both “xgboost” models (gradient boosting with decision trees as base learners), so they have no clear notion of likelihood.

We want to be able to say, within an NHST (null hypothesis significance testing) framework, that the novel biomarker “improves” prediction.
The standard “machine learning” way is to compare AUROC values.
We are familiar with the work (I think by Margaret Pepe) showing that the statistical test comparing AUROCs is just a lower-powered test of the same null hypothesis as the likelihood ratio (LR) test.
We are also familiar with Prof. Harrell’s position that the LR test is the go-to test for this kind of question.
The problem is, again, that the model has no defined notion of likelihood.
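For reference, when both models do have a likelihood (for example, nested logistic regressions fitted by maximum likelihood), the LR test compares LR = 2 × (log L_augmented − log L_baseline) with a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below assumes a binary outcome, exactly one added parameter (so the 1-df upper tail is erfc(sqrt(LR/2))), and fitted probabilities already computed on the same data. The function names and example probabilities are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bernoulli_loglik(y: Sequence[int], p: Sequence[float]) -> float:
    """Log likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def lr_test_one_df(
    y: Sequence[int], p_base: Sequence[float], p_augmented: Sequence[float]
) -> Tuple[float, float]:
    """LR test for nested models that differ by exactly one parameter.

    Assumes both sets of probabilities are maximum-likelihood fitted values on
    the same data. Under H0, LR is approximately chi-square with 1 df, whose
    upper tail probability equals erfc(sqrt(LR / 2)).
    """
    lr = 2.0 * (bernoulli_loglik(y, p_augmented) - bernoulli_loglik(y, p_base))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))  # clip guards against tiny negatives
    return lr, p_value


if __name__ == "__main__":
    # Made-up probabilities standing in for fitted values from two nested models.
    y = [1, 0, 1, 0]
    p_base = [0.5, 0.5, 0.5, 0.5]
    p_aug = [0.9, 0.1, 0.9, 0.1]
    print(lr_test_one_df(y, p_base, p_aug))
```

This is exactly what is unavailable for the xgboost models in the question, which is why the replies below turn to likelihood-free alternatives.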

What then is the preferred approach to perform such a hypothesis test?

Thanks in advance,
Noam Barda

f2harrell · July 26, 2021, 7:54pm

An excellent question. Take a look at the measures here. If you want to get a confidence interval for the amount of improvement due to addition of a new predictor, consider bootstrapping one of those indexes.
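A minimal sketch of the bootstrap idea, assuming you already have out-of-sample predicted risks from the baseline and augmented models for the same patients. The Brier score is used here only as an example index (an assumption; substitute whichever measure you prefer), and the function names are hypothetical. Note that this sketch resamples patients over fixed predictions; a fuller bootstrap would refit both xgboost models in every resample so the interval also reflects model-fitting variability.

```python
import numpy as np


def brier(y: np.ndarray, p: np.ndarray) -> float:
    """Brier score: mean squared difference between 0/1 outcome and predicted risk."""
    return float(np.mean((p - y) ** 2))


def bootstrap_improvement(y, p_base, p_new, index=brier, n_boot=2000, seed=0):
    """Point estimate and 95% percentile bootstrap CI for index(base) - index(new).

    Patients are resampled with replacement and each patient's two predictions
    stay paired. With the Brier score (lower is better), positive values favour
    the augmented model.
    """
    y, p_base, p_new = (np.asarray(a, dtype=float) for a in (y, p_base, p_new))
    rng = np.random.default_rng(seed)
    n = len(y)
    point = index(y, p_base) - index(y, p_new)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap resample of patients
        diffs[b] = index(y[idx], p_base[idx]) - index(y[idx], p_new[idx])
    lower, upper = np.percentile(diffs, [2.5, 97.5])
    return point, (float(lower), float(upper))


if __name__ == "__main__":
    # Simulated stand-ins for out-of-sample predictions; not real model output.
    rng = np.random.default_rng(1)
    x_old, x_new = rng.normal(size=500), rng.normal(size=500)
    risk_true = 1 / (1 + np.exp(-(x_old + x_new)))
    y = rng.binomial(1, risk_true)
    p_base = 1 / (1 + np.exp(-x_old))  # ignores the new marker
    p_new = risk_true                  # uses it
    print(bootstrap_improvement(y, p_base, p_new))
```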

noambard · July 27, 2021, 4:13am

Dear Prof.,

Thank you for the quick answer. The article is very helpful and seems to answer our question precisely.

Noam

QunnaLi · September 22, 2021, 1:30pm

Besides AIC, BIC, and chi-square/df, what other measures are good for assessing fit and comparing models for count data, such as a negative binomial model?

f2harrell · September 22, 2021, 8:43pm

Good question. I don’t have a real answer, but I would look for a way to compare the entire fitted distribution with the observed empirical distribution.
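One way to act on this for count models, sketched under assumptions: for each count value k, compare the number of observations equal to k with the sum over subjects of the fitted probability P(Y_i = k). Plotting square roots of both (a rootogram) is a common display. The NB2 parameterization (mean mu, size theta) and the helper names are assumptions; mu and theta would come from your fitted negative binomial model.

```python
import math
from typing import List, Sequence, Tuple


def nb_pmf(k: int, mu: float, theta: float) -> float:
    """P(Y = k) for a negative binomial with mean mu > 0 and size theta > 0.

    NB2 form with variance mu + mu**2 / theta, evaluated on the log scale
    for numerical stability.
    """
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def observed_vs_expected(
    y: Sequence[int], mu: Sequence[float], theta: float, max_count: int
) -> List[Tuple[str, int, float]]:
    """Observed vs model-implied frequencies for k = 0..max_count-1, plus a tail bin."""
    rows = []
    expected_so_far = 0.0
    for k in range(max_count):
        observed = sum(1 for yi in y if yi == k)
        # Expected frequency of k: sum of each subject's fitted probability of k.
        expected = sum(nb_pmf(k, m, theta) for m in mu)
        expected_so_far += expected
        rows.append((str(k), observed, expected))
    tail_observed = sum(1 for yi in y if yi >= max_count)
    rows.append((f">={max_count}", tail_observed, len(y) - expected_so_far))
    return rows


if __name__ == "__main__":
    # Made-up outcomes and fitted means, purely to show the output format.
    y = [0, 1, 1, 2, 5]
    mu = [1.0, 1.0, 1.0, 1.0, 1.0]
    for label, obs, expected in observed_vs_expected(y, mu, theta=1.0, max_count=3):
        print(f"{label:>4}  observed={obs}  expected={expected:.3f}")
```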

QunnaLi · September 23, 2021, 12:47pm

Good suggestion. Thank you!

Uriah · August 7, 2023, 8:05am

Old question and updated answer:

For individual decision making, decision curve analysis should do the job:

I made a reproducible example using my R package {rtichoke} for interactive visualization:

https://rtichoke-blog.netlify.app/posts/2022-09-18-dca-for-quantifying-the-additional-benefit-of-a-new-marker-by-emily-vertosick-and-andrew-vickers/
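The quantity on a decision curve is net benefit at a risk threshold t: NB(t) = TP/n − FP/n × t/(1 − t), compared against treating everyone and treating no one (NB = 0). A minimal plain-Python sketch, not the {rtichoke} API; the function names and toy data are hypothetical:

```python
from typing import List, Sequence, Tuple


def net_benefit(y: Sequence[int], risk: Sequence[float], t: float) -> float:
    """Net benefit of treating everyone whose predicted risk is at least t."""
    if not 0.0 < t < 1.0:
        raise ValueError("threshold t must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, ri in zip(y, risk) if ri >= t and yi == 1)  # treated events
    fp = sum(1 for yi, ri in zip(y, risk) if ri >= t and yi == 0)  # treated non-events
    # False positives are weighted by the odds of the threshold, t / (1 - t).
    return tp / n - fp / n * t / (1.0 - t)


def decision_curve(
    y: Sequence[int],
    risk_base: Sequence[float],
    risk_new: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float, float]]:
    """Rows of (t, NB baseline model, NB augmented model, NB treat all)."""
    treat_all = [1.0] * len(y)  # a predicted risk of 1 means everyone is treated
    return [
        (t, net_benefit(y, risk_base, t), net_benefit(y, risk_new, t), net_benefit(y, treat_all, t))
        for t in thresholds
    ]


if __name__ == "__main__":
    # Toy outcomes and predicted risks, for illustration only.
    y = [1, 0, 1, 0, 1, 0, 0, 0]
    risk_base = [0.6, 0.5, 0.3, 0.4, 0.5, 0.2, 0.3, 0.1]
    risk_new = [0.8, 0.3, 0.6, 0.2, 0.7, 0.1, 0.2, 0.1]
    for row in decision_curve(y, risk_base, risk_new, [0.1, 0.2, 0.3, 0.4]):
        print("t=%.1f  base=%.3f  new=%.3f  all=%.3f" % row)
```

A new marker adds clinical value over the thresholds where the augmented model's net benefit exceeds both the baseline model's and the treat-all and treat-none strategies.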

R_cubed · August 7, 2023, 1:06pm

I’m not sure how much this helps with your problem, but there has been work on the computational aspects of Bayesian model selection that attempts to avoid the need for a likelihood function. One of the main researchers in this area is Christian P. Robert, known for the excellent text The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation.

For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. Approximate Bayesian computation (ABC) methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed.
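A minimal sketch of the simplest ABC variant, rejection sampling: draw a parameter from the prior, simulate a data set, and keep the draw if a summary statistic of the simulated data is within a tolerance eps of the observed one. The likelihood is never evaluated. The toy normal-mean problem, the prior, the summary statistic, eps, and the function name are all illustrative assumptions. ABC model choice extends this by also drawing a model index, and its conclusions can depend heavily on which summary statistics are used.

```python
import random
import statistics
from typing import Callable, List, Sequence


def abc_rejection(
    observed: Sequence[float],
    sample_prior: Callable[[random.Random], float],
    simulate: Callable[[float, int, random.Random], List[float]],
    summary: Callable[[Sequence[float]], float],
    eps: float,
    n_draws: int,
    seed: int = 0,
) -> List[float]:
    """Basic ABC rejection sampler returning the accepted parameter draws.

    Only forward simulation from the model is needed. Accepted draws
    approximate the posterior conditional on |S(sim) - S(obs)| <= eps.
    """
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted: List[float] = []
    for _ in range(n_draws):
        theta = sample_prior(rng)                        # 1. draw from the prior
        simulated = simulate(theta, len(observed), rng)  # 2. simulate a data set
        if abs(summary(simulated) - s_obs) <= eps:       # 3. keep if close enough
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    # Illustrative toy problem: unknown mean of a normal with known SD of 1.
    data_rng = random.Random(42)
    observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
    draws = abc_rejection(
        observed,
        sample_prior=lambda r: r.uniform(-5.0, 5.0),
        simulate=lambda mu, n, r: [r.gauss(mu, 1.0) for _ in range(n)],
        summary=statistics.mean,
        eps=0.05,
        n_draws=10000,
    )
    print(len(draws), statistics.mean(draws) if draws else None)
```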

f2harrell · August 7, 2023, 1:09pm

Full likelihood methods have many advantages, including providing general R^2 measures and bridging Bayesian and frequentist inference.
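As one example of such measures, the Cox-Snell R^2 = 1 − exp(−LR/n) and its Nagelkerke rescaling (division by the maximum attainable value, 1 − exp(2 log L_0 / n), where L_0 is the intercept-only likelihood) need nothing beyond two log likelihoods, so they apply to any model that has one. This sketches those two familiar definitions, not necessarily the specific measures the post has in mind; the function name is hypothetical.

```python
import math
from typing import Tuple


def likelihood_r2(loglik_model: float, loglik_null: float, n: int) -> Tuple[float, float]:
    """Cox-Snell and Nagelkerke R^2 computed from two log likelihoods.

    loglik_null is the log likelihood of the intercept-only model and n is
    the number of observations.
    """
    lr = 2.0 * (loglik_model - loglik_null)          # likelihood ratio chi-square
    r2_cox_snell = 1.0 - math.exp(-lr / n)
    r2_max = 1.0 - math.exp(2.0 * loglik_null / n)   # upper bound of Cox-Snell R^2
    return r2_cox_snell, r2_cox_snell / r2_max


if __name__ == "__main__":
    # Toy binary example: 4 observations, null risk 0.5, model assigns 0.9 to each observed outcome.
    print(likelihood_r2(4 * math.log(0.9), 4 * math.log(0.5), 4))
```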