Hi everyone.

This feels like a basic question, but we have not found a clear answer.

We are studying the utility of using a novel predictor within a prediction model.

The baseline and augmented models are both “xgboost” models (gradient boosting with decision trees as base learners), so they have no clear notion of likelihood.

We want to be able to say that the novel biomarker “improves” the prediction in an NHST (null hypothesis significance testing) framework.

The standard “machine learning” way is to compare AUROC values.
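
For concreteness, here is a minimal sketch of what such an AUROC comparison could look like, as a paired bootstrap on held-out predictions. It is only an illustration of the comparison described above, not a proposed answer to the question. The function names (`auroc`, `bootstrap_auroc_difference`) and parameters are hypothetical, and the sketch assumes a binary outcome coded 0/1 with both models scored on the same held-out observations. It resamples predictions only and does not refit the models, so it ignores variability from training.

```python
import random


def auroc(y_true, scores):
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg), ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_difference(y_true, scores_base, scores_aug, n_boot=1000, seed=0):
    """Paired bootstrap percentile interval for AUROC(augmented) - AUROC(baseline).

    Assumption: all three sequences refer to the same held-out observations,
    in the same order, so each resample keeps the pairing intact.
    """
    rng = random.Random(seed)
    n = len(y_true)
    observed = auroc(y_true, scores_aug) - auroc(y_true, scores_base)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample contains a single class; AUROC is undefined, so redraw
        base = [scores_base[i] for i in idx]
        aug = [scores_aug[i] for i in idx]
        diffs.append(auroc(y, aug) - auroc(y, base))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return observed, lower, upper
```

Calling `bootstrap_auroc_difference(y, p_baseline, p_augmented)` returns the observed difference and an approximate 95% percentile interval; an interval that excludes zero would be read as evidence of a difference in AUROC.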

We are familiar with the work (I think by Margaret Pepe) showing that the statistical test for comparing AUROCs is just a less powerful version of the likelihood ratio (LR) test.

We are also familiar with Prof. Harrell’s position that the LR test is the go-to test for these kinds of questions.

The problem is, again, that the model has no defined notion of likelihood.

What, then, is the preferred approach for performing such a hypothesis test?

Thanks in advance,

Noam Barda