How to compare and evaluate models when adding a new feature?

Hello Everyone,

I am working on a binary classification problem with ~5k records, of which ~1.5k have label 1 and ~3.5k have label 0.

When I tried multiple models with 6, 7, and 8 features, I got the results below. With the newly added 7th feature (or the 7th and 8th together), I see an improvement in only one of the models: scikit-learn logistic regression improves with the 7th feature only, whereas XGBoost improves only when both the 7th and 8th features are added.

I came across McNemar's test, but I suspect it may not be applicable here. I want to know whether adding this feature really helps, since I see a performance improvement in only one model and not in the others.
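For context, here is how I understand McNemar's test could be applied in my case: train the same learner with and without the new feature and compare their correct/incorrect predictions on the same test records. This is only a sketch on synthetic data; the column slices, the logistic regression learner, and the class balance are placeholders for my actual setup.

```python
# Sketch: McNemar's test comparing the same learner trained with and without
# the new feature, evaluated on the same test records.
# Synthetic data stands in for my real ~5k-record dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from statsmodels.stats.contingency_tables import mcnemar

X, y = make_classification(n_samples=5000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# "Old" feature set = first 6 columns, "new" feature set = first 7 columns (placeholders).
old_cols, new_cols = slice(0, 6), slice(0, 7)
model_old = LogisticRegression(max_iter=1000).fit(X_train[:, old_cols], y_train)
model_new = LogisticRegression(max_iter=1000).fit(X_train[:, new_cols], y_train)

correct_old = model_old.predict(X_test[:, old_cols]) == y_test
correct_new = model_new.predict(X_test[:, new_cols]) == y_test

# 2x2 table: rows = old model correct/incorrect, columns = new model correct/incorrect.
table = np.array([
    [np.sum(correct_old & correct_new),  np.sum(correct_old & ~correct_new)],
    [np.sum(~correct_old & correct_new), np.sum(~correct_old & ~correct_new)],
])

result = mcnemar(table, exact=True)
print("statistic:", result.statistic, "p-value:", result.pvalue)
```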

Does this mean the new feature is improving performance, even though it decreases performance in the other models? How should I interpret this?

Please note that I split the data into train and test sets and did 10-fold CV on the training data.

So, how do I know whether these newly added features are really improving model performance? Is there a statistic for this?
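To make the question concrete, the kind of comparison I have in mind is: score the same model on both feature sets using identical CV folds, then look at the paired per-fold differences. Again, a rough sketch on synthetic data with placeholder column slices; I am not sure a paired t-test is strictly valid here, since the folds share training data.

```python
# Sketch: paired per-fold comparison of two feature sets using identical CV splits.
# Synthetic data and column slices are placeholders for my real setup.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=5000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# Same folds for both feature sets so the per-fold scores are paired.
scores_6 = cross_val_score(model, X[:, :6], y, cv=cv, scoring="neg_log_loss")
scores_7 = cross_val_score(model, X[:, :7], y, cv=cv, scoring="neg_log_loss")

diff = scores_7 - scores_6
print("mean per-fold improvement:", diff.mean())
# Paired t-test on fold scores; folds overlap in training data,
# so this is only a rough guide, not an exact test.
print(ttest_rel(scores_7, scores_6))
```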

Can anyone help me with this?

Since this question is not directly health-related and is quite general, I suggest you post it on stats.stackexchange.com instead.

Note that casting this as a classification problem rather than a probability estimation (prediction) problem is likely to be extremely problematic, as described here, and neither AUC nor F1 is a proper accuracy scoring rule, as detailed here.
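If it helps when you re-post: proper scoring rules such as the Brier score and log loss can be computed directly from predicted probabilities. A minimal sketch with scikit-learn on synthetic data follows; your own model and data would go in their place.

```python
# Sketch: scoring predicted probabilities with proper scoring rules
# (Brier score and log loss) instead of AUC or F1.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=8, weights=[0.7, 0.3],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_test = model.predict_proba(X_test)[:, 1]   # predicted probability of class 1

print("Brier score:", brier_score_loss(y_test, p_test))  # lower is better
print("Log loss:   ", log_loss(y_test, p_test))          # lower is better
```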

When you re-post it, state clearly how the records were sampled/selected, as this is all-important.