When people compare classifiers on a common test set, the most common approach I see is to summarise each classifier as a single % accuracy figure and then compare (or linearly regress on) those accuracies.

However, this feels wasteful to me: it boils a whole dataset down to a single value per model, throwing away the per-image information that a large test set provides and the precision that comes with it.

To give an example, I have a common testing dataset of 5,000 radiology images, and I want to compare *families* of classifiers (neural networks). Within each family of neural networks (e.g. EfficientNet, ResNet) there will be models of increasing numbers of parameters (e.g. EfficientNetB0, EfficientNetB1 … EfficientNetB7).
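For what it's worth, a per-image analysis like the ones below needs the data in long format: one row per (image, model) pair with a binary correctness indicator. A base-R sketch of that layout (the column names, parameter counts, and outcomes here are all placeholders, not my real data):

```
# Long-format data: one row per image x model combination.
# Assumed columns: image_id, model_family, specific_model, model_size,
# correct_prediction (1 = model classified the image correctly, 0 = not).
models <- data.frame(
  specific_model = c("EfficientNetB0", "EfficientNetB1", "ResNet18", "ResNet50"),
  model_family   = c("EfficientNet", "EfficientNet", "ResNet", "ResNet"),
  model_size     = c(5.3e6, 7.8e6, 11.7e6, 25.6e6)  # parameter counts
)
images <- data.frame(image_id = sprintf("img%04d", 1:5000))

data <- merge(images, models)  # cross join: 5,000 images x 4 models = 20,000 rows
data$correct_prediction <- rbinom(nrow(data), 1, 0.8)  # placeholder outcomes

stopifnot(nrow(data) == 5000 * nrow(models))
```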

My initial thought was that, instead of running a linear regression on accuracy, I could model the per-image outcomes directly:

```
model <- glmer(correct_prediction ~ model_size + (1 | model_family/specific_model) + (1 | image_id),
data = data, family = binomial)
```

But I’ve not seen this approach before.

Also, since `model_family` enters only as a random effect rather than a fixed effect, I don't think this would let me compare individual families of networks?

Instead, could I do something like this:

```
model <- glmer(correct_prediction ~ model_family + model_size + (1 | model_family/specific_model) + (1 | image_id),
data = data, family = binomial)
```

where I include `model_family` as a fixed effect as well as a random effect.
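With `model_family` as a fixed effect, families can then be compared directly, e.g. with a likelihood-ratio test against a model that drops the family term. One caveat: once `model_family` is a fixed effect, the random part is usually written `(1 | model_family:specific_model)` rather than the full `/` nesting, since a separate `(1 | model_family)` intercept would compete with the fixed effect. A self-contained sketch on simulated toy data (all names and numbers are illustrative, not my real setup):

```
library(lme4)

# Toy long-format data: 200 images x 4 models (2 families), binary outcomes.
set.seed(1)
models <- data.frame(
  specific_model = c("A0", "A1", "B0", "B1"),
  model_family   = c("famA", "famA", "famB", "famB"),
  model_size     = c(5, 10, 12, 26)  # millions of parameters
)
data <- merge(data.frame(image_id = factor(1:200)), models)
data$correct_prediction <- rbinom(nrow(data), 1,
                                  plogis(0.5 + 0.02 * data$model_size))

full    <- glmer(correct_prediction ~ model_family + model_size +
                   (1 | model_family:specific_model) + (1 | image_id),
                 data = data, family = binomial)
reduced <- update(full, . ~ . - model_family)

anova(reduced, full)  # likelihood-ratio test for the family fixed effect
```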

Finally, I’m almost certain `model_size` will have a non-linear relationship with accuracy (moving from small to medium models will offer big boosts, but medium to large less so), so I was thinking of modelling it with a restricted cubic spline, like so:

```
library(rms)   # provides rcs()
library(lme4)
# Note: datadist()/options(datadist = ...) are only used by rms fitting and
# plotting functions (lrm, Predict, ...); glmer() ignores them. Likewise,
# lmerTest adds p-values for lmer(), not glmer(), so neither is needed here.
model <- glmer(correct_prediction ~ rcs(model_size, 3) + model_family +
                 (1 | model_family/specific_model) + (1 | image_id),
               data = data, family = binomial)
```
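One practical check on the spline is to compare it against the linear-in-size model with a likelihood-ratio test or AIC; also, `splines::ns()` (which ships with base R) is an alternative to `rms::rcs()` that avoids the `datadist()` setup entirely. A sketch on simulated toy data (all names and numbers are illustrative):

```
library(lme4)
library(splines)  # ns() comes with base R; no datadist() needed

# Toy long-format data: 300 images x 8 models across 2 families.
set.seed(2)
models <- expand.grid(model_family = c("famA", "famB"), idx = 1:4)
models$specific_model <- paste0(models$model_family, models$idx)
models$model_size <- c(5, 8, 12, 19, 30, 43, 54, 66)  # millions of params
data <- merge(data.frame(image_id = factor(1:300)), models)
# Diminishing returns baked in: accuracy rises with log(size).
data$correct_prediction <- rbinom(nrow(data), 1,
                                  plogis(-0.5 + 0.6 * log(data$model_size)))

linear <- glmer(correct_prediction ~ model_size + model_family +
                  (1 | model_family:specific_model) + (1 | image_id),
                data = data, family = binomial)
spline <- update(linear, . ~ . - model_size + ns(model_size, df = 2))

anova(linear, spline)  # LRT: does the non-linear term improve fit?
AIC(linear, spline)
```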

Does this seem a sensible approach? As I’ve said, I haven’t really seen people use logistic regression on test-set results, and yet to me it feels quite a logical (sorry) approach? And by adding `image_id` as a random effect, it effectively gives me a paired analysis across models for each image.
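For intuition on the pairing: with just two models, conditioning on a per-image effect is roughly analogous to McNemar's test, whose inference comes only from the discordant images (those one model gets right and the other wrong). A toy illustration with entirely made-up counts:

```
# 2x2 agreement table for two hypothetical models on the same 5,000 images.
#                     model B correct | model B wrong
tab <- matrix(c(4200, 300,   # model A correct
                 150, 350),  # model A wrong
              nrow = 2, byrow = TRUE)
mcnemar.test(tab)  # driven by the 300 vs 150 discordant images
```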