Question on interpreting ZOIB models with random effects

Isabella_Ghement · June 27, 2022, 6:10pm

Hi everyone,

I’ve had no luck getting answers to my questions below after posting them on the ASA Connect forum of the American Statistical Association (ASA). In case someone here could help shed some light on my questions, I am posting them here.

Lately, I’ve been wrestling with some zero-and-one-inflated beta regression models which include random effects and am not sure whether my interpretation of the models is correct, so I am looking for some confirmation whether or not I am on the right track. These models are more complicated since they have four different components: mu (mean), phi (precision, which governs the dispersion), zoi (zero-one inflation) and coi (conditional one inflation).

To keep things simple, let’s say that my first model, model1, is fitted like this using the brms package:

library(brms)
model1 <- brm(
  formula = bf(
    student_response ~ 1 + district_type + (1 | district/school),  # mu
    phi ~ 1 + district_type + (1 | district/school),
    zoi ~ 1 + district_type + (1 | district/school),
    coi ~ 1 + district_type + (1 | district/school)
  ),
  family = zero_one_inflated_beta(), ... )

where we have students nested in schools nested in districts and the response variable is measured at the student level and is a proportion which can take any value between 0 and 1, including 0 and 1.

It is my understanding that the first model component, mu, models the logit-transformed mean value of the response variable as a function of the district-type predictor and the school within district and district random effects, but only for those students whose responses are NOT 0 or 1. The second model component, phi, models the variability in these response variable values about mu as a function of the same. Is my understanding correct?
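For reference, here is a small base-R sketch of the mean/precision parameterization that brms documents for its beta families, y ~ Beta(mu * phi, (1 - mu) * phi), under which the variance is mu * (1 - mu) / (1 + phi). The helper names `beta_shapes` and `beta_variance` and all numbers are made up for illustration. Under this parameterization a larger phi means less spread around mu.

```r
# Hypothetical helper: convert (mu, phi) to the usual Beta shape parameters.
beta_shapes <- function(mu, phi) {
  list(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# Variance of Beta(mu * phi, (1 - mu) * phi).
beta_variance <- function(mu, phi) {
  mu * (1 - mu) / (1 + phi)
}

# Illustrative values only (not estimates from any model).
mu  <- plogis(0.5)  # inverse logit of a linear predictor equal to 0.5
phi <- exp(2)       # brms uses a log link for phi by default

shapes <- beta_shapes(mu, phi)
v_low  <- beta_variance(mu, phi)
v_high <- beta_variance(mu, phi * 10)  # larger phi, smaller variance
```

Because phi enters on the log scale, a positive coefficient in the phi component corresponds to responses that are more tightly concentrated around mu, not more variable.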

What throws me off is the fact that district_type (let’s say, Large versus Small) is a district-level variable in this model, so interpreting its effect seems trickier. I can’t say things like “for the typical school nested inside the typical district”, because that would preclude district taking two possible values, Large or Small, at the same time.

But can I say something like: If we compare the logit-transformed mean response values for students in two districts such that one district is Large and the other is Small but which both have the same value for the random effect of District and contain schools with the same values for the random effect of school, then the difference in the logit-transformed mean response values for these students is captured by the slope of district_type in the mu component of the model? (None of the students in question provided response values of 0 or 1; only values in (0,1).)

I am just looking for an interpretation of the slope of district_type in the mu component that I can live with though this seems hard to pull off and translate in simple words. Maybe I can simplify this further to:

If we compare the logit-transformed mean response values for students in two middle-of-the-pack districts such that one district is Large and the other is Small and contain middle-of-the-pack schools, then the difference in the logit-transformed mean response values for these students is captured by the slope of district_type in the mu component of the model? (None of the students in question provided response values of 0 or 1; only values in (0,1).)

As an aside, if mu is the expected value of a “discrete” (rather than “continuous”) non-zero and non-one proportion, does it make sense to use terminology like “odds” when describing what the exponentiated value of the slope of district_type means in the mu component?
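As a purely arithmetic check on what the exponentiated slope represents, the sketch below uses made-up logit-scale coefficients (`b0` and `b1` are assumptions, not model output). It shows that exp(b1) equals the ratio of mu / (1 - mu) between the two district types. Whether "odds" is the right word for mu / (1 - mu) when mu is a mean proportion is a separate wording question.

```r
# Illustrative coefficients on the logit scale (made up, not model estimates).
b0 <- -0.3  # intercept: reference district type
b1 <- 0.6   # slope of district_type

mu_ref <- plogis(b0)       # mean of the (0, 1) part, reference group
mu_cmp <- plogis(b0 + b1)  # mean of the (0, 1) part, comparison group

# Ratio of mu / (1 - mu) between the two groups.
ratio <- (mu_cmp / (1 - mu_cmp)) / (mu_ref / (1 - mu_ref))
```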

When combining the four model components (i.e., mu, phi, zoi and coi), one can get the expected value of the student response variable, regardless of whether that response was 0, 1 or something in-between. What is the best way to describe the meaning of that expected response value? Can we now say it represents the expected response value for students in a middle-of-the-pack school (i.e., a school with a random school effect equal to 0) located in a middle-of-the-pack district (i.e., a district with a random district effect equal to 0), regardless of whether their responses were equal to 0, 1 or something in-between?
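For concreteness, the overall mean of a zero-one-inflated beta response can be written as zoi * coi + (1 - zoi) * mu, where zoi is the probability of a 0 or 1, coi is the probability of a 1 given a 0 or 1, and mu is the mean of the (0, 1) part. phi does not enter the mean. A minimal base-R sketch, where the helper name `zoib_mean` and the linear-predictor values are assumptions for illustration:

```r
# Hypothetical helper: overall mean of a zero-one-inflated beta response.
#   zoi = P(y is 0 or 1)
#   coi = P(y = 1 | y is 0 or 1)
#   mu  = mean of the responses strictly between 0 and 1
zoib_mean <- function(mu, zoi, coi) {
  zoi * coi + (1 - zoi) * mu
}

# Illustrative linear-predictor values (logit scale), with every random
# effect set to 0. These are made-up numbers, not model estimates.
eta_mu  <- 0.4
eta_zoi <- -2
eta_coi <- 0.3

overall <- zoib_mean(plogis(eta_mu), plogis(eta_zoi), plogis(eta_coi))
```

Because the inverse logit is nonlinear, the value obtained with all random effects set to 0 describes a school and district at the centre of the random-effect distributions. It is generally not the same number as the mean response averaged over all schools and districts.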

My second ZOIB model looks something like this:

library(brms)
model2 <- brm(
  formula = bf(
    student_response ~ 1 + student_status + (1 | school),  # mu
    phi ~ 1 + student_status + (1 | school),
    zoi ~ 1,
    coi ~ 1
  ),
  family = zero_one_inflated_beta(), ... )

where now there are just students nested inside schools, student_status is a binary predictor (let’s say: good student vs problematic student) and there are too few 0’s and 1’s to go fancy with modelling the zoi and coi components.

So, for model2, can I interpret the effect of student_status for the mu component by comparing the “good students” with the “problematic students” in the middle-of-the-pack school (i.e., a school with a random school effect of 0) in terms of the logit-transformed value of their mean responses, assuming all of these responses were not 0 and not 1?

And can I talk about the expected value of the student-response variable for the students in a middle-of-the-pack school, regardless of whether their response was 0, 1 or something in-between 0 and 1?

My third model is a bit more complicated, as its mu component now includes a smooth term of a school-level variable (say, school revenue), with a separate smooth for each of the two levels of student_status:

library(brms)
model3 <- brm(
  formula = bf(
    student_response ~ 1 + student_status + s(school_revenue, by = student_status) + (1 | school),  # mu
    phi ~ 1 + student_status + (1 | school),
    zoi ~ 1,
    coi ~ 1
  ),
  family = zero_one_inflated_beta(), ... )

How on earth do I interpret the effect of student_status in the mu component now? I guess I can interpret this effect at different values of school_revenue? Something like:

Comparing “good students” with “problematic students” at a middle-of-the-pack school which has a particular school_revenue value, the effect of student_status is given by ____ (what?). (I am thinking I need to compute some kind of marginal effect of student_status for the mu component?)
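One possible route, offered as a sketch and not a definitive recipe, is to compute the difference in mu between the two statuses for each posterior draw at a fixed school_revenue value, with the random effects set to 0. The helper `status_contrast`, the object name `fit`, the level labels and the revenue value below are all assumptions for illustration. The commented-out call uses `posterior_linpred()` from brms with its `dpar` and `re_formula` arguments, and placeholder draws stand in for real ones so that the snippet runs on its own.

```r
# Hypothetical helper: summarise a per-draw contrast between two conditions.
# `draws` is a draws-by-2 matrix of mu on the logit scale
# (column 1 = reference status, column 2 = comparison status).
status_contrast <- function(draws, probs = c(0.025, 0.975)) {
  diff_logit <- draws[, 2] - draws[, 1]
  c(estimate = mean(diff_logit), quantile(diff_logit, probs = probs))
}

# With a fitted model (here called `fit`, an assumed name), the draws might
# come from something like the following (not run here):
#   nd <- data.frame(
#     student_status = c("good", "problematic"),  # assumed level labels
#     school_revenue = 50000                      # assumed value of interest
#   )
#   draws <- brms::posterior_linpred(fit, newdata = nd, dpar = "mu",
#                                    re_formula = NA)

# Placeholder draws so the sketch runs on its own (NOT real posterior draws).
set.seed(1)
fake_draws <- cbind(rnorm(1000, 0.2, 0.1), rnorm(1000, 0.5, 0.1))
res <- status_contrast(fake_draws)
```

Repeating this over a grid of school_revenue values would trace out how the student_status contrast changes with revenue, which is one way to read a model where the smooth differs by status.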

Thank you in advance for any tidbits of insights you will be able to throw my way.

Isabella

mshapiro123 · July 8, 2022, 11:16pm

I’m curious about this. I guess you could try Gelman’s blog (https://statmodeling.stat.columbia.edu). I think someone there is the maintainer of brms.