Specifying random effects structure to achieve proper degrees of freedom

Dear all,

I’m currently struggling to specify the correct random effects structure for analyzing data from a virtual reality experiment we conducted with a patient group and a healthy control group. I’ve read (and learned) a lot in the last months, but with my psychology background I feel I lack some of the background knowledge to apply it to my specific data structure. I would therefore be very grateful for any hints, feedback, or literature recommendations.

Briefly, our paradigm:

The participants were asked to read out the same nine questions (asking for advice or support), in the same order, to eight different virtual characters in succession (all questions are first asked to the first virtual character, then to the second, etc.; 72 trials in total). The virtual characters’ answers differed with regard to social acceptance and rejection, and whether the character explained their response or not. The answers (= stimuli) can therefore be characterized on two factors: 1) reaction (rejection/acceptance) and 2) explanation (no/yes).

After listening to each answer, participants were asked to assess the avatar’s benevolence towards the participant by adjusting a slider on a scale (our dependent variable). The slider started in the middle of the scale at the beginning of each of the eight conversations and did not jump back between trials, but remained at the position set by the participant.

Among the virtual characters, the number of answers that were rejecting or accepting as well as with and without explanations was balanced. The frequencies, combined occurrences, and sequence of all four types of answers were also evenly distributed across all virtual characters. The response pattern assigned to each character remained consistent across participants, assigning a distinct ‘personality’ to each virtual character. The presentation order of the eight virtual characters was randomized.

In short, we are interested in whether and how the groups differ in their ratings dependent on the experimental factors, so my fixed effects look like this: rating ~ group * reaction * explanation

Following the recommendations of Barr, I included a by-subject and a by-stimulus random intercept to account for the crossed repeated-measures structure (multiple observations per subject because of multiple stimuli, multiple observations per stimulus because of multiple subjects). From there I added by-subject random slopes for the experimental factors and their interaction, and a by-stimulus random slope for group, to account for pseudoreplication:

(1 + reaction * explanation | subject) + (1 + group | stimuli)
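In lme4 syntax, the full model would look something like the sketch below. The data frame name `d` is an assumption on my part; `rating`, `group`, `reaction`, `explanation`, `subject`, and `stimuli` are the variables described above.

```r
# Sketch of the maximal model described above, assuming a data frame `d`
# with columns rating, group, reaction, explanation, subject, stimuli.
library(lme4)
library(lmerTest)  # overloads lmer() to report Satterthwaite df and p values

m_max <- lmer(
  rating ~ group * reaction * explanation +
    (1 + reaction * explanation | subject) +  # by-subject slopes for within-subject factors
    (1 + group | stimuli),                    # by-stimulus slope for the between-subject factor
  data = d
)
summary(m_max)  # fixed-effect t tests with Satterthwaite degrees of freedom
```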

In addition to my interest in general feedback and suggestions, I still have two unresolved questions:

  • I later reduced the by-subject random slopes of this model until it converged, but was hesitant to omit the group slope: without the by-stimuli random slope for group, the degrees of freedom for the interaction terms including group increased suspiciously to >4440, which is close to our total number of data points. I therefore assumed that the dependencies in our data were not correctly accounted for, and especially after reading Arnqvist (2020) and Scandola & Tidoni, I feel it would not be a good idea to omit the group random slope. But colleagues argued this might not be a problem, since one of the advantages of mixed models is that they analyze all data points instead of mean values (and the random slope for group seemed wrong to them). Does anyone have an idea or suggestion where I can learn more about df’s in mixed models, and whether exploding df’s are suspicious or fine?
  • Since our virtual characters each had their specific answer pattern, the slider did not jump back to the middle after each rating, and the specific nature of the nine questions might also have influenced the ratings, we wanted to add the factors ‘character’ and ‘question’ to the random effects structure. I thought about nesting the stimuli in the characters and adding another term for the questions, since the questions were the same for all characters but the answers (= stimuli) differed depending on the character. However, each answer (= stimulus) is already tied to a unique combination of character and question, so the information seems redundant, which might explain the singular fits. Does anyone have an idea or suggestion where I can learn more about how to properly account for design factors that might interfere with the experimental factors?
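To make the second question concrete, one alternative I considered is replacing the single stimulus term with crossed character and question terms, since each stimulus is a unique character × question combination. A sketch, assuming `d` additionally has `character` and `question` columns (hypothetical names):

```r
# Because each stimulus is a unique character-by-question pair, a by-stimulus
# intercept plus crossed by-character / by-question intercepts compete for the
# same variance; fitting all three together typically produces a singular fit,
# so this sketch drops the stimulus term in favor of the two design factors.
m_alt <- lmer(
  rating ~ group * reaction * explanation +
    (1 + reaction * explanation | subject) +
    (1 + group | character) +  # characters with their distinct 'personalities'
    (1 | question),            # the nine questions, identical across characters
  data = d
)
```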

Thank you very much in advance, especially for reading this rather long post! :heart:


Great questions, and I hope that experts in multilevel modeling will respond.

As somewhat of an aside, I sometimes have problems with traditional analyses that lessen when using a Bayesian random effects model. And as a general observation, random slopes may have too many effective d.f. But my experience is with serial dependence in longitudinal data, where we have wonderful non-random-effects solutions.

1 Like

With regard to the question on degrees of freedom, Julian Faraway has argued that these are not well defined in mixed effects models:

The concept of “degrees of freedom”, as used in statistics, is not as well defined as many people believe. Perhaps one might think of it as the effective number of independent observations on which an estimate or test is based. Often, this is just the sample size minus the number of free parameters. However, this notion becomes more difficult when considering the dependent and hierarchical data found in mixed effects models. There is no simple way in which the degrees of freedom can be counted.

(see Introduction on the above-linked page, which is a post-publication comment on the first edition of his book, Extending the Linear Model with R. I do not know if he addresses this further in the second edition.)

1 Like

Thank you very much @ChristopherTong , I’ll read the comment, and I also found the book in my university’s library!

It would be nice if a simple formula existed that could roughly estimate the effective degrees of freedom in a random intercepts model. This would be some function of the variance of the random effects and the residual variance.
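For a balanced one-way random-intercepts model, one rough approximation along these lines (in the spirit of Hodges and Sargent's work on counting degrees of freedom in hierarchical models) is that each of the J random intercepts contributes a fraction of a parameter equal to its shrinkage factor, nτ² / (nτ² + σ²), where n is the number of observations per cluster, τ² the intercept variance, and σ² the residual variance. A sketch with hypothetical numbers:

```r
# Rough effective df of the random intercepts in a balanced one-way model:
# each intercept counts as its shrinkage factor (all numbers hypothetical).
tau2   <- 4    # between-subject intercept variance
sigma2 <- 9    # residual variance
n_per  <- 72   # observations per subject (as in the design above)
J      <- 30   # number of subjects
shrink <- (n_per * tau2) / (n_per * tau2 + sigma2)
edf_random <- J * shrink
edf_random  # close to J when tau2 dominates, close to 0 when it vanishes
```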


In addition to the Faraway reference that @ChristopherTong noted, Doug Bates, one of the primary authors of the original nlme package in R and of the later lme4 package (which notably extended the older package to include generalized linear mixed models), had a lengthy post on the R-Help e-mail list back in 2006:

[R] lmer, p-values and all that

where he discusses why lmer/glmer and family do not display p values by default, and raises the whole notion of degrees of freedom in mixed models.

That being said, there are extensions to the lme4 package which provide such outputs, along with additional packages like emmeans, that offer a variety of contrast options for post hoc testing.

I would also point @anna_mannheim to the GLMM FAQ that is maintained by Ben Bolker, who has essentially taken over the maintenance of the lme4 package from Doug Bates:


There are FAQs there that cover issues such as random versus fixed effects, crossed effects and many related topics that are likely to be of value in providing a high level perspective on some of the issues being raised here.

An additional resource, which is noted in the FAQ above, and at least within the R community, tends to be the home of the mixed model experts, is the R sig-mixed-models e-mail list:

R sig-mixed-models

Ben Bolker and others tend to respond fairly quickly there. @anna_mannheim if you post there, I would invite you to post back here with relevant information so that the thread here is of value for others moving forward.


Super helpful. The FAQ has this particularly helpful take:

1 Like

Very interesting experiment. I can’t help with your question, as it’s far outside my field as a patient advocate.

So if I have it right, the primary endpoint is to compare the felt degree of benevolence towards the participant based on how the avatars answered the participant’s questions. (My expectation as a former consent reviewer is that a consent wasn’t required - but would be a good process to go through because writing a consent can often help to better shape an experiment.)

A question on study methods: Were the appearance and voice characteristics of the virtual characters first rated for benevolence in some way?

Thank you so much for the helpful links @MSchwartz !
I spent a few more days reading, posted my question to the mailing list, and will post helpful answers here!

1 Like

In the literature and in a pilot study, we found that the differences in pitch, speed, etc. of human voices were too distracting (up to the point that a very nicely pronounced rejection was experienced as more pleasant than a somewhat harshly emphasized acceptance), so we decided to implement AI-generated voices. As with the appearance of the avatars, we now hope that the mixed models can help us detect any strong influences, as unfortunately neither was rated beforehand.

1 Like