Interpreting the random-effect solution in a mixed model

I am using a generalized linear mixed model to account for repeated measurement (of counts representing performance in multiple matches of different sports teams, but that is not particularly relevant). The random-effect solution for the identity of the subjects (the teams) represents the relative magnitudes of the subject means of the dependent variable. The covariance parameter for subject identity is provided as a variance, and I have always interpreted the square root of that covparm as the between-subject standard deviation, after adjustment for everything else in the model. Residual error has been partitioned out of it, so I call it the true between-subject SD, as opposed to the observed between-subject SD, which includes residual error.
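To make the true-vs-observed distinction concrete, here is a minimal Python simulation sketch (hypothetical numbers; the real analysis is in SAS). The SD of the observed team means is inflated above the true between-team SD by residual error, by a factor determined by the number of matches per team:

```python
import numpy as np

rng = np.random.default_rng(1)
n_teams, n_matches = 2000, 8
sd_between, sd_resid = 2.0, 4.0   # hypothetical true values

# simulate repeated measurements: team effect + residual error per match
team_effects = rng.normal(0, sd_between, n_teams)
y = team_effects[:, None] + rng.normal(0, sd_resid, (n_teams, n_matches))

# observed between-team SD (SD of the team means) includes residual error
observed_sd = y.mean(axis=1).std(ddof=1)

# theory: observed SD^2 = true between SD^2 + residual SD^2 / n_matches
expected_observed = np.sqrt(sd_between**2 + sd_resid**2 / n_matches)

print(observed_sd)        # close to expected_observed, well above sd_between
print(expected_observed)
```

A mixed model effectively subtracts the residual contribution back out, which is why the covparm variance estimates the "true" between-subject variance rather than the observed one.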

Fine (or at least I hope so), but here’s my problem. The SD of the random-effect solution should give another estimate of the pure between-subject SD. I realize the two won’t match exactly to start with, because of a degrees-of-freedom issue: if I have n subjects, the SD of the random-effect solution is calculated as if there are n-1 degrees of freedom, whereas the SD given by the subject variance is calculated with, or at least should be consistent with, the actual degrees of freedom for that variance. An estimate of those degrees of freedom is given by 2*Z^2, where Z is the variance divided by its standard error. I am using Proc GLIMMIX in SAS, by the way, which provides a standard error for the covariance parameters.

Well, I’ve done the calculation to correct the SD of the random-effect solution using the degrees of freedom, and the correspondence is not exact, but it’s near enough, so let’s assume the mismatch between the SDs is just a degrees-of-freedom issue. With these data the degrees of freedom of the subject variance are small (I am getting values of around 1, or sometimes even less), so the SD of the random-effect solution is a lot less, by a factor of ~10, than the SD given by the square root of the covparm variance. So according to the random-effect solution there are small differences between teams, but according to the covparm variance the differences between teams are 10x greater. I actually want to use the random-effect solution to assess individual teams, but I am reluctant to, because things don’t add up. Should I apply some kind of correction to the values of the random-effect solution?
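For what it’s worth, a standard mixed-model result that produces exactly this pattern is shrinkage of the random-effect solution (the BLUPs): with balanced data, each predicted effect is the raw team-mean deviation multiplied by a factor lambda = sd_between^2 / (sd_between^2 + sd_resid^2/m), so the SD of the solution is sd_between * sqrt(lambda), always smaller than sd_between itself. A minimal Python sketch with hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
n_teams, n_matches = 1000, 5
sd_between, sd_resid = 2.0, 4.0   # hypothetical true values

# simulate balanced repeated measurements per team
u = rng.normal(0, sd_between, n_teams)
y = u[:, None] + rng.normal(0, sd_resid, (n_teams, n_matches))

# BLUP shrinkage factor for balanced data
lam = sd_between**2 / (sd_between**2 + sd_resid**2 / n_matches)

# each BLUP is the raw team-mean deviation shrunk toward zero
blup = lam * (y.mean(axis=1) - y.mean())

print(blup.std(ddof=1))                 # noticeably smaller than sd_between
print(sd_between * np.sqrt(lam))        # theory: SD of BLUPs = sd_between*sqrt(lam)
```

The smaller the effective information about each team (few matches, large residual error), the stronger the shrinkage, so a large gap between the SD of the solution and sqrt(covparm) is expected rather than contradictory.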

Immediately after I posted the above, I realized I should have asked another related question. SAS allows negative variance, and it provides a random-effect solution even when the variance is negative. I checked that the solution is included in the linear model to give predicted values, the same as when the variance is positive. But how do I interpret the random-effect solution now? It’s as if there were negative true differences between the subjects.



A fellow SAS user; we are a bit rare on here. I’m not sure what you mean when you say SAS allows negative variance. In the ‘bounds’ statement in Proc Nlmixed I would have var > 0. If there’s some issue with convergence, they sometimes suggest a different parameterisation; maybe this is what you’ve seen? Maybe this helps clarify some things: The median hazard ratio: a useful measure of variance and general contextual effects in multilevel survival analysis with discussion of the GCE?

My question about negative variance is not a convergence or parameterization issue. In Proc Mixed and Proc Glimmix you can specify “nobound” to allow negative variance. (Proc Nlmixed doesn’t allow it, by the look of the documentation.) Nobound increases the risk of failure to converge, but I can usually get around that by specifying initial values for the covparms and/or by relaxing the convergence criteria, then relaxing them even further to check that there is no substantial change in the estimates. Negative variance is pretty much essential when you have a random effect representing individual responses, because it’s the only way to get sensible compatibility (confidence) intervals. And it’s pretty obvious that sampling variation can produce a negative variance estimate when the sample size and/or the true variance is small. I’ve used simulation to check that the intervals include the true values at the chosen level of the interval.
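The sampling-variation point is easy to reproduce outside SAS. Here is a minimal Python sketch (hypothetical numbers) using the unbounded method-of-moments (ANOVA) estimator of the between-team variance, which goes negative in a large fraction of samples when the true between-team variance is zero:

```python
import numpy as np

rng = np.random.default_rng(3)
n_teams, n_matches, n_sims = 8, 4, 2000
sd_resid = 1.0   # true between-team variance is exactly zero

negative = 0
for _ in range(n_sims):
    # no real team differences: all variation is residual
    y = rng.normal(0, sd_resid, (n_teams, n_matches))
    msb = n_matches * y.mean(axis=1).var(ddof=1)   # between-team mean square
    msw = y.var(axis=1, ddof=1).mean()             # within-team mean square
    # unbounded estimate of the between-team variance
    var_between = (msb - msw) / n_matches
    negative += var_between < 0

print(negative / n_sims)   # a large fraction of estimates are negative
```

Bounding the estimate at zero (the default in most software) hides this sampling variation, which is the motivation for options like nobound when the interval, not just the point estimate, matters.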

But it’s spurious and definitely not “essential”. If the variance is known to be close to 0, then drop the random effect.