Fixed or random time effect?

sebastian · July 24, 2023, 9:05am

Hi everyone,

I know there’s an extensive literature on this topic and also an ongoing discussion (e.g., http://www.stat.columbia.edu/~gelman/research/published/AOS259.pdf). However, I’m confused about whether I should model time as a random or a fixed effect in a mixed model in my specific case.

Assume an RCT with two groups (intervention vs. control), 5 measurement points (Baseline, 3 weeks, 6 weeks, 3 months, 6 months), and I’m interested in testing the hypothesis that the intervention has an effect. The dependent variable is a continuous score.

When visualizing my data, it is clear that the scores vary both at baseline (intercept) and over time (slope). For example, some people start with considerably lower scores and show very different slopes (scores improving over time) from others (whose scores worsen markedly over time). From the plots, I would assume that I should use a random intercept, random slope model with time having a random effect.

However, depending on which of the 5 definitions Andrew Gelman provides (Gelman, 2004, Analysis of variance—why it is more important than ever) I follow, I would either agree or disagree with using time as a random effect in my model when testing the group effect:

E.g., taking statement #1 provided in the source above:

“Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts a_i and fixed slope b corresponds to parallel lines for different individuals i, or the model y_{it} = a_i + bt. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.”

I would model time as a random effect.
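To see what statement #1 means in practice, here is a small Python sketch with made-up intercepts and slopes and a hypothetical coding of the five visits in weeks (0, 3, 6, 13, 26). With the slope standard deviation set to zero it produces Gelman's parallel lines y_{it} = a_i + bt; with a nonzero value each person's line gets its own slope, so the gap between two people changes over time.

```python
import random

def trajectories(n_people, times, slope_sd, seed=1):
    """Noise-free lines y_it = a_i + b_i * t for hypothetical individuals.

    Intercepts a_i always vary across people. Slopes are b_i = b + u_i with
    u_i ~ N(0, slope_sd^2); slope_sd = 0 gives the fixed-slope model y_it = a_i + b t.
    """
    rng = random.Random(seed)
    b = 0.5  # hypothetical common slope
    lines = []
    for _ in range(n_people):
        a_i = rng.gauss(10.0, 2.0)          # person-specific intercept
        b_i = b + rng.gauss(0.0, slope_sd)  # person-specific slope (equals b if slope_sd = 0)
        lines.append([a_i + b_i * t for t in times])
    return lines

def gaps(lines):
    """Vertical distance between person 0 and person 1 at each time point."""
    return [y0 - y1 for y0, y1 in zip(lines[0], lines[1])]

times = [0, 3, 6, 13, 26]  # weeks; a hypothetical coding of the five visits
parallel = trajectories(3, times, slope_sd=0.0)
fanning = trajectories(3, times, slope_sd=0.3)

print([round(g, 3) for g in gaps(parallel)])  # same gap at every visit
print([round(g, 3) for g in gaps(fanning)])   # gap changes with time
```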

Taking statement #3

“Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth”

I would model time as a fixed effect, especially given that it only has 5 levels.

Apart from that, in either case I would model my subjects as a random effect (i.e., a random intercept model). Could you shed some light on my specific issue?

Thanks a lot in advance.

f2harrell · July 24, 2023, 1:11pm

Time is always a fixed effect. I think you meant to say patients.

It would be worth exploring goodness of fit of random intercepts/random slopes models vs. simpler models that flexibly model the correlation structure and have no patient-specific parameters.

Jochen · July 24, 2023, 8:29pm

Why is time always fixed? In the growth study example from Andrew Gelman mentioned by sebastian above, the random slope would be a random time effect, wouldn’t it?
But in sebastian’s experimental design there does not seem to be a functional model of the time course (like exponential growth or similar), and he mentioned the variability between patients at baseline, so I think your remark that he might have confused patient and time is on point.

f2harrell · July 24, 2023, 10:09pm

It’s partly just semantics. What is randomly varying is patient, not time.

Jochen · July 25, 2023, 5:39am

With a simple linear relationship (for instance), you can have a random intercept and/or a random slope. I understand that the differences between patients that sebastian mentioned refer to a random intercept (patient). But it is also possible to model the slopes (time) as a random term. Am I confused here?

f2harrell · July 25, 2023, 11:37am

I’d like to find a definitive reference. I guess you could say that if time interacts with patient, then time is random. But not necessarily.

sscogges · July 25, 2023, 8:05pm

I believe Frank is using the term “random effect” to refer to the grouping variable across which coefficients will vary. In that sense, there is only one random effect in this problem, namely patient. Having settled on patient as the random effect, you can then make different modeling choices about which coefficients in the model will vary across patients: intercepts and/or slopes for different predictors. I believe these random coefficients are what Jochen and sebastian have in mind when they talk about random effects. Confusion sets in when some people are talking about random effects in terms of the grouping variables and others are talking about random effects in terms of the specific random coefficients present in the model.
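To make the two senses of "random effect" concrete, here is a minimal simulation sketch in Python for a design like sebastian's (all parameter values, and the coding of visits in weeks, are hypothetical). There is a single grouping variable, `patient`; the random coefficients are the per-patient deviations `b0` (intercept) and `b1` (slope on time), drawn from a bivariate normal; time and the treatment-by-time interaction are ordinary fixed effects. In lme4-style notation this roughly corresponds to `y ~ time + time:group + (1 + time | patient)`.

```python
import math
import random

def simulate_rct(n_per_arm=50, times=(0, 3, 6, 13, 26), seed=2023):
    """Simulate long-format RCT data with one grouping variable (patient) and two
    random coefficients per patient (intercept and time-slope deviations).

    All parameter values are hypothetical; times are in weeks.
    Returns (rows, coefs): rows are (patient, group, time, y) tuples and
    coefs maps each patient to its (b0, b1) pair.
    """
    rng = random.Random(seed)
    beta0, beta_time, beta_trt_time = 50.0, -0.2, 0.3  # fixed effects
    sd0, sd1, rho01, sd_eps = 8.0, 0.4, -0.3, 3.0      # variance components
    rows, coefs = [], {}
    for patient in range(2 * n_per_arm):
        group = 1 if patient < n_per_arm else 0  # 1 = intervention, 0 = control
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = sd0 * z1                                             # random intercept
        b1 = sd1 * (rho01 * z1 + math.sqrt(1.0 - rho01**2) * z2)  # random slope, corr(b0, b1) = rho01
        coefs[patient] = (b0, b1)
        for t in times:
            # No group main effect: randomization is assumed to balance baseline means.
            mean = beta0 + b0 + (beta_time + beta_trt_time * group + b1) * t
            rows.append((patient, group, t, mean + rng.gauss(0.0, sd_eps)))
    return rows, coefs

rows, coefs = simulate_rct()
print(len(rows), "rows from", len(coefs), "patients")  # 500 rows from 100 patients
print(rows[:5])  # the five visits of patient 0
```

Only `patient` is random in the grouping sense; whether `b1` is included is the separate choice of which coefficients vary across patients.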

Confusion about this distinction is understandable, since even the definitions in the Gelman article seem to conflate the two. For instance, the quoted statement 1 is clearly talking about random coefficients. However, in my opinion, statement 3 makes the most sense when thinking about random effects in terms of the grouping variable. From that perspective, statement 3 does not read like an argument against including random slopes for time, since we’re most likely interested in the distribution of the random slopes as opposed to the slopes for the specific people we happened to sample.

f2harrell · July 26, 2023, 11:08am

Very well said. On a related note I would like to be pointed to a reference that shows which correlation pattern is induced by a random slopes/random intercepts model.

sscogges · July 26, 2023, 4:52pm

Jon Wakefield’s “Bayesian and Frequentist Regression Methods” book has an expression for it in section 8.4.2. If you don’t have it handy I can type it up later.

f2harrell · July 27, 2023, 12:14pm

I’d appreciate that.

sebastian · July 28, 2023, 6:39am

Thanks a lot for your answers. Thanks especially to you, sscogges; your answer resolves a lot of my confusion (and probably that of some other readers, too). I guess the distinction between random effects and random coefficients is part of the widespread confusion around this topic (as seen also in the statements above).

A further question: would Frank’s answer still apply to the random coefficients in my model? That is, choosing which slope and intercept coefficients to make random based on plots, and then doing some model comparison?

sscogges · July 30, 2023, 4:07am

Correlation pattern for the random intercept/random slope model, as given by Wakefield: suppose we have individuals i = 1, \dots, m, with individual i measured n_i times, the j^{th} measurement time for person i given by t_{ij}, and the outcome at that measurement time given by Y_{ij}. Wakefield considers the random intercept/random slope model given by

\begin{align} Y_{ij} &= \beta_0 + b_{i0} + \beta_1 t_{ij} + b_{i1} t_{ij} + \epsilon_{ij} \end{align}

with Var(b_{i0}) = \sigma_{0}^2, Var(b_{i1}) = \sigma_{1}^2, Cov(b_{i0}, b_{i1}) = \sigma_{01}, and Var(\epsilon_{ij}) = \sigma_{\epsilon}^2.

Wakefield gives the marginal correlation for observations at times t_{ij}, t_{ik} as

\begin{align} \rho_{jk} &= \frac{\sigma_0^2 + (t_{ij} + t_{ik}) \sigma_{01} + t_{ij} t_{ik} \sigma_1^2}{(\sigma_{\epsilon}^2 + \sigma_0^2 + 2t_{ij} \sigma_{01} + t_{ij}^2 \sigma_1^2)^{1/2} (\sigma_{\epsilon}^2 + \sigma_0^2 + 2 t_{ik} \sigma_{01} + t_{ik}^2 \sigma_1^2)^{1/2}} \end{align}
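As a sanity check on Wakefield's expression, here is a short Python sketch that evaluates it and compares it with the sample correlation of simulated (Y_{ij}, Y_{ik}) pairs drawn from the model. The variance components and the two time points are made-up values; the fixed effects \beta_0, \beta_1 are left out because they do not affect the correlation.

```python
import math
import random

def rho_ri_rs(t_j, t_k, var0, var1, cov01, var_eps):
    """Marginal correlation of Y_ij and Y_ik (t_j != t_k) under the random intercept/slope model.

    var0 = sigma_0^2, var1 = sigma_1^2, cov01 = sigma_01, var_eps = sigma_eps^2.
    """
    cov = var0 + (t_j + t_k) * cov01 + t_j * t_k * var1
    var_j = var_eps + var0 + 2 * t_j * cov01 + t_j**2 * var1
    var_k = var_eps + var0 + 2 * t_k * cov01 + t_k**2 * var1
    return cov / math.sqrt(var_j * var_k)

def rho_simulated(t_j, t_k, var0, var1, cov01, var_eps, n=100_000, seed=7):
    """Monte Carlo check: simulate n (Y_ij, Y_ik) pairs and return their sample correlation."""
    rng = random.Random(seed)
    sd0, sd1, sd_eps = math.sqrt(var0), math.sqrt(var1), math.sqrt(var_eps)
    r = cov01 / (sd0 * sd1)  # correlation between b_i0 and b_i1
    yj, yk = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = sd0 * z1
        b1 = sd1 * (r * z1 + math.sqrt(1.0 - r * r) * z2)
        yj.append(b0 + b1 * t_j + rng.gauss(0.0, sd_eps))
        yk.append(b0 + b1 * t_k + rng.gauss(0.0, sd_eps))
    mj, mk = sum(yj) / n, sum(yk) / n
    sjk = sum((a - mj) * (b - mk) for a, b in zip(yj, yk))
    sjj = sum((a - mj) ** 2 for a in yj)
    skk = sum((b - mk) ** 2 for b in yk)
    return sjk / math.sqrt(sjj * skk)

params = dict(var0=4.0, var1=0.25, cov01=-0.3, var_eps=1.0)  # hypothetical values
print(round(rho_ri_rs(1, 4, **params), 3))      # 0.632
print(round(rho_simulated(1, 4, **params), 3))  # close to the formula, up to Monte Carlo error
```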

sebastian: I think Frank’s suggestion was to fit two types of models that account for the repeated observations in different ways. The first model would be a model with random intercepts and random slopes for the time covariate, where the intercepts and slopes vary across patients. The second model would be a model that doesn’t include any patient-specific parameters but instead accounts for repeated measures by allowing for correlated errors. This approach is sometimes referred to as “mixed models for repeated measures” (again, quite confusing terminology!). See this blog post for a discussion: Mixed model repeated measures (MMRM) in Stata, SAS and R – The Stats Geek and this recently released R package for options for fitting this type of model: CRAN - Package mmrm

f2harrell · July 30, 2023, 12:23pm

This is very helpful @sscogges. Looking at the above correlation formula, it appears at first glance not to be consistent with the kind of exponential decline as a function of time gap that we often see (e.g., an AR(1) structure). So I see some downsides for random slopes models:

Random intercepts models already have a lot of parameters, sometimes leading to convergence problems and instabilities
Random slopes double the number of parameters, creating more numerical issues and longer computation time
The assumed correlation pattern may not match what you see in the empirical semivariogram
My colleague Jonathan Schildcrout has said that random slopes can induce heteroscedasticity, i.e., you need to relax the assumption of constant \sigma^{2}_\epsilon to reflect that variances increase at extreme t
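The last two points can be illustrated numerically from Wakefield's formula. The Python sketch below, using made-up variance components, an assumed AR(1) parameter phi, and visits coded in weeks (all hypothetical), compares the random intercept/slope correlation with a continuous-time AR(1) (exponential) correlation, and evaluates the implied marginal variance \sigma_{\epsilon}^2 + \sigma_0^2 + 2t\sigma_{01} + t^2\sigma_1^2 (the terms in the denominator of Wakefield's formula), which is quadratic in t.

```python
import math

# Hypothetical variance components (var0 = sigma_0^2, var1 = sigma_1^2,
# cov01 = sigma_01, var_eps = sigma_eps^2) and an assumed AR(1) parameter PHI.
PARAMS = dict(var0=4.0, var1=0.25, cov01=-0.3, var_eps=1.0)
PHI = 0.9
TIMES = [0, 3, 6, 13, 26]  # weeks; hypothetical coding of the five visits

def ri_rs_var(t, var0, var1, cov01, var_eps):
    """Marginal Var(Y) at time t: a quadratic in t, so not constant over time."""
    return var_eps + var0 + 2 * t * cov01 + t**2 * var1

def ri_rs_corr(t_j, t_k, var0, var1, cov01, var_eps):
    """Wakefield's marginal correlation for two distinct visits t_j != t_k."""
    cov = var0 + (t_j + t_k) * cov01 + t_j * t_k * var1
    var_j = ri_rs_var(t_j, var0, var1, cov01, var_eps)
    var_k = ri_rs_var(t_k, var0, var1, cov01, var_eps)
    return cov / math.sqrt(var_j * var_k)

def ar1_corr(t_j, t_k, phi):
    """Continuous-time AR(1) (exponential) correlation: depends only on |t_j - t_k|."""
    return phi ** abs(t_j - t_k)

# 1. Correlation: pairs with the same gap differ under RI/RS, but not under AR(1).
for t_j, t_k in [(0, 3), (3, 6), (13, 26)]:
    print(f"visits ({t_j:>2}, {t_k:>2}), gap {t_k - t_j:>2}: "
          f"RI/RS {ri_rs_corr(t_j, t_k, **PARAMS):.3f}, AR(1) {ar1_corr(t_j, t_k, PHI):.3f}")

# 2. Variance: minimized at t* = -sigma_01 / sigma_1^2 and growing away from it.
t_star = -PARAMS["cov01"] / PARAMS["var1"]
print(f"t* = {t_star:.2f}, Var at t* = {ri_rs_var(t_star, **PARAMS):.2f}")
for t in TIMES:
    print(f"Var(Y) at t = {t:>2}: {ri_rs_var(t, **PARAMS):.2f}")
```

With these made-up values, the two 3-week gaps get correlations of about 0.59 and 0.77, and weeks 13 and 26 are correlated at about 0.97 despite the longer gap, whereas the AR(1) correlation depends only on the gap. The implied variance also rises from 5 at baseline to about 158 at week 26 even though \sigma_{\epsilon}^2 is constant.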

If you model the correlation pattern directly (using generalized least squares or in general, Markov models), the random effects may shrink enough so as to be ignorable. Superimposing a serial correlation model in addition to random intercepts/slopes, if you think you need random intercepts/slopes, will speed up convergence because the effective number of parameters shrinks.
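One common way to write a random intercept combined with a serial correlation model is as a marginal covariance with three pieces: a constant \sigma_0^2 from the random intercept, a serial term \sigma_s^2 \phi^{|t_j - t_k|}, and measurement error \sigma_{\epsilon}^2 on the diagonal. The Python sketch below builds that matrix for hypothetical visit times and parameter values (the names `var_serial` and `phi` are illustrative) and checks that it is a valid covariance via a Cholesky factorization. Under this structure the correlation decays with the time gap toward the floor set by the random intercept rather than toward zero, using 4 covariance parameters in total.

```python
import math

def covariance_ri_serial(times, var0, var_serial, phi, var_eps):
    """Marginal covariance: random intercept + exponential serial correlation + white noise.

    Sigma[j][k] = var0 + var_serial * phi**|t_j - t_k| + var_eps * (j == k)
    """
    n = len(times)
    return [[var0 + var_serial * phi ** abs(times[j] - times[k]) + (var_eps if j == k else 0.0)
             for k in range(n)] for j in range(n)]

def cholesky(a):
    """Plain Cholesky factorization; raises ValueError if a is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

times = [0, 3, 6, 13, 26]  # weeks; hypothetical coding of the five visits
sigma = covariance_ri_serial(times, var0=2.0, var_serial=3.0, phi=0.9, var_eps=1.0)
cholesky(sigma)  # succeeds, so sigma is a valid (positive definite) covariance matrix
for row in sigma:
    print(" ".join(f"{x:6.2f}" for x in row))
```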