This is the seventh of several connected topics organized around chapters in Regression Modeling Strategies. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.
What do you suggest for panel data (longitudinal data with discrete time, as often occurs in patient registries)? No difference. You can use continuous-time methods for discrete time, perhaps changing to an unstructured covariance matrix only if there are 2 or 3 distinct times.
Can longitudinal data include only 2 time points? Yes; 2 or more.
What if patients were seen at different time points (clinical routine)? How does that affect the time effect? Continuous-time correlation structures such as AR(1) handle this nicely.
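As a sketch of what this looks like in R: `nlme` (which ships with base R) provides `corCAR1`, a continuous-time AR(1) structure in which the correlation decays with the actual time gap between visits, so irregular spacing is handled automatically. The data and variable names below are illustrative, not from the thread.

```r
library(nlme)  # ships with base R

# Illustrative data: 20 patients, each seen at 4 irregular times
set.seed(1)
d <- data.frame(id   = rep(1:20, each = 4),
                time = unlist(lapply(1:20, function(i) sort(runif(4, 0, 12)))),
                x    = rnorm(80))
d$y <- 0.5 * d$x + rnorm(80)

# Continuous-time AR(1): correlation between two residuals depends on
# the elapsed time between the visits, not on their visit index
f <- gls(y ~ x + time, data = d,
         correlation = corCAR1(form = ~ time | id))
summary(f)$tTable
```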
How do you decide whether a generalized least squares model or a mixed-effects model fits your data better? Start with the variogram. For a standard mixed-effects model the assumed variogram is flat.
What is your preferred way to calculate R^2 and/or to assess explained variance between fixed and random effects in mixed-effects models? I haven’t studied that.
I’m trying to simulate some serial data to show the increased Type 1 error rate of analyzing it as if it were independent data.
For instance, I’m comparing the frequency of p<0.05 using OLS vs GLS when there’s no real association between X and Y. The relevant code is as follows:
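The original poster’s code is not reproduced here; a minimal sketch of such a simulation (illustrative parameters: 10 subjects, 5 visits each, AR(1) errors with rho = 0.7, and no true X–Y association) might look like:

```r
library(nlme)

set.seed(42)
sim_once <- function(n_id = 10, n_t = 5, rho = 0.7) {
  # Serially correlated errors within each subject; X unrelated to Y
  id <- rep(1:n_id, each = n_t)
  x  <- rnorm(n_id * n_t)
  e  <- unlist(lapply(1:n_id, function(i)
    as.numeric(arima.sim(list(ar = rho), n = n_t))))
  d <- data.frame(id, time = rep(1:n_t, n_id), x, y = e)
  # OLS ignores the serial correlation; GLS models it
  p_ols <- summary(lm(y ~ x, data = d))$coefficients["x", 4]
  p_gls <- tryCatch(
    summary(gls(y ~ x, data = d,
                correlation = corAR1(form = ~ time | id)))$tTable["x", "p-value"],
    error = function(e) NA_real_)
  c(ols = p_ols, gls = p_gls)
}

p <- replicate(200, sim_once())
rowMeans(p < 0.05, na.rm = TRUE)  # rejection frequency under the null
</code>
```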
I’ll bet that the normal approximation for \hat{\beta} is not great for N=10. The p-value would probably be good were a t-distribution to be used instead of Gaussian.
Thanks.
Do you mean that the fit of the GLS model is suboptimal in terms of achieving normally distributed residuals? Or, alternatively, that the data generating mechanism should be based on sampling X and Y from a t rather than a Gaussian distribution?
Apologies if I’m not following…
Neither. Take the special case of GLS with one time point. Then GLS = OLS, and we use the slightly wider t-distribution for getting p-values instead of the normal distribution.
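The difference is easy to see numerically. With N = 10 observations and two parameters (intercept plus slope), a hypothetical Wald statistic of 2.1 is "significant" under the normal approximation but not under a t with 8 degrees of freedom:

```r
z <- 2.1                 # hypothetical Wald statistic, beta-hat / SE
n <- 10; k <- 2          # 10 observations; intercept + slope

p_norm <- 2 * pnorm(-abs(z))          # normal-approximation p-value
p_t    <- 2 * pt(-abs(z), df = n - k) # t-based p-value; heavier tails

round(c(normal = p_norm, t = p_t), 3)
```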
We can select a t-distribution approximation for computing p-values and confidence limits of regression coefficients:
parameters(f, ci = 0.95, method = 'residual')
Is that what you mean?
Unfortunately it does not seem to work for Gls models.
To solve my problem from the message above, I simply increased the sample size from 10 to 30 and now the frequency of p<0.05 is 5.5% with Gls, which seems correct.
The parameters idea is restrictive. A lot of hypotheses are on linear combinations of parameters and involve > 1 degree of freedom. Hence the contrast function in rms. It doesn’t provide t-based p-values for gls though.
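For readers without `rms` handy, a multi-degree-of-freedom Wald test on a linear combination of coefficients can be sketched by hand (this is the chi-square version, similar in spirit to `rms::contrast` but not a replacement for it; `Orthodont` ships with `nlme`):

```r
library(nlme)

f <- gls(distance ~ age + Sex, data = Orthodont)

# Joint 2-d.f. Wald test that the age and Sex coefficients are both zero.
# Each row of C selects one linear combination of the coefficient vector.
b <- coef(f)
V <- vcov(f)
C <- rbind(c(0, 1, 0),
           c(0, 0, 1))
w <- t(C %*% b) %*% solve(C %*% V %*% t(C)) %*% (C %*% b)
p <- pchisq(as.numeric(w), df = nrow(C), lower.tail = FALSE)
c(chisq = as.numeric(w), p = p)
```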