This excellent paper by German et al. presents a computationally and statistically efficient way to jointly analyze the mean and the variance as a function of covariates. Their method is named WiSER for within-subject variance estimator by robust regression. The authors have two excellent applications: glucose variability in diabetic patients (with data collected 4x per year) and activity level variation (with data collected every 5 minutes). The need for efficient methods, especially with dense time series data per patient, is clear. WiSER applies to continuous response variables.
Most of the complexity of analyzing dense within-patient serial data comes from modeling the covariance structure of the multivariate responses. Markov models do this by modeling conditional distributions. For example, a second-order Markov model conditions, at each time t, on the measurements made at times t-1 and t-2, and these previous measurements are treated no differently than baseline covariates. Once the conditional model is fitted, one uses its transition probability estimates to obtain unconditional estimates. The computations involved in unconditioning on previous times do not depend on the sample size, so they are feasible for any number of patients. With the Markov model's conditional independence assumption, the calculations ignore patient identifiers once the lagged measurements are computed within-patient, so they are extremely fast: fitting n patients each having m serial measurements involves the same calculations as fitting $n \times m$ patients each having one measurement. In addition, Bayesian MCMC converges very quickly in these conditional independence models.
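To make the conditioning explicit, the second-order Markov model described above factors one patient's series (with X denoting baseline covariates) as

$$
f(Y_1, \ldots, Y_m \mid X) \;=\; f(Y_1, Y_2 \mid X) \prod_{t=3}^{m} f(Y_t \mid Y_{t-1}, Y_{t-2}, X),
$$

so each conditional term looks like an ordinary regression with $Y_{t-1}$ and $Y_{t-2}$ entered as covariates, and unconditional quantities are recovered afterwards by summing or integrating over the estimated transition distributions.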
This leads to a question. Instead of having specialized models needing specialized software for jointly modeling the mean and the variability, why not use a Markov model to model variability separately? Wouldn't this also be a little more interpretable? Variability might be captured by modeling |Y(t) - Y(t-1)| conditional on baseline covariates, on |Y(t-1) - Y(t-2)|, and on Y(t-1); the latter variable is probably unnecessary if Y is transformed so that successive differences in transformed Y are independent of the absolute level of Y. For example, Y(t-1) will be needed if Y should have been log-transformed but wasn't, i.e., if differences are relative rather than incremental. If Y(t-1) is not needed we have a simple first-order Markov process in the lag-1 absolute differences.
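A minimal sketch of this variability model, with simulated data and made-up column names (this is not the WiSER method, just the lagged-absolute-difference idea above):

```python
# Minimal sketch (not the WiSER method): a Markov model for within-subject
# variability built from lag-1 absolute differences.  The data frame, column
# names ("id", "t", "y", "age"), and the simulation are all hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, m = 200, 20  # 200 patients, 20 serial measurements each
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), m),
    "t": np.tile(np.arange(m), n),
    "age": np.repeat(rng.uniform(40, 80, n), m),
})
df["y"] = 5 + 0.02 * df["age"] + rng.normal(0, 1, len(df))

df = df.sort_values(["id", "t"])
df["y_lag1"] = df.groupby("id")["y"].shift(1)              # Y(t-1)
df["absdiff"] = (df["y"] - df["y_lag1"]).abs()             # |Y(t) - Y(t-1)|
df["absdiff_lag1"] = df.groupby("id")["absdiff"].shift(1)  # |Y(t-1) - Y(t-2)|

# Conditional (Markov) model for variability: baseline covariate(s) plus the
# lagged absolute difference; Y(t-1) stays in the model in case Y was not
# transformed so that successive differences are free of the absolute level.
data = df.dropna(subset=["absdiff", "absdiff_lag1", "y_lag1"])
fit = smf.ols("absdiff ~ age + absdiff_lag1 + y_lag1", data=data).fit()
print(fit.params)
```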
The Markov approach would solve another problem that was not satisfactorily dealt with in the German et al. article: in the accelerometer dataset, many 5-minute periods had patient activity levels of zero, so the distribution of Y has "clumping at zero". A semiparametric Markov model can handle this elegantly. Semiparametric Markov longitudinal models are discussed in detail here.
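One way to see how an ordinal (proportional-odds) Markov model copes with the zero clump is sketched below with simulated data; the positive activity values are coarsened into a few ordered bins purely to keep the example short, whereas the semiparametric approach linked above effectively uses all distinct values of Y.

```python
# Coarse sketch of a first-order Markov ordinal (proportional-odds) model that
# gives zero activity its own lowest category.  Data and names are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n, m = 100, 50
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), m),
    "t": np.tile(np.arange(m), n),
    "age": np.repeat(rng.uniform(30, 70, n), m),
})
# Simulated activity with clumping at zero: ~30% of periods are exactly zero.
df["y"] = np.where(rng.uniform(size=len(df)) < 0.3, 0.0,
                   rng.gamma(2.0, 50.0, len(df)))

df = df.sort_values(["id", "t"])
df["y_lag1"] = df.groupby("id")["y"].shift(1)  # previous period's activity
df = df.dropna(subset=["y_lag1"])

# Ordered outcome: the first bin is exactly zero, then increasing activity bins.
df["y_cat"] = pd.cut(df["y"], bins=[-np.inf, 0, 50, 150, np.inf])

model = OrderedModel(df["y_cat"], df[["age", "y_lag1"]], distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.params)
```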
I would be interested in others’ thoughts about the potential for the simpler Markov approach.
It’s an interesting idea to use a Markov process to model the correlation between longitudinal measurements. Is it easy to incorporate the effects of time-varying covariates on the within-subject variance in a Markov model? For example, taking a medication or not (e.g., poor medication adherence) can have a large effect on the within-subject variance of blood pressure.
BTW, we also did an analysis using WiSER of the effects of Trump’s tweets (frequencies of certain words) on the volatility of S&P 500 stocks. Unfortunately, it was removed because of the journal’s page limit, but it appears in Chris’s dissertation.
I’m inclined to place problems like this in the general frame of state-space modeling [1,2], and then attempt to describe the various other approaches as special cases. For example, do the stochastic volatility models mentioned in [2] subsume the other approaches being discussed here?
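For concreteness, a standard discrete-time stochastic volatility model (textbook form, not taken from either reference) is itself a nonlinear state-space model with the log-variance as the latent state:

$$
Y_t = e^{h_t/2}\,\varepsilon_t, \qquad
h_t = \mu + \phi\,(h_{t-1}-\mu) + \sigma_\eta\,\eta_t, \qquad
\varepsilon_t,\ \eta_t \overset{\text{iid}}{\sim} N(0,1).
$$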
The way computational efficiency has motivated WiSER is interesting. I would tend to appeal to Gustafson’s Law in approaching these massive patient registries with their distinct, non-interacting individuals: I would estimate n independent state-space models to obtain patient-level parameter vector estimates, which could then be subjected to further stages of analysis such as visualization and regression. The problem of “interpretability” is ideally solved by bespoke model construction ex ante, rather than by grafting an ‘interpretation’ onto a generic model post hoc. Under such a philosophy, one would confront the zero-activity periods ‘head-on’ with a latent state variable having associated Markov transition probabilities (interpretation: sedentariness), which brings a crucial scientific question into clear view rather than sweeping it under the rug of a generic modeling approach.
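A rough sketch of the two-stage idea, with simulated data; for brevity the per-patient model here is a plain AR(1) rather than a full state-space model with a latent sedentary state, so treat it only as an illustration of the "fit n independent models, then analyze the estimates" workflow.

```python
# Two-stage, embarrassingly parallel workflow: fit an independent model per
# patient, collect the patient-level estimates, then analyze those estimates
# in a second stage.  Data, covariates, and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(2)
n, m = 50, 200
ages = rng.uniform(30, 80, n)

rows = []
for i in range(n):
    # Simulated activity series for one patient (illustration only).
    noise_sd = 1 + 0.02 * (ages[i] - 50)
    y = np.zeros(m)
    for t in range(1, m):
        y[t] = 0.6 * y[t - 1] + rng.normal(0, noise_sd)
    # Stage 1: independent per-patient fit (trivially parallelizable).
    fit = AutoReg(y, lags=1).fit()
    rows.append({"id": i, "age": ages[i],
                 "ar1": fit.params[1],                 # autocorrelation estimate
                 "log_sd": 0.5 * np.log(fit.sigma2)})  # log residual SD

stage1 = pd.DataFrame(rows)

# Stage 2: relate patient-level volatility estimates to patient-level covariates.
stage2 = smf.ols("log_sd ~ age", data=stage1).fit()
print(stage2.params)
```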
Kantas N, Doucet A, Singh SS, Maciejowski J, Chopin N. On Particle Methods for Parameter Estimation in State-Space Models. Statist Sci. 2015;30(3):328-351. doi:10.1214/14-STS511
Interesting take, David. I think the approach you’ve outlined could lead to better understanding. The idea of pooling separate per-patient analyses may run into problems of instability and of dependence on having similar observation periods per subject.
Great question, Hua, and I’m so glad to have one of the paper’s authors respond. In general, Markov models for the mean process are really good at handling time-dependent covariates, with the usual caveats about interpreting internal vs. external time-dependent covariates (an example of an external covariate is a crossover study with a pre-ordained crossover time). I would think that Markov models should also have success in modeling variability, if simple absolute differences capture variability (and perhaps one could also analyze running variances this way?).
I think that the main challenge is to get the lag periods right. But this is also an opportunity. One could model |Y(t)-Y(t-1)| as a function of the most recent medication status and of how long it has been since the medication was actually taken. The latter could be used to model a decay effect whereby variability returns toward its unmedicated level, linearly or quadratically, as time since dosing increases.
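One hypothetical form such a model could take (my notation, only to make the decay idea concrete), with med(t) the current medication status and s(t) the time since the last dose:

$$
E\,|Y(t)-Y(t-1)| \;=\; \beta_0 + \beta_1\,\mathrm{med}(t) + \beta_2\,s(t) + \beta_3\,s(t)^2 + \gamma\,|Y(t-1)-Y(t-2)|,
$$

where the linear and quadratic terms in s(t) let variability drift back toward its unmedicated level as time since dosing grows.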
Dear prof, could you please elaborate on this part or point to some resources to learn from? I can’t really grasp that point; forgive the ignorance of a medical professional.
Thanks for asking, because I didn’t explain that well. I go into this in detail here. To summarize, if change over time is of interest, we need differences in measurements to be able to stand on their own, i.e., for post - pre to be unrelated to the pre measurement. post - pre will depend on pre if, for example, one should have taken logs but didn’t; in that case post - pre will increase with pre. The reverse can also happen, i.e., one takes logs when one shouldn’t have, making post - pre decline with pre. So either the measurements need to be carefully transformed to make pre irrelevant, or pre needs to be added to the model.
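A tiny worked case of the multiplicative situation (my own illustration, not from the post above): if the process is relative, so that post $= c \times$ pre for some factor c, then

$$
\text{post} - \text{pre} = (c-1)\,\text{pre}, \qquad
\log(\text{post}) - \log(\text{pre}) = \log c,
$$

so the raw difference grows in proportion to pre while the difference on the log scale does not involve pre at all.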