Ah, something in my wheelhouse! From a Fitbit, ActiGraph, or other physical activity monitor, you can usually get activity data at the minute-by-minute or even second-by-second level, as step counts, activity counts, “MIMS units”, or “ENMOs”, so you can observe activity at a very fine time resolution.
Some biostats research groups are approaching the problem more or less in line with your option #2: treating physical activity as a function over the 24-hour day and modeling each day with a basis expansion. There’s a really nice in-depth review of these strategies in Erjia Cui’s dissertation defense here from this spring.
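A minimal sketch of the basis-expansion idea, assuming minute-level data (1440 values per day) and a Fourier basis. A Fourier basis is a natural choice because the day wraps around at midnight, but the methods in the literature may well use other bases such as splines. The helper names `fourier_basis` and `smooth_day` are hypothetical, and the data are simulated.

```python
import numpy as np

MINUTES_PER_DAY = 1440


def fourier_basis(n_points: int = MINUTES_PER_DAY, n_harmonics: int = 4) -> np.ndarray:
    """Fourier design matrix on a 24-hour day: columns 1, sin(2*pi*k*t), cos(2*pi*k*t)."""
    t = np.arange(n_points) / n_points  # time of day as a fraction of 24 h
    cols = [np.ones(n_points)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def smooth_day(activity: np.ndarray, n_harmonics: int = 4):
    """Least-squares fit of one day's minute-level activity onto the Fourier basis.

    Returns the basis coefficients and the fitted (smoothed) 24-hour curve.
    """
    basis = fourier_basis(len(activity), n_harmonics)
    coefs, *_ = np.linalg.lstsq(basis, activity, rcond=None)
    return coefs, basis @ coefs


# Hypothetical simulated counts: lowest at midnight, highest around noon.
rng = np.random.default_rng(0)
t = np.arange(MINUTES_PER_DAY) / MINUTES_PER_DAY
true_curve = 50 - 40 * np.cos(2 * np.pi * t)
counts = rng.poisson(true_curve).astype(float)
coefs, fitted = smooth_day(counts)
print(coefs.shape, fitted.shape)  # (9,) (1440,)
```

Each day is then summarized by a short coefficient vector (9 numbers instead of 1440 here), and the number of harmonics controls how much within-day detail is kept.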
Some papers apply functional principal component analysis (fPCA) to either the activity data or its log(1 + x) transform, then use the resulting fPCA scores in a regression model, possibly alongside other covariates.
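The fPCA-then-regress pipeline might look roughly like this in NumPy. It is a simplified, unsmoothed version (an SVD of the centered log(1 + x) curves) run on simulated data; dedicated implementations such as refund’s `fpca.face` smooth the covariance and handle measurement noise more carefully. The function names, the simulated activity, and the simulated outcome are all hypothetical.

```python
import numpy as np


def fpca_scores(activity: np.ndarray, n_components: int = 3):
    """Unsmoothed fPCA via SVD of centered log(1 + x) curves.

    activity: (n_subjects, n_minutes) array of non-negative activity values.
    Returns (mean_curve, eigenfunctions, scores).
    """
    x = np.log1p(activity)  # log(1 + x) transform
    mean_curve = x.mean(axis=0)
    centered = x - mean_curve  # remove the population-average daily profile
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfunctions = vt[:n_components]  # (n_components, n_minutes), orthonormal rows
    scores = u[:, :n_components] * s[:n_components]  # (n_subjects, n_components)
    return mean_curve, eigenfunctions, scores


def regress_on_scores(y: np.ndarray, scores: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """OLS of a scalar outcome on fPCA scores plus other scalar covariates."""
    design = np.column_stack([np.ones(len(y)), scores, covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta  # [intercept, score effects..., covariate effects...]


# Hypothetical data: 200 people whose daytime activity level varies.
rng = np.random.default_rng(1)
t = np.arange(1440) / 1440
daytime = np.clip(np.sin(2 * np.pi * (t - 0.25)), 0, None)  # zero between 6pm and 6am
amplitude = rng.uniform(0.5, 2.5, size=(200, 1))
activity = rng.poisson(np.exp(1 + amplitude * daytime))
age = rng.normal(50, 10, size=200)

mean_curve, eigenfunctions, scores = fpca_scores(activity)
y = 1.5 * scores[:, 0] + 0.02 * age + rng.normal(0, 1, size=200)  # simulated outcome
beta = regress_on_scores(y, scores, age)
```

In practice the number of components is usually chosen by the fraction of variance explained, and the eigenfunctions are plotted to interpret what each score measures (here the first one mostly captures overall daytime activity level).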
It’s also possible to use the functions themselves as covariates in a regression model. The ‘refund’ package in R supports this kind of analysis, using a generalization of semiparametric mixed models. That allows lots of useful features, like random functional intercepts for multi-day observations on the same person, generalizing to nonlinear effects of the 24-hour function, and including other possibly nonlinear scalar covariates.
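To make the functional-covariate idea concrete, here is a hand-rolled sketch of a linear scalar-on-function regression, y_i = α + γ z_i + ∫ X_i(t) β(t) dt + ε_i, with β(t) expanded in a Fourier basis and a roughness penalty on β''. This is not refund’s API: refund fits these models through mgcv with automatic smoothing-parameter selection, whereas here the penalty weight `lam` is fixed by hand, and there are no random intercepts or nonlinear terms. All names and data are hypothetical.

```python
import numpy as np


def fourier_basis(n_points: int, n_harmonics: int) -> np.ndarray:
    """Columns 1, sin(2*pi*k*t), cos(2*pi*k*t) for k = 1..n_harmonics, t in [0, 1)."""
    t = np.arange(n_points) / n_points
    cols = [np.ones(n_points)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)


def fit_scalar_on_function(y, curves, scalar_covs, n_harmonics=3, lam=1e-6):
    """Penalized linear scalar-on-function regression (sketch).

    Model: y_i = alpha + gamma' z_i + integral of X_i(t) beta(t) dt + error,
    with beta(t) expanded in a Fourier basis. Returns (alpha, gamma, beta(t)).
    """
    n, m = curves.shape
    phi = fourier_basis(m, n_harmonics)  # (m, K) basis for beta(t)
    z_func = curves @ phi / m  # Riemann sum approximating integral of X_i(t) phi_k(t) dt
    scalar_covs = np.asarray(scalar_covs, dtype=float).reshape(n, -1)
    p = 1 + scalar_covs.shape[1]  # intercept + scalar covariates (unpenalized)
    design = np.column_stack([np.ones(n), scalar_covs, z_func])
    # For harmonic k, the integral of beta''(t)^2 scales as (2*pi*k)^4, so weight each
    # basis coefficient accordingly; the constant basis term (k = 0) is unpenalized.
    k = np.concatenate([[0], np.repeat(np.arange(1, n_harmonics + 1), 2)])
    weights = np.concatenate([np.zeros(p), (2 * np.pi * k) ** 4])
    coef = np.linalg.solve(design.T @ design + lam * np.diag(weights), design.T @ y)
    return coef[0], coef[1:p], phi @ coef[p:]


# Hypothetical data: smooth random daily curves and a known beta(t).
rng = np.random.default_rng(2)
n, m = 300, 1440
phi = fourier_basis(m, 3)
curves = rng.normal(size=(n, phi.shape[1])) @ phi.T
beta_true = np.sin(2 * np.pi * np.arange(m) / m)
z = rng.normal(size=n)
y = 1.0 + 0.5 * z + curves @ beta_true / m + rng.normal(0, 0.05, size=n)
alpha_hat, gamma_hat, beta_hat = fit_scalar_on_function(y, curves, z)
```

Because the penalty grows as (2πk)⁴, higher-frequency wiggles in β(t) are shrunk much more than the slow daily rhythm; with real data you would choose `lam` by cross-validation or REML rather than fixing it.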
Depending on the study goals, you could also treat the 24hr activity pattern as the outcome and model how it changes as a function of age, or of being on a certain medication, for example.
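With the curve as the outcome, one simple two-step approach is to run an ordinary regression at every minute and then smooth each coefficient function across the day. The sketch below does that on simulated data with age as the only covariate; it is an illustration under those assumptions, not the penalized fitting used by functions such as refund’s `pffr`, and the helper names are hypothetical.

```python
import numpy as np


def fourier_basis(n_points: int, n_harmonics: int) -> np.ndarray:
    """Columns 1, sin(2*pi*k*t), cos(2*pi*k*t) for k = 1..n_harmonics, t in [0, 1)."""
    t = np.arange(n_points) / n_points
    cols = [np.ones(n_points)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)


def function_on_scalar(curves, covariates, n_harmonics=4):
    """Two-step function-on-scalar regression: pointwise OLS, then basis smoothing.

    curves: (n_subjects, n_minutes) outcome curves Y_i(t).
    covariates: (n_subjects,) or (n_subjects, q) scalar predictors.
    Returns raw and smoothed coefficient functions, each of shape (1 + q, n_minutes).
    """
    n, m = curves.shape
    design = np.column_stack([np.ones(n), np.asarray(covariates, dtype=float).reshape(n, -1)])
    raw, *_ = np.linalg.lstsq(design, curves, rcond=None)  # one OLS fit per minute
    phi = fourier_basis(m, n_harmonics)
    basis_coefs, *_ = np.linalg.lstsq(phi, raw.T, rcond=None)  # project each beta_j(t)
    smooth = (phi @ basis_coefs).T
    return raw, smooth


# Hypothetical data: a daily log-activity profile that shifts with age.
rng = np.random.default_rng(3)
n, m = 200, 1440
t = np.arange(m) / m
age = rng.uniform(20, 80, size=n)
beta0 = 2 - np.cos(2 * np.pi * t)  # baseline daily profile
beta1 = -0.01 * np.sin(2 * np.pi * t)  # age effect at each minute of the day
curves = beta0 + np.outer(age, beta1) + rng.normal(0, 0.5, size=(n, m))
raw, smooth = function_on_scalar(curves, age)
```

Here `smooth[1]` is the estimated change in the activity curve per year of age at each minute of the day, which can be plotted directly against time of day.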
There are some interesting complicating factors with any approach:
- Should the 24hr day start at midnight for everyone, or should the daily observations be aligned to account for differences in daily schedule, e.g., someone who gets out of bed at 5am vs. at 10am?
- Physical activity monitors often tell you exactly where data are missing, i.e., you know when the person isn’t wearing the device. How much missingness should you tolerate? Do you impute the missing data, or use an analysis method that allows the 24hr activity function to be sparsely observed?
- If you have access to raw accelerometer data, it’s possible to recognize specific types of physical activity (driving, biking, walking, sleeping, etc.) with activity recognition algorithms, e.g., as done in this paper. How should the analysis change if you are interested in the differential effects of different kinds of activity? For example, activity counts from a wrist-worn accelerometer would be pretty low while riding a bike, but if you knew that activity time came from a bike ride, it’d be nice to account for it in your analysis. Sleep is another good example: 5hrs of sleep + 3hrs of sedentary time with similarly low activity counts is probably less healthy than 8hrs of sleep.
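Two of the preprocessing choices above, dropping days with too little wear time and aligning days to wake-up time, can be sketched as follows. This assumes non-wear minutes are coded as NaN and that a wake-up minute is available for each day (e.g., from a sleep diary or a sleep-detection algorithm). The 10-hour cutoff is a threshold often used in accelerometry studies but is only an assumed default here, and all function names are hypothetical.

```python
import numpy as np

MINUTES_PER_DAY = 1440


def wear_hours(day: np.ndarray) -> float:
    """Hours of observed data in one day; NaN marks a non-wear minute (assumed coding)."""
    return np.count_nonzero(~np.isnan(day)) / 60.0


def filter_valid_days(days: np.ndarray, min_wear_hours: float = 10.0):
    """Keep days with at least `min_wear_hours` of wear time (a tunable choice)."""
    keep = np.array([wear_hours(day) >= min_wear_hours for day in days])
    return days[keep], keep


def align_to_wake(day: np.ndarray, wake_minute: int) -> np.ndarray:
    """Circularly shift a day so that index 0 is the wake-up minute.

    Minutes before waking wrap around to the end of the array, so this treats
    the day as periodic; a wake-to-wake window spanning two calendar days
    would avoid that approximation.
    """
    return np.roll(day, -wake_minute)


# Hypothetical 3-day recording: day 1 has 9 h of wear, day 2 has 14 h.
days = np.ones((3, MINUTES_PER_DAY))
days[1, :900] = np.nan
days[2, :600] = np.nan
valid, keep = filter_valid_days(days)
print(keep)  # [ True False  True]

aligned = align_to_wake(np.arange(MINUTES_PER_DAY, dtype=float), wake_minute=300)  # 5:00 am
print(aligned[0], aligned[-1])  # 300.0 299.0
```

Imputation, or a method designed for sparsely observed functions, would replace the hard filtering step, and the circular shift is the crudest possible alignment; registration methods that warp time nonlinearly are a more flexible alternative.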
I think this topic is a very timely one, since large-scale studies like NHANES and UK Biobank now include physical activity data alongside outcomes like cognitive function, blood pressure, mortality, and pretty much anything else you might care about from a health perspective.