If we have seasonality in data (e.g. some daily measurement, where we assume things might be different depending on outside temperature, amount of sunshine, or pollen in the air etc.), we probably want to assume this varies approximately in the same way in different years. Especially so, if we have not measured the actual influence factor (like temperature etc.).

For simplicity, let’s say all my data is from the same location (no problem with hemisphere and latitude - any extra thoughts on that are of course also interesting). Then I can use e.g. day of the year and put it into my regression model e.g. in a spline or in a discretized version (e.g. factor level for each week or month = piecewise constant function - presumably inefficient).

I thought the most obvious thing might be to put x_{cr}=cos(r * day/365.25 * 2 * \pi) and x_{sr}=sin(r * day/365.25 * 2 * \pi) for r=1, \ldots, R into the model (i.e. a Fourier series basis that for sufficiently large R should approximate just about any periodic function pretty well).

Is that the most obvious way to do this, or am I overlooking something obvious here? Are there some simple and good alternatives? Anything that you’d expect to perform better on finite sample sizes/match any true underlying function better with fewer terms?

Nice question. There are many good options available using periodic splines. Not as good (because it doesn’t make the shape just before the “start over” point the same as just after) as some of the approaches you’ll find there, but a simple approach is to force the function to start over at a certain date each year. Example in R:

require(rms)
dd <- datadist(mydata); options(datadist='dd')
# Assumes time is year + fraction of a year and start over point is mid year
g <- function(x) x - .5 - floor(x - .5)
f <- ols(y ~ rcs(g(x), 6), data=mydata) # 6 default knots

Use this only an example to get going, as this simple approach has a discontinuity at mid year.

It’s a good idea to add long-term trends in addition to cyclical trends. This can be done by adding another spline function (with fewer knots). I need to figure out how to do this with the rms package.

Franks’ suggestion (cyclic splines + long-term trend) can be nicely realized with gam of the mgcv package. You might find the long stream of blog posts from Gavin Simpson on this topic (starting here) interesting.

In some business contexts, seasonality has two or more peaks of different size that are not evenly spaced. Wouldn’t cyclic splines have difficulty modeling this sort of seasonality, and a non-cyclic-spline would be better?