I’ve been using restricted cubic splines in my modeling work for quite a while now, and I usually follow Frank’s guidance on the number and placement of knots.
That said, I find things get trickier when splines are applied to time variables. In these cases, I tend to be more liberal with the number of knots, but I often catch myself second-guessing that choice.
Take the Sicily data example in Section 2.8 of Frank’s online book, which uses spline terms in time:
library(rms)
getHdata(sicily)
d <- sicily
dd <- datadist(d); options(datadist='dd')
f <- Glm(aces ~ offset(log(stdpop)) + rcs(time, 6), data=d, family='poisson')
Suppose we want to use AIC to guide the number of knots:
results <- data.frame(knots = integer(), AIC = numeric())
# Refit the model with k = 3, ..., 15 knots in time and record the AIC of each fit
for (k in 3:15) {
  model <- Glm(aces ~ offset(log(stdpop)) + rcs(time, k), data = d, family = poisson)
  results <- rbind(results, data.frame(knots = k, AIC = model$aic))
}
print(results)
   knots      AIC
1      3 738.8582
2      4 721.0086
3      5 717.3928
4      6 721.5237
5      7 691.9370
6      8 682.5756
7      9 684.5746
8     10 682.2653
9     11 655.3435
10    12 652.2580
11    13 644.5240
12    14 636.9575
13    15 642.1793
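To make the pattern easier to see, here is a minimal sketch that re-enters the AIC values printed above and looks at each fit's distance from the best one. The `delta` column and the `best_k` variable are just my own labels, not anything from rms, and the snippet uses base R only, so it runs without refitting the models.

```r
# AIC values copied by hand from the printed output above
results <- data.frame(
  knots = 3:15,
  AIC = c(738.8582, 721.0086, 717.3928, 721.5237, 691.9370, 682.5756,
          684.5746, 682.2653, 655.3435, 652.2580, 644.5240, 636.9575,
          642.1793)
)

# Difference from the smallest AIC in the range tried (0 for the best fit)
results$delta <- results$AIC - min(results$AIC)

# Number of knots with the smallest AIC
best_k <- results$knots[which.min(results$AIC)]

print(results)
print(best_k)

# AIC against number of knots; the path is visibly non-monotone
plot(results$knots, results$AIC, type = "b",
     xlab = "Number of knots", ylab = "AIC")
```

The plot is only meant to show how uneven the AIC path is as knots are added. It is not a selection rule.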
Within the range I tried (3 to 15 knots), AIC is minimized at 14 knots. I run into some version of this issue almost every time I work with splines of time. So my question is: besides sample size and events-per-variable (EPV) considerations, are there any heuristics or guidance on how many knots to use when the spline is in time?