Dear Prof. Harrell,
Thanks so much for the “rms” package, it is simply fantastic (and I wished I had used it before)! However, there is something I can’t get my head around, and it seems this topic has created some confusion among others, too (e.g. on stackexchange). I use a dummy example (code below) to illustrate. The example has only 3 variables: event (0/1), time (time-to-event), riskfactor (in {1, 2, 3, 4, 5}) as explanatory covariate. Note: for simplicity, I treat riskfactor as continuous (despite it taking only 5 values) so I can fit restricted cubic splines. I set the reference value arbitrarily at 4.12.
Here are some initial considerations before getting to my question.
Logistic model: fitlrm_rms ← lrm(event ~ riskfactor, data = d)
a. Plotting Predict(fitlrm_rms, …, ref.zero = F) shows the linear predictor, i.e., alpha + beta*riskfactor, which is the log-odds.
b. Plotting Predict(fitlrm, …, ref.zero = T) shows beta (log odds ratio) only, yet it will adjust it to the chosen reference value, such that the confidence interval collapses there.
Note: at least in the rms version I use, the y axis label for b. above states log odds, instead of log odds ratio. Look like a labelling mistake?
Cox model without restricted cubic splines: cphout ← cph(Surv(time, event) ~ riskfactor, data = d)
The model is h(t) = h_0(t)exp(betariskfactor).
a. Plotting Predict(cphout, …, ref.zero = F) shows beta (log hazard ratio), with reference value = mean(riskfactor). The CI collapses (zero width) there. The y label states “log Relative Hazard”, i.e. log(HR) [HR = Hazard Ratio]
b. Plotting Predict(cphout, …, ref.zero = T) shows the beta, yet it will adjust it to the chosen reference value (4.12), and the CI collapses there (zero width).
My question:
Cox model with restricted cubic splines: cphspl ← cph(Surv(time, event) ~ rcs(riskfactor, 3), data = d)
The model is h(t) = h_0(t)*exp(f(riskfactor)), where f is a cubic spline. Let me reverse the order of a. and b. here since I think I know what b. shows.
b. Plotting Predict(cphout, …, ref.zero = T) shows f(riskfactor), yet it will adjust it to the chosen reference value (4.12) and the CI collapses there (zero width). The y label states “log Relative Hazard”, i.e. log(HR).
a. Plotting Predict(cphout, …, ref.zero = F). I am not clear what this shows? The label also states “log Relative Hazard”, i.e., again a log(HR)? I have read from other posts here that you refer to this as “linear predictor”. This makes me think it should be f(riskfactor), yet with a different (which default?) reference value. What surprises me though is that the CI doesn’t collapse anywhere. I couldn’t get my head around this since there will be some riskfactor_0 such that f(riskfactor_0) = 0. Wouldn’t that imply then log(h(t;riskfactor_0)) = log(h0(t)) + log(1) = log(h0(t)); and that shouldn’t have any uncertainty around it? Heinzl & Kaider who developed SAS code for restricted cubic splines years ago also refer to this fact (no uncertainty around, see Section 3.1 in Heinzl H, Kaider A. Gaining more flexibility in Cox proportional hazards regression models with cubic spline functions. Comput Methods Programs Biomed. 1997 Nov;54(3):201-8. doi: 10.1016/s0169-2607(97)00043-6. PMID: 9421665.).
I truly appreciate your help to resolve my confusion. Thanks a lot already; I posted the code below.
Simon
# -------------------------------------------------------
# load libraries, define auxiliary function
# -------------------------------------------------------
library(rms)
library(gridExtra)
# ----------------------------------------------------------------------------------------
# PART I - logistic model
# ----------------------------------------------------------------------------------------
# -------------------------------------------------------
# generate data
# set seed; define sample size;
# generate a prognostic risk factor (handled as linear
# despite having only 5 categories, this is for spline)
# simulate, event, time, create a data frame (name: d)
# -------------------------------------------------------
set.seed(1)
n <- 1000
riskfactor <- sample(c(1, 2, 3, 4, 5), size = n, replace = T, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
event <- rep(c(0, 1, 1, 1), each = n/4)
time <- sapply(riskfactor,
FUN = function(x){
rexp(1, 0.2*(x == 1) + 0.4*(x == 2) + 0.8*(x == 3) + 1.2*(x == 4) + 1.8*(x == 5))})
d <- data.frame(event = event,
riskfactor = riskfactor,
time = time)
# -------------------------------------------------------
# preparation for using rms package
# -------------------------------------------------------
drms <- datadist(d)
options(datadist='drms')
refval <- 4.12
predrf <- seq(1, 5, by = 0.05)
drms$limits["Adjust to","riskfactor"] <- refval # this is an arbitrary value, just to illustrate
# -------------------------------------------------------
# show fitted/predicted cruve for both, log-odds and
# log odds ratio
# -------------------------------------------------------
# log-odds
fitlrm_rms <- lrm(event ~ riskfactor, data = d)
predlodds_rms <- Predict(fitlrm_rms, riskfactor = predrf, ref.zero = FALSE)
# log-OR
predlor_rms <- Predict(fitlrm_rms, riskfactor = predrf, ref.zero = TRUE)
# plots - note prlm2 y axis label seems wrong (log odds instead of log odds ratio)
plrm1 <- plot(predlodds_rms)
plrm2 <-plot(predlor_rms)
grid.arrange(plrm1, plrm2, nrow = 1)
# ----------------------------------------------------------------------------------------
# End of PART I - logistic model
# ----------------------------------------------------------------------------------------
# ----------------------------------------------------------------------------------------
# PART II - Cox model (withou / with restricted cubic splines)
# ----------------------------------------------------------------------------------------
# Cox model with coxph
# here, when predicting the linear term, the log(HR) is relative to the
# *mean* of riskfactor, if ref.zero = F
cphout <- cph(Surv(time, event) ~ riskfactor, data = d)
predcph <- Predict(cphout, ref.zero = F)
predcphref <- Predict(cphout, ref.zero = T)
# plots
pcph1 <- plot(predcph)
pcph2 <-plot(predcphref)
grid.arrange(pcph1, pcph2, nrow = 1)
# final step: a more flexible model (restricted cubic spline)
# note that the CI doesn't collapse when ref.zero = F
cphspl <- cph(Surv(time, event) ~ rcs(riskfactor, 3), data = d)
pcphrsc1 <- plot(Predict(cphspl, riskfactor = seq(3, 5, by = 0.01), ref.zero = F),
ylim = c(-0.8, 1))
pcphrsc2 <- plot(Predict(cphspl, riskfactor = seq(3, 5, by = 0.01), ref.zero = T),
ylim = c(-0.8, 1))
grid.arrange(pcphrsc1, pcphrsc2, nrow = 1)