# Calculating the linear predictor for a prediction model that uses restricted cubic splines

Hi all.
I have a question about using restricted cubic splines. Let's assume I would like to externally validate a model that was published in the literature, but I have no access to the data. To do so, I would have to calculate the linear predictor from the published regression coefficients. Prof. Harrell strongly advocates restricted cubic splines (rcs) for continuous covariates. However, how would I calculate the linear predictor for a model that used rcs, for instance a model fitted like this:

```r
cph(S ~ rcs(age, 5) + gender, data = df)
```

I reckon that the location of the knots is rarely reported and depends on the distribution of the derivation data.

I would highly appreciate any guidance on that.

If using R Markdown you can do

```r
f <- cph(...)
latex(f)
```

and put `results='asis'` in the chunk header to produce LaTeX markup that is automatically translated into math notation in the report. This represents the splines in their simplest form and also shows the knot locations. Alternatively, you can use `Function(f)` to display the fitted equation in R notation.
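A minimal sketch of the `Function()` route, assuming the `rms` and `survival` packages are installed and borrowing the `pbc` model that appears later in this thread (`pbc2` and `g` are names chosen for this example). The result is an ordinary R function, so printing it shows the knots and coefficients, and calling it on new values returns the linear predictor; judging by the output later in the thread, its intercept is minus the centring constant that `cph` reports.

```r
library(rms)
library(survival)

# Same model as the pbc example in this thread; death (status 2) is the event
pbc2 <- transform(pbc, event = as.numeric(status == 2))
f <- cph(Surv(time, event) ~ rcs(age, 5) + sex, data = pbc2)

g <- Function(f)   # plain R function of age and sex
g                  # printing shows knots and coefficients in R notation
g(age = c(40, 60), sex = c("m", "f"))
```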

I guess I was not clear enough in my question. Assume I don't have the data, and the model was fitted by another research group using rcs. How can I externally validate their model using my data? As argued by, e.g., Royston and Altman, this involves calculating a linear predictor from the published coefficients, no?
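The Royston and Altman style checks can be sketched as below, assuming the validation sample already has the linear predictor computed from the published equation. `validate_lp()` is a hypothetical helper: it fits a Cox model with the linear predictor as the only covariate (the calibration slope, ideally close to 1) and computes Harrell's C with `survival::concordance()`. The usage lines use an in-sample linear predictor from the `lung` data purely as a stand-in, so the slope there is 1 by construction.

```r
library(survival)

# Hypothetical helper: checks of a published linear predictor `lp` in a
# validation sample with follow-up `time` and event indicator `event`.
validate_lp <- function(time, event, lp) {
  # Calibration slope: coefficient of lp in a Cox model (ideally near 1)
  slope_fit <- coxph(Surv(time, event) ~ lp)
  # Harrell's C; reverse = TRUE because a higher lp means shorter survival
  c_fit <- concordance(Surv(time, event) ~ lp, reverse = TRUE)
  c(calibration_slope = unname(coef(slope_fit)),
    c_index = unname(c_fit$concordance))
}

# Stand-in only: an in-sample lp from the lung data, not an external model
dev <- coxph(Surv(time, status == 2) ~ age, data = lung)
lp_lung <- predict(dev, type = "lp")
validate_lp(lung$time, lung$status == 2, lp_lung)
```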

Sorry I missed that. You would have to know the knot locations they used and any scale factor used in computing the spline basis functions. Barring that, you can’t get predicted values.
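To make "knot locations and scale factor" concrete, here is a base-R sketch of the restricted cubic spline basis in the truncated-power form used by `rms::rcs()`, assuming its default normalisation, which divides the nonlinear terms by (t_k - t_1)^2 (this is consistent with the `Function(a)` output later in the thread). `rcs_basis()` and `lp_rcs()` are hypothetical helpers; the knots and the rounded age coefficients are taken from the `pbc` example further down, so the result only approximates that model's linear predictor.

```r
# Hypothetical helper: restricted cubic spline basis in truncated-power form,
# assuming rms-style scaling of the nonlinear terms by (t_k - t_1)^2.
rcs_basis <- function(x, knots) {
  k <- length(knots)
  stopifnot(k >= 3)
  pos3 <- function(u) pmax(u, 0)^3          # (u)_+^3
  tk  <- knots[k]
  tk1 <- knots[k - 1]
  denom <- (tk - knots[1])^2                 # normalising constant
  basis <- matrix(x, ncol = 1)               # linear term
  for (j in seq_len(k - 2)) {
    tj <- knots[j]
    term <- pos3(x - tj) -
      pos3(x - tk1) * (tk - tj) / (tk - tk1) +
      pos3(x - tk) * (tk1 - tj) / (tk - tk1)
    basis <- cbind(basis, term / denom)      # nonlinear terms age', age'', ...
  }
  basis
}

# Hypothetical helper: linear-predictor contribution of one spline variable
lp_rcs <- function(x, knots, beta) drop(rcs_basis(x, knots) %*% beta)

# Knots as printed by Function(a) and the rounded age coefficients from print(a)
knots    <- c(33.839014, 43.981314, 51.000684, 56.828131, 67.920876)
beta_age <- c(0.0654, -0.0017, -0.3289, 0.9991)   # age, age', age'', age'''

age <- c(40, 55, 70)
sex <- c("m", "f", "m")
# Centred linear predictor: spline part + sex term - cph's centre
lp <- lp_rcs(age, knots, beta_age) - 0.2310 * (sex == "f") - 2.8692
```

Without the knots (and knowledge of the scaling), `rcs_basis()` cannot be evaluated, which is exactly why the published coefficients alone are not enough.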

Thanks for those clarifications, that is really helpful. My follow-up question, then: how should I report the results of a model that uses restricted cubic splines so that others can validate it with their own data?

To give a concrete example: I am planning to estimate a risk prediction model based on the strategy from your text and the RMS short course (which are excellent resources, by the way). I would therefore model continuous variables using restricted cubic splines. How should I report the model so that others can externally validate it using their own data? If others want to follow the process outlined by Royston and Altman linked above, they would need predictions for the individuals in their dataset.

Given a hypothetical model created from data of the `survival` package:

```r
library(tidyverse)
library(survival)   # pbc data and Surv()
library(rms)

df <- survival::pbc %>%
  as_tibble() %>%
  # Treat death (status 2) as the event; transplant (status 1) counts as censored
  mutate(event = as.numeric(status == 2))

S <- Surv(time = df$time, event = df$event)

a <- cph(S ~ rcs(age, 5) + sex, data = df)
```

`print(a)` would give me this:

```
print(a)

# Output:
Cox Proportional Hazards Model

cph(formula = S ~ rcs(age, 5) + sex, data = df)

                       Model Tests    Discrimination
                                             Indexes
Obs       418    LR chi2     28.20    R2       0.066
Events    161    d.f.            5    Dxy      0.225
Center 2.8692    Pr(> chi2) 0.0000    g        0.506
                 Score chi2  29.13    gr       1.658
                 Pr(> chi2) 0.0000

       Coef    S.E.   Wald Z Pr(>|Z|)
age     0.0654 0.0597  1.10  0.2729
age'   -0.0017 0.2675 -0.01  0.9950
age''  -0.3289 1.1103 -0.30  0.7671
age'''  0.9991 1.7020  0.59  0.5572
sex=f  -0.2310 0.2280 -1.01  0.3109
```
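The printed table shows rounded coefficients but not where the knots are. A sketch of pulling out what a validator would need, assuming `rms` stores the knots in the fit's `Design$parms` component (an internal structure that may change between versions) and refitting with `Surv(time, event)` inside the formula instead of the global `S`:

```r
library(rms)
library(survival)

pbc2 <- transform(pbc, event = as.numeric(status == 2))
a <- cph(Surv(time, event) ~ rcs(age, 5) + sex, data = pbc2)

knots_age <- a$Design$parms$age   # knot locations used by rcs(age, 5)
beta      <- coef(a)              # unrounded coefficients
centre    <- a$center             # constant cph subtracts from X %*% beta
print(knots_age, digits = 8)
print(beta, digits = 8)
centre
```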

And `Function(a)` would give me this:

```
Function(a)

# Output:
function(age = 51.000684,sex = "f") {-2.8692322+0.065446684* age-1.4362402e-06*pmax(age-33.839014,0)^3-0.00028313565*pmax(age-43.981314,0)^3+0.00086010403*pmax(age-51.000684,0)^3-0.00069649416*pmax(age-56.828131,0)^3+0.00012096202*pmax(age-67.920876,0)^3-0.23102847*(sex=="f") }
<environment: 0x00000293349704b8>
```
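On the validator's side no `rms` is needed: the `Function(a)` body can be pasted into plain R. A sketch, where `lp_published()` is a hypothetical name and the four rows of `newdata` are made-up ages and sexes used only to show the call:

```r
# Hypothetical name: the Function(a) output above, pasted into base R.
# Returns the centred linear predictor of the published pbc model.
lp_published <- function(age, sex) {
  -2.8692322 + 0.065446684 * age -
    1.4362402e-06 * pmax(age - 33.839014, 0)^3 -
    0.00028313565 * pmax(age - 43.981314, 0)^3 +
    0.00086010403 * pmax(age - 51.000684, 0)^3 -
    0.00069649416 * pmax(age - 56.828131, 0)^3 +
    0.00012096202 * pmax(age - 67.920876, 0)^3 -
    0.23102847 * (sex == "f")
}

# Made-up validation rows, for illustration only
newdata <- data.frame(age = c(35, 50, 65, 80), sex = c("m", "f", "f", "m"))
newdata$lp <- lp_published(newdata$age, newdata$sex)
newdata
```

The resulting `lp` column is what would then go into the calibration-slope and discrimination checks on the validation data.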

And `latex(a)` would give me the same equation typeset in mathematical notation (the rendered output is not reproduced here).
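Written out by hand from the `Function(a)` output (coefficients and knots rounded, so this is a reconstruction rather than the verbatim `latex(a)` output), the linear predictor is

$$
\begin{aligned}
X\hat\beta = {} & -2.8692 + 0.065447\,\mathrm{age} \\
& - 1.4362\times 10^{-6}\,(\mathrm{age} - 33.839)_+^3 - 2.8314\times 10^{-4}\,(\mathrm{age} - 43.981)_+^3 \\
& + 8.6010\times 10^{-4}\,(\mathrm{age} - 51.001)_+^3 - 6.9649\times 10^{-4}\,(\mathrm{age} - 56.828)_+^3 \\
& + 1.2096\times 10^{-4}\,(\mathrm{age} - 67.921)_+^3 - 0.23103\,[\mathrm{sex} = \mathrm{f}]
\end{aligned}
$$

where $(u)_+ = \max(u, 0)$ and $[\mathrm{sex} = \mathrm{f}]$ is 1 for females and 0 otherwise. Predicted survival probabilities would additionally require the baseline survival function $S_0(t)$ matching this centring, via $S(t \mid X) = S_0(t)^{\exp(X\hat\beta)}$.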

Wouldn’t it be necessary to report this equation somewhere in the paper too? I had a look at an article that you co-authored and couldn’t find a formula or the regression coefficients that would allow me to calculate predictions. I did see that you provide a shiny app which I am planning to do too, but another researcher would still need the formula to calculate predictions, right? Or is there another way to externally validate your model? Or would it be more appropriate to refit the model using the same specification as mentioned in the paper?

Sorry, but I am only starting to get the full picture here.

Thanks in advance for any input!

We have often put the `latex(fit)` output in the appendix of a medical journal article. That allows anyone to program it and reproduce our predictions. Online supplements are an alternative. For R users, you could also provide the fit object, which can be run against `newdata` as long as the variables are named and coded the same way as in your data.
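The fit-object route can be sketched as follows, assuming both sides have `rms` installed; the file name, `fit_shared` and the two `newdata` rows are illustrative only. `predict(..., type = "lp")` returns the linear predictor for the new rows, provided the variables have the same names and coding (here `sex` as a factor with levels `m` and `f`).

```r
library(rms)
library(survival)

pbc2 <- transform(pbc, event = as.numeric(status == 2))
fit <- cph(Surv(time, event) ~ rcs(age, 5) + sex, data = pbc2)

# Developer side: save the fit object (file name is arbitrary)
path <- file.path(tempdir(), "pbc_cph_fit.rds")
saveRDS(fit, path)

# Validator side: same variable names and coding as the development data
fit_shared <- readRDS(path)
newdata <- data.frame(age = c(45, 70),
                      sex = factor(c("m", "f"), levels = levels(pbc2$sex)))
predict(fit_shared, newdata = newdata, type = "lp")
```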