Calculate linear predictor for a prediction model using restricted cubic splines

bsurial · December 10, 2020, 8:34pm

Hi all.
I have a question regarding the use of restricted cubic splines. Let’s assume I would like to externally validate a model that was published in the literature, but I have no access to the derivation data. To do so, I would have to calculate the linear predictor using the published regression coefficients. Prof. Harrell avidly advocates using restricted cubic splines (rcs) for continuous covariates. However, how would I calculate the linear predictor for a model that used rcs, for instance a model fitted like this:

cph(S ~ rcs(age, 5) + gender, data = df)

I reckon that the location of the knots is rarely reported and depends on the distribution of the derivation data.

I would highly appreciate any guidance on that.

f2harrell · December 10, 2020, 8:40pm

If using R Markdown you can do

f <- cph(...)
latex(f)

and put results='asis' in the chunk header, to produce LaTeX markup that is automatically translated to math notation and put into the report. That will represent the splines in simplest forms, showing also the knot locations. Alternatively you can use Function(f) to display the fitted equation in R notation.

bsurial · December 10, 2020, 9:12pm

I guess I was not clear enough in my question. Suppose the model was fitted by another research group using rcs and I don’t have access to their data. How can I externally validate their model using my data? As argued, e.g., by Royston and Altman, this involves calculating a linear predictor using the published coefficients, no?

f2harrell · December 10, 2020, 10:56pm

Sorry I missed that. You would have to know the knot locations they used and any scale factor used in computing the spline basis functions. Barring that, you can’t get predicted values.
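To make concrete what "knot locations and scale factor" means, here is a minimal base-R sketch of a restricted cubic spline basis. It assumes the original model used the rms default normalization (norm = 2 in rcspline.eval), which divides the nonlinear terms by the squared distance between the first and last knots; other software may scale differently. The helper name rcs_basis and the three example patients are hypothetical, and the knots, coefficients, and centering constant are taken from the example output posted further down this thread.

```r
# Hypothetical helper (not part of rms): restricted cubic spline basis.
# Assumption: nonlinear terms are divided by (last knot - first knot)^2.
rcs_basis <- function(x, knots) {
  k <- length(knots)
  stopifnot(k >= 3)
  cube_plus <- function(u) pmax(u, 0)^3      # truncated cubic (u)_+^3
  tau <- (knots[k] - knots[1])^2             # assumed scale factor
  gap <- knots[k] - knots[k - 1]
  nonlinear <- sapply(seq_len(k - 2), function(j) {
    (cube_plus(x - knots[j]) -
       cube_plus(x - knots[k - 1]) * (knots[k] - knots[j]) / gap +
       cube_plus(x - knots[k]) * (knots[k - 1] - knots[j]) / gap) / tau
  })
  # First column is the linear term; the rest are the k - 2 nonlinear terms
  cbind(x, matrix(nonlinear, nrow = length(x)))
}

# Knot locations as they appear in the Function(a) output in this thread
knots <- c(33.839014, 43.981314, 51.000684, 56.828131, 67.920876)

# Coefficients as printed by print(a): age, age', age'', age''', sex=f
beta_age   <- c(0.0654, -0.0017, -0.3289, 0.9991)
beta_sex_f <- -0.2310
center     <- 2.8692   # "Center" in the print(a) output

# Hypothetical new patients
new_patients <- data.frame(age = c(40, 55, 70), sex = c("f", "m", "f"))

X_age <- rcs_basis(new_patients$age, knots)
lp <- drop(X_age %*% beta_age) +
  beta_sex_f * (new_patients$sex == "f") - center
lp
```

Because the printed coefficients are rounded to four decimals, the result may differ slightly from the full-precision equation, which is one reason to publish the coefficients with more digits.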

bsurial · December 11, 2020, 10:48am

Thanks for those clarifications, that is really helpful. My follow-up question then would be: how should I report the results of a model that uses restricted cubic splines so that others can validate it with their own data?

To give a concrete example: I am planning to estimate a risk prediction model based on the strategy from your text and the RMS short course (which are excellent resources by the way). Therefore I would want to model continuous variables using restricted cubic splines. How should I report the model to allow others to externally validate it using their own data? If others want to follow the process outlined by Royston and Altman linked above, they would need predictions for the individuals in their dataset.

Given a hypothetical model created from data of the survival package:

library(tidyverse)
library(rms)
df <- survival::pbc %>% 
  as_tibble() %>% 
  # Treat death (status == 2) as the only event; censored (0) and transplant (1) become 0
  mutate(event = as.numeric(status == 2))

dd <- datadist(df); options(datadist = "dd")

S <- Surv(time = df$time, event = df$event)

a <- cph(S ~ rcs(age, 5) + sex, df)

print(a) would give me this:

print(a)

# Output:
Cox Proportional Hazards Model
 
 cph(formula = S ~ rcs(age, 5) + sex, data = df)
 
                        Model Tests    Discrimination    
                                              Indexes    
 Obs       418    LR chi2     28.20    R2       0.066    
 Events    161    d.f.            5    Dxy      0.225    
 Center 2.8692    Pr(> chi2) 0.0000    g        0.506    
                  Score chi2  29.13    gr       1.658    
                  Pr(> chi2) 0.0000                      
 
        Coef    S.E.   Wald Z Pr(>|Z|)
 age     0.0654 0.0597  1.10  0.2729  
 age'   -0.0017 0.2675 -0.01  0.9950  
 age''  -0.3289 1.1103 -0.30  0.7671  
 age'''  0.9991 1.7020  0.59  0.5572  
 sex=f  -0.2310 0.2280 -1.01  0.3109

And Function(a) would give me this:

Function(a)

# Output:
function(age = 51.000684,sex = "f") {-2.8692322+0.065446684* age-1.4362402e-06*pmax(age-33.839014,0)^3-0.00028313565*pmax(age-43.981314,0)^3+0.00086010403*pmax(age-51.000684,0)^3-0.00069649416*pmax(age-56.828131,0)^3+0.00012096202*pmax(age-67.920876,0)^3-0.23102847*(sex=="f") }
<environment: 0x00000293349704b8>
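As a hedged illustration of how another group could use this output without the derivation data, the sketch below copies the equation into a plain R function and applies it to a new dataset. It needs only base R. The name published_lp and the validation data frame are hypothetical; the coefficients and knots are exactly those printed above.

```r
# Equation copied from the Function(a) output above (no rms needed)
published_lp <- function(age = 51.000684, sex = "f") {
  -2.8692322 + 0.065446684 * age -
    1.4362402e-06 * pmax(age - 33.839014, 0)^3 -
    0.00028313565 * pmax(age - 43.981314, 0)^3 +
    0.00086010403 * pmax(age - 51.000684, 0)^3 -
    0.00069649416 * pmax(age - 56.828131, 0)^3 +
    0.00012096202 * pmax(age - 67.920876, 0)^3 -
    0.23102847 * (sex == "f")
}

# Hypothetical validation cohort; sex must be coded "m"/"f" as in the original data
validation <- data.frame(
  age = c(38.2, 49.5, 61.0, 72.4),
  sex = c("f", "m", "f", "m"),
  stringsAsFactors = FALSE
)

# Linear predictor for each individual in the validation data
validation$lp <- with(validation, published_lp(age, sex))
validation
```

The resulting lp column could then be entered as the only covariate in a Cox model fitted to the validation data (for example with survival::coxph) to estimate a calibration slope.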

And latex(a) would give me this:
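The rendered equation did not come through in this post. As a hedged reconstruction from the Function(a) output above (not the verbatim latex(a) output, which may be laid out differently), the linear predictor can be written as follows, where $(x)_+ = \max(x, 0)$ and $[\mathrm{sex}=\mathrm{f}]$ is 1 for females and 0 otherwise:

```latex
\begin{aligned}
X\hat{\beta} ={}& -2.8692322 + 0.065446684\,\mathrm{age} \\
& - 1.4362402\times10^{-6}\,(\mathrm{age}-33.839014)_+^3
  - 0.00028313565\,(\mathrm{age}-43.981314)_+^3 \\
& + 0.00086010403\,(\mathrm{age}-51.000684)_+^3
  - 0.00069649416\,(\mathrm{age}-56.828131)_+^3 \\
& + 0.00012096202\,(\mathrm{age}-67.920876)_+^3
  - 0.23102847\,[\mathrm{sex}=\mathrm{f}]
\end{aligned}
```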

Wouldn’t it be necessary to report this equation somewhere in the paper too? I had a look at an article that you co-authored and couldn’t find a formula or the regression coefficients that would allow me to calculate predictions. I did see that you provide a shiny app which I am planning to do too, but another researcher would still need the formula to calculate predictions, right? Or is there another way to externally validate your model? Or would it be more appropriate to refit the model using the same specification as mentioned in the paper?

Sorry, but I am only starting to get the full picture here.

Thanks in advance for any precious inputs!

f2harrell · December 11, 2020, 3:32pm

We have often put the latex(fit) output in the appendix of a medical journal article. That allows anyone to program it and reproduce our predictions. Online supplements are an alternative. For R users you could also provide the fit object, which can be run against new data as long as the variables are coded the same way as in your data.
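A minimal sketch of the fit-object route, assuming both sides have the rms and survival packages installed. The file name, the tempfile stand-in for a shared supplement, and the new_patients data are hypothetical; the model is the pbc example from earlier in the thread.

```r
library(survival)
library(rms)

# Authors' side: fit the model and save the fit object
df <- survival::pbc
df$event <- as.numeric(df$status == 2)   # death is the only event
fit <- cph(Surv(time, event) ~ rcs(age, 5) + sex, data = df)
fit_file <- tempfile(fileext = ".rds")   # stand-in for a file shared as a supplement
saveRDS(fit, fit_file)

# Validators' side: load the fit and predict for their own patients.
# Variables must be named and coded as in the derivation data.
shared_fit <- readRDS(fit_file)
new_patients <- data.frame(
  age = c(40, 55, 70),
  sex = factor(c("f", "m", "f"), levels = c("m", "f"))
)
lp <- predict(shared_fit, newdata = new_patients, type = "lp")
lp
```

The knot locations travel with the fit object, so the validators do not need to know them separately.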