How to create the real splines values for the rcs and %ia% functions in the rms package?

Hi everyone!

I have a very basic question but I maybe you could help me.

Suppose you have two random variables

X1 <- rnorm(1000)
X2 <- rnorm(1000)

and I want to estimate the 3 knot restricted cubic spline for each one. Therefore, I put this code

r1 <- rcs(X1, 4) 
r2 <- rcs(X2, 4)

However, these functions returns 1000x4 matrices with some properties. However rcs should also return a 1000x1 vector with the final estimation given by the equation (2.24) from the Regression Modelling Strategies book.

My best guess is that I can recover the splines doing this:

cbind(1, r1) %*% attr(r1, "parms")
cbind(1, r2) %*% attr(r2, "parms")

However if I try to estimate the restricted interaction of r1 and r2 I don’t know how to retrieve the final vector of the spline, because the “parms” attribute indicates where is positioned X1 and X2, but not the coefficients.

head(r1 %ia% r2)
        X1 * X2  X1 * X2'   X1 * X2''  X1 * X2'''   X1' * X2   X1'' * X2
[1,] -0.012547279 -0.10601003 -5.263230e-03 -5.189006e-06  0.017786810  4.460467e-05
[2,] -0.073957398  0.01408416  0.000000e+00  0.000000e+00 -0.313116000 -7.556321e-03
[3,] -0.218514107  0.48459929  9.975158e-03  0.000000e+00 -0.354030733 -5.246242e-02
[4,]  0.422954923 -0.12262768 -7.746083e-06  0.000000e+00 -0.030438609  0.000000e+00
[5,]  0.003644586 -0.07949653 -2.991382e-03  0.000000e+00 -0.006242986 -2.130929e-05
[6,] -1.805530149 -3.47596715 -6.373479e-01 -1.349836e-01  0.000000000  0.000000e+00
[1] "X1" "X2"  
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    1    1
[2,]    0    0    1    1    1    0    0
[1] 9
[1] "X1 * X2"
[1] "X1 * X2"
[1] "X1 * X2"    "X1 * X2'"   "X1 * X2''"  "X1 * X2'''" "X1' * X2"   "X1'' * X2" 
[1] "rms"

Could anyone help me to estimate this vector?


You are mixing (1) formation of basis functions for curve fitting, which does not use Y in any way, with (2) estimating the effect of a predictor by using maximum likelihood estimation to estimate the coefficients of the basis functions using Y. Also note that rcs(x, 4) returns an n \times 3 matrix, not n \times 4, since a restricted cubic spline with k knots has k-2 nonlinear basis functions and one linear basis.

rcs doesn’t know about regression coefficients estimated after rcs is executed, so it can’t do what you want. Instead, get this from the model fit object. Then use the Function and latex functions to translate the fit into more familiar form, including re-writing the spline functions in unrestricted form with respect to the right tail, by adding 2 linearly dependent terms just to make the equation not involve differences in a bunch of cubes for each predictor. Example:

f <- ols(y ~ rcs(x1, 4) %ia% rcs(x2, 4))

To have the fitted equation actually typeset when you are running an Rmarkdown report in html or latex format, put results='asis' in the chunk header to keep the generated \LaTeX markup intact.

Thank you very much for your feedback Prof. Harrell. I didn’t know about the command Function, which translates into a more familiar form the formula in the model. This was very enlightening.

In the same spirit of this question, if I do this regression

f <- ols(rcs(X1, 4) %ia% rcs(X2, 4) ~ -1 + rcs(X1,4) + rcs(X2, 4) )

and I want to extract one vector with the true residual of the model f, which will the best strategy? This is because the resid function produces a matrix

      X1 * X2    X1 * X2'    X1 * X2''     X1' * X2     X1'' * X2
1  0.05210089  0.07427233  0.013780006  0.009039437  0.0006596134
2  0.01474783 -0.04850576 -0.005978875  0.256188596  0.0628311765
3 -0.27872090 -0.72667769 -0.119851600 -0.324542821 -0.0578287101
4  0.45080771  0.53430986  0.064896494  0.406889294  0.0517983166
5  0.08085727  0.08461046  0.013383061  0.065159819  0.0106388715
6 -1.93287285 -1.64958780 -0.180449107 -0.711519827 -0.0789000584

Thanks again.

It would be good to step back and go through the function documentation and my RMS case studies in detail. You are trying to fit a model with no dependent variable, and you are using -1 in the formula, which rms does not allow. resid is called after a successful fit. To just get a model design matrix use model.matrix(~ rcs(x1, 4) + rcs(x1, 3) +...).

Thank you very much Prof. Harrell. I understand now how it is structured the design matrix.