# Restricted cubic splines (RCS) and standard errors

Hello, I have a question about the variability of RCS coefficient estimates.

I’m working on a simulation study that involves RCS terms, and I find that the RCS coefficient estimates are highly variable. Is there a gain in efficiency from using the rcs() function (rms package in R) versus hand-coding the RCS terms?

When I compare Wald tests using the rcs() function to hand-coded RCS terms, the test statistics are equal. I also find that the coefficient estimates are the same whether the RCS terms are hand-coded or the rcs() function (with the Function() call) is used. I take this to mean that there isn’t a gain in efficiency from using rcs(), but I’m wondering if there is a way to fit the data so that the estimates are less variable.

Is the correlation between the independent variable (e.g. X1) and the RCS terms derived from that variable (e.g. pmax(X1-knot1,0)^3, etc.) causing the issue with standard errors? Or is something else causing this?
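For concreteness, the correlation being asked about can be checked directly. This is a sketch in Python/numpy purely for illustration (the models in this thread use R's rcs(); the knot locations here are arbitrary, whereas rcs() places knots at quantiles of the data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)

# Arbitrary knots for illustration; rcs() would place them at quantiles of x
knots = [2.5, 5.0, 7.5]

# Truncated power terms of the kind mentioned above: pmax(x - knot, 0)^3
basis = np.column_stack([x] + [np.maximum(x - k, 0.0) ** 3 for k in knots])

# Correlation matrix of x and the truncated cubic terms derived from it
corr = np.corrcoef(basis, rowvar=False)
print(np.round(corr, 3))
```

The off-diagonal correlations are very high, both between x and its truncated cubic terms and among the truncated terms themselves, which is the collinearity in question.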

1 Like

This would be better for the existing topic RMS Discussions. But briefly, it doesn’t matter whether you hand code or use rcs() when both use the same basis functions. The truncated power basis I use creates collinear terms that make the standard errors of individual coefficients large. That is to be expected; don’t look at them. Look instead at standard errors of predicted values, or of the whole estimated spline function put together. That’s where the choice of basis stops mattering and the high standard errors vanish.
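A small numerical sketch of this point (Python/numpy for illustration only; the data-generating function and knots are made up, and rcs() in R would choose knots from quantiles):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 10, n)
y = np.sin(x) + rng.normal(0, 0.3, n)  # arbitrary smooth signal for illustration

knots = [2.0, 5.0, 8.0]  # hypothetical knots
X = np.column_stack([np.ones(n), x] + [np.maximum(x - k, 0.0) ** 3 for k in knots])

# Ordinary least squares fit with the truncated power basis
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
V = sigma2 * np.linalg.inv(X.T @ X)   # estimated covariance of beta-hat

coef_se = np.sqrt(np.diag(V))         # SEs of individual basis coefficients

# VIFs of the basis columns (diagonal of the inverse correlation matrix):
# huge, because the truncated power terms are highly collinear
vif = np.diag(np.linalg.inv(np.corrcoef(X[:, 1:], rowvar=False)))

# SE of the fitted value at x = 5: sqrt(x0' V x0).  The collinear pieces
# combine, and the prediction SE is small and well behaved.
x0 = np.array([1.0, 5.0] + [max(5.0 - k, 0.0) ** 3 for k in knots])
pred_se = float(np.sqrt(x0 @ V @ x0))
print(vif, pred_se)
```

The VIFs for the individual basis terms are enormous, yet the standard error of the predicted value stays small: the collinearity cancels out once the terms are combined.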

4 Likes

Thanks so much, Dr. Harrell, this is very helpful.

Does this mean that the standard errors of the point estimates of, say, the OR at any specific value of the continuous variable modelled by the spline are expected to be large?

No, because you don’t compute an effect using only one of a series of connected terms. By the time you compute the OR in the general way, all is well. The general way is \exp(X_{1}\hat{\beta} - X_{2}\hat{\beta}), where in the covariate vectors X_{1} and X_{2} only the factor of current interest is varied. All of the terms that the predictor contributes to \hat{\beta} are evaluated for both covariate combinations 1 and 2, and linear algebra gives the variance of the difference in predicted logits. Don’t even look at coefficients of individual nonlinear basis terms.
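The linear algebra for that difference can be sketched as follows, again in Python/numpy for illustration only (in rms this is what contrast() and summary() do for you; the knots, data-generating function, and comparison points here are all arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 10, n)
knots = [2.0, 5.0, 8.0]  # hypothetical knots

def basis(x):
    """Intercept, linear term, and truncated cubic terms for the spline."""
    x = np.atleast_1d(x)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) ** 3 for k in knots]
    return np.column_stack(cols)

X = basis(x)
eta = np.sin(x)  # arbitrary true log-odds, for illustration
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Logistic regression via Newton-Raphson
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    H = X.T @ (X * W[:, None])          # observed information
    beta += np.linalg.solve(H, X.T @ (y - p))
V = np.linalg.inv(H)                    # estimated covariance of beta-hat

# OR comparing x = 7 vs x = 3: exp((X1 - X2) beta), with the variance of the
# logit difference obtained as d' V d, where d = X1 - X2
d = (basis(7.0) - basis(3.0)).ravel()
log_or = d @ beta
se = float(np.sqrt(d @ V @ d))
or_ci = np.exp([log_or - 1.96 * se, log_or, log_or + 1.96 * se])
print(or_ci)
```

All of the spline coefficients enter both X vectors, so their unstable individual pieces combine into a single well-estimated contrast with a modest standard error.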

1 Like

Thank you. That explains nicely why the variance inflation factors are off the charts but the relevant SEs are unremarkable.

Right — don’t look at variance inflation factors for individual terms that will later combine forces.

1 Like