"Spline approximation error"

Elias_Eythorsson · June 10, 2023, 1:14pm

A recent paper of mine was rejected following statistical review. I am completely unable to parse what the criticism is or what it means, and I would love to hear your comments or get links to teaching materials that explain the supposed error.

For several years now I have liberally used restricted cubic splines, ever since auditing the Regression Modeling Strategies (RMS) course by Frank Harrell Jr. A prospective study I was involved in was rejected because of something called “spline approximation error”. I am unable to find anything about this term, or related terms, in RMS or other statistical textbooks, or through Google searches. It should be noted that we did present the regression equation in the supplement, that we used restricted cubic splines, not B-splines (and this was very clear in the paper), and finally that p-values were not used for any inference in the paper, which makes the comment all the more frustrating. However, if this supposed error exists and I have committed it, it would presumably apply to the confidence intervals as well. The comment is presented below:

“The paper does not present the expression of the logistic model, so it is clear about how the true functional relationship between the log-odd of the outcome and age is specified. Based on the descriptions in the paper, it seems that a nonparametric relationship 𝜃(𝑎𝑔𝑒) is included in the logistic model, where the fully unknown function 𝜃() is approximated by B-splines with four knots. If the above understanding of the modeling is accurate, then statistical inference (and the calculation of p values) conducted in this paper is problematic. This is because in the calculation of p values the splines approximation error isn’t accounted for in the analysis, so the resulting inference is incorrect. This is the well-known fact in the statistical literature. The research team should consult a statistician with proper knowledge of nonparametric regression with the utility of splines.”

Elias_Eythorsson · June 10, 2023, 1:29pm

I wonder if the reviewer thinks that restricted cubic splines are nonparametric and is referring to something similar to the smoothing parameter of generalised additive models. I still don’t see how this would invalidate any inference, but at least the comment would make a bit more sense to me: it would then become an overfitting-flavoured argument, which I still don’t think is a valid criticism.

f2harrell · June 10, 2023, 4:39pm

The reviewer is quite uninformed. I’ve never heard of this term, but I guess that the reviewer is referring to the general phenomenon of lack of fit. If you had fitted a linear age effect, the reviewer could have said the true fit is likely to be nonlinear. If you had fitted a quadratic effect in age, she might have said that the polynomial should have been cubic, etc. Another thing the reviewer didn’t understand was the bias-variance tradeoff. We never fit in an unbiased way when the sample size is not large, e.g., we don’t use 6 knots for a B/natural/restricted cubic spline with small n. Trying to get a perfect fit defeats the model by leading to unstable estimates with wide confidence intervals.

Elias_Eythorsson · June 10, 2023, 5:51pm

Thank you. The rejection letter referenced this particular statistical review as the grounds for rejection and I am going to contest it.

ICARRIERE · June 19, 2023, 8:06am

It seems that the reviewer thinks that the spline function is modeled alone first and then this function is injected into the multivariable logistic model. However, a single model estimates both the spline function and the covariate effects.
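The point about a single model can be illustrated with a minimal sketch in Python. Everything here is an assumption made for illustration and is not the analysis from the rejected paper: the data are simulated, the names `rcs_basis` and `fit_logistic` are hypothetical helpers, and the four knots are placed at the 0.05, 0.35, 0.65 and 0.95 quantiles of age. With fixed knots, the restricted cubic spline simply contributes k - 1 ordinary columns to the design matrix, and one maximum-likelihood fit estimates the spline coefficients and the covariate effect together, with a joint covariance matrix.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated power form, linear tails).

    Returns k - 1 columns for k knots: x itself plus k - 2 nonlinear terms.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    d = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # keeps the columns on a scale similar to x

    def pos3(u):
        return np.maximum(u, 0.0) ** 3

    cols = [x]
    for tj in t[:-2]:
        col = (pos3(x - tj)
               - pos3(x - t[-2]) * (t[-1] - tj) / d
               + pos3(x - t[-1]) * (t[-2] - tj) / d)
        cols.append(col / scale)
    return np.column_stack(cols)


def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson maximum likelihood for logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Information matrix at the final estimate gives the joint covariance.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 90, n)
sex = rng.integers(0, 2, n).astype(float)
true_logit = -3.0 + 0.0008 * (age - 50.0) ** 2 + 0.7 * sex
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Four fixed knots give three spline columns for age.
knots = np.quantile(age, [0.05, 0.35, 0.65, 0.95])
basis = rcs_basis(age, knots)

# One design matrix: intercept, spline terms for age, and the covariate.
X = np.column_stack([np.ones(n), basis, sex])
beta, cov = fit_logistic(X, y)
se = np.sqrt(np.diag(cov))

for name, b, s in zip(["intercept", "age", "age'", "age''", "sex"], beta, se):
    print(f"{name:>9}: {b: .4f} (SE {s:.4f})")
```

Because the spline terms are just regression coefficients in this setup, the standard errors for the age terms and for `sex` come from the same information matrix. Whether four knots are flexible enough is the lack-of-fit and bias-variance question discussed earlier in the thread, not a separate estimation stage.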