P for non-linearity in a restricted cubic spline model

Sanmei · July 2, 2020, 6:49am

Dear all, I have a question about the P-value for non-linearity in a restricted cubic spline model.
We assessed the association between serum biomarker levels and the 10-year risk of events.
We found a significant association when we categorized the exposure into quartiles (the results were the same with tertiles or quintiles).
A reviewer was interested in the shape of the association and asked us to do a spline analysis, because the HR in the highest quartile was very high.

So we used the SAS EFFECT statement (in PROC PHREG) to fit a restricted cubic spline. The plot looked like this:
[Figure 1: restricted cubic spline plot]
However, we couldn't find a way to calculate the P-value for non-linearity. Do you have any suggestions?

A paper published in the BMJ stated: "We tested for potential non-linearity by using a likelihood ratio test comparing the model with only a linear term against the model with linear and cubic spline terms".
So it seems the P-value would be P_value=1-probchi(LRT,df); where LRT is the difference in -2 Log L between the two models and df is the number of extra terms in the spline model.
I am not sure if this is right.
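The calculation in the question can be sketched in Python (the thread itself uses SAS; this is only an illustration, not the SAS procedure). The hypothetical helper `chi2_sf` plays the role of SAS `1-probchi(LRT,df)`, using closed-form expressions for integer df so only the standard library is needed. The sketch assumes the two fits are nested, use the same observations and adjustment covariates, and that the spline has k knots in Harrell's restricted cubic spline parametrization, so the nonlinear terms add k - 2 degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Plays the role of SAS 1 - probchi(x, df); closed forms, stdlib only.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1)
    )
    return tail


def lr_test_nonlinearity(m2ll_linear: float, m2ll_spline: float, n_knots: int):
    """Likelihood ratio test: linear-only model vs restricted cubic spline model.

    m2ll_linear, m2ll_spline: -2 log L from the two nested fits.
    With k knots the spline adds k - 2 nonlinear terms, so df = k - 2.
    Returns (LRT statistic, df, P-value).
    """
    lrt = m2ll_linear - m2ll_spline
    df = n_knots - 2
    return lrt, df, chi2_sf(lrt, df)
```

For example, with hypothetical values of -2 log L = 1000.0 for the linear model and 992.0 for a 4-knot spline model, `lr_test_nonlinearity(1000.0, 992.0, 4)` gives LRT = 8 on 2 df and P = exp(-4), about 0.018.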

I tried this method. However, when I put both the original exposure variable (continuous, approximately normally distributed) and the new spline variable (from the EFFECT statement) into the same multivariable-adjusted model, there was a parameter estimate for the spline term but not for the original variable.

So can I simply compare the log-likelihood of the model containing only the linear term with that of the model containing the cubic spline terms?
My understanding is that this likelihood ratio test tests whether the model with the spline terms fits as well as the model with only the linear term. In our case the model with the spline terms fit better. Is it proper to say there is a nonlinear association?

f2harrell · July 2, 2020, 12:19pm

Yes, the gold-standard frequentist test in this situation is to compare two nested models: one with all terms (the complete set of spline basis functions, including the linear term) and one with only the linear term.
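To see why the two models are nested, here is a sketch of the restricted cubic spline basis in the parametrization described in Harrell's RMS, including its scaling by (t_k - t_1)^2 (the function name `rcs_basis` and the knot values are hypothetical). The first column is x itself; the remaining k - 2 columns are cubic between knots and constrained to be linear beyond the outer knots. The linear-only model uses just the first column, so dropping the nonlinear columns gives the reduced model and the LRT has k - 2 df. This may also explain the missing estimate described earlier in the thread: if the SAS spline basis likewise contains a linear column (we have not checked PROC PHREG's defaults), adding the raw exposure duplicates that column and both cannot be estimated.

```python
def rcs_basis(x: float, knots: list) -> list:
    """Restricted cubic spline basis for one value x (Harrell's parametrization).

    Returns k - 1 columns for k knots: x itself, then k - 2 nonlinear terms
    that are cubic between knots and linear beyond the outer knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u: float) -> float:
        # Truncated cubic (u)_+^3
        return max(u, 0.0) ** 3

    tk, tk1 = t[-1], t[-2]
    norm = (tk - t[0]) ** 2  # scaling so the nonlinear terms are on x's scale
    cols = [x]  # linear column: the linear-only model uses just this one
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - tk1) * (tk - t[j]) / (tk - tk1)
                + pos3(x - tk) * (tk1 - t[j]) / (tk - tk1))
        cols.append(term / norm)
    return cols


knots = [1.0, 2.0, 3.0, 4.0]       # hypothetical knot locations (k = 4)
row = rcs_basis(2.5, knots)         # spline model row: 3 columns
linear_row = row[:1]                # linear model row: first column only
with_raw_x = [2.5] + row            # raw x added again: columns 0 and 1 are identical
```

The nonlinear columns are exactly zero below the first knot and have zero second differences beyond the last knot, which is what "restricted" means here.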

Note that categorization of the exposure into quantile groups is not valid and needs to be removed from the manuscript. That analysis asks the meaningless question of whether the marker is associated with the outcome according to how many subjects have similar values of the marker. Base the analysis on biology rather than demographics by doing only the continuous-variable analysis.

Sanmei · July 7, 2020, 6:16am

Dear Prof. Harrell, thank you very much for your reply!
We finished this analysis using the likelihood ratio test.

I have read your BBR and RMS texts, and I fully agree with the problems you emphasize regarding categorization of the exposure.
Actually, we also reported HRs per 1-SD increment in marker concentration in the manuscript. But I am concerned that this analysis may suggest to readers that the association between the exposure and the outcome is linear, which is inconsistent with the shape we observed in the spline plot.

f2harrell · July 9, 2020, 11:11am

Any attempt to use a single number as an effect measure is problematic unless you know the relationship is linear, and you will often make readers question why the P-value implied by one approach is much different from that of the overall chunk likelihood ratio test. So I would give the graph as the estimate of effect. If you are certain the relationship is monotonic, you can use the inter-quartile-range hazard ratio to supplement the graph. I don't like using SDs in general.
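A minimal sketch of reporting the graph as the effect estimate, with an inter-quartile-range hazard ratio as a supplement, assuming a Cox model whose only terms involving the marker are the spline columns (no interactions) and the same Harrell-style basis as in RMS. The coefficients, knots, and helper names are hypothetical; in practice they would come from the fitted model. Because the hazard ratio between two marker values is exp of the difference in linear predictors, the adjustment covariates cancel. Quartiles use Python's `statistics.quantiles` default ("exclusive") method, which may differ slightly from other software.

```python
import math
import statistics


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parametrization): x, then k - 2 nonlinear terms."""
    t = sorted(knots)
    tk, tk1 = t[-1], t[-2]
    norm = (tk - t[0]) ** 2

    def pos3(u):
        return max(u, 0.0) ** 3

    cols = [x]
    for j in range(len(t) - 2):
        cols.append((pos3(x - t[j])
                     - pos3(x - tk1) * (tk - t[j]) / (tk - tk1)
                     + pos3(x - tk) * (tk1 - t[j]) / (tk - tk1)) / norm)
    return cols


def log_relative_hazard(x, x_ref, coefs, knots):
    """Spline part of the Cox linear predictor at x minus its value at x_ref.

    coefs: fitted coefficients for the rcs_basis columns, in the same order.
    """
    bx = rcs_basis(x, knots)
    br = rcs_basis(x_ref, knots)
    return sum(b * (u - v) for b, u, v in zip(coefs, bx, br))


def hazard_ratio_curve(grid, x_ref, coefs, knots):
    """(x, HR relative to x_ref) pairs: the curve to plot as the effect estimate."""
    return [(x, math.exp(log_relative_hazard(x, x_ref, coefs, knots))) for x in grid]


def iqr_hazard_ratio(values, coefs, knots):
    """Hazard ratio comparing the upper to the lower quartile of the observed marker values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return math.exp(log_relative_hazard(q3, q1, coefs, knots))
```

With only the linear coefficient nonzero, the IQR hazard ratio reduces to exp(beta * (Q3 - Q1)), the familiar linear-model summary. With nonlinear terms it still compares only two points on the curve, which is why it is a reasonable single number only when the relationship is monotonic.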