Dear Colleagues:
I am interested in your thoughts on how many knots to use in a restricted cubic spline (RCS) model. In this body of work, we are using RCS to fit a concentration curve between two biomarkers and to calculate a ‘cut-off/threshold’ for our x-variable (biomarker).
Five knots have worked well for our analyses thus far, and our sample sizes have been fairly large (often >200 observations), but we keep getting asked why 5 knots and not 4, or 8, etc.
Below are 2 fits and the “threshold” concentration for serum ferritin (SF) against hemoglobin. Our model is also guided by physiologic interpretation, so we are interested in the concentration at the first inflection plateau. We obtain different SF concentrations depending on the number of knots selected, and we don’t want the whole process to be data-driven.
Do you happen to have any thoughts on how to select the best number of knots for a dataset?
Thank you for your time and insights.
P.S. RCS fit for US women data, and physiologic ferritin ‘thresholds’ derived from Function(RCS.fit), c(5,50).
@f2harrell has some suggestions on selecting the number and position of knots in his RMS course book (section 2.4.6) and his Regression Modeling Strategies textbook (I don’t have a link because I think that’s not allowed, but if you don’t have the book, a quick search will find some publicly accessible versions). Briefly: place knots based on previous knowledge or at specific points/quantiles, and compare the maximized likelihood/AIC across fits to estimate which number of knots gives the best model fit. I know this is the data-driven approach you hope to avoid, but I’m not sure there is a (patho)physiological reason to choose one number of knots over another.
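To make the AIC comparison concrete, here is a minimal sketch in Python. It is an illustration only, not your analysis: the data are simulated, the names `rcs_basis` and `aic_by_knots` are hypothetical helpers I made up, and it assumes an ordinary least-squares fit with Gaussian errors. The knot quantiles are the defaults tabulated in Regression Modeling Strategies, and the basis is the usual truncated-power form that is linear beyond the outer knots.

```python
import numpy as np

# Default knot quantiles from Harrell's Regression Modeling Strategies.
HARRELL_QUANTILES = {
    3: [0.10, 0.50, 0.90],
    4: [0.05, 0.35, 0.65, 0.95],
    5: [0.05, 0.275, 0.50, 0.725, 0.95],
    6: [0.05, 0.23, 0.41, 0.59, 0.77, 0.95],
    7: [0.025, 0.1833, 0.3417, 0.50, 0.6583, 0.8167, 0.975],
}


def rcs_basis(x, knots):
    """Restricted cubic spline basis with k - 1 columns (x plus k - 2 nonlinear terms)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    scale = (t[-1] - t[0]) ** 2  # puts the nonlinear columns on the scale of x

    def cube(u):
        return np.maximum(u, 0.0) ** 3

    cols = [x]
    for j in range(k - 2):
        term = (
            cube(x - t[j])
            - cube(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
            + cube(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2])
        )
        cols.append(term / scale)
    return np.column_stack(cols)


def aic_by_knots(x, y, ks=(3, 4, 5, 6, 7)):
    """Fit y ~ rcs(x, k) by least squares for each k and return {k: AIC}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    out = {}
    for k in ks:
        knots = np.quantile(x, HARRELL_QUANTILES[k])
        X = np.column_stack([np.ones(n), rcs_basis(x, knots)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        n_par = X.shape[1] + 1  # regression coefficients plus residual variance
        # Gaussian AIC: -2 * maximized log-likelihood + 2 * number of parameters
        out[k] = n * np.log(2 * np.pi * rss / n) + n + 2 * n_par
    return out


# Simulated data (NOT the ferritin/hemoglobin data): a rise followed by a plateau.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, 300)
y = 12.0 + 1.5 * np.minimum(x, 3.0) + rng.normal(0.0, 0.5, 300)

aics = aic_by_knots(x, y)
best = min(aics, key=aics.get)
for k, value in aics.items():
    print(f"{k} knots: AIC = {value:.1f}")
print("lowest AIC at", best, "knots")
```

In rms, the analogous step would be refitting the model with rcs(x, k) for each candidate k and comparing the AICs. If neighbouring values of k give nearly identical AICs, that is arguably a reason to fix k in advance rather than evidence for any particular choice.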
From a more physiological point of view, the difference between threshold values of 25.16 and 35.94 is also probably negligible, as both lie close together in the accepted lower-normal range of ferritin. I was therefore wondering if you could elaborate on the precise goal of your analyses. What is the value in identifying the concentration around the first inflection?
Thanks, scboone. The value (goal) of the concentration at the first inflection is to determine a clinically meaningful onset of iron-deficient erythropoiesis based on the hematology indicators we are using. So the difference between 25.16 and 35.94 is rather large, and not negligible to the patient, the clinician, or public health screening. Hope this clarifies things further.