Dear @f2harrell,
I'm hoping you could help clear up a bit of my confusion about one of the methods of assigning d.f. to predictors that's mentioned in Ch. 4, specifically within the context of hypothesis testing.
I understand the use of `plot(anova(fit))` to guide assignment of d.f., with the total available d.f. being based on one of the rules of thumb (e.g., available d.f. = n/15). And I understand using AIC to guide knot selection, assuming that within each model tested, you use the same number of knots for each continuous predictor (e.g., mod1: all continuous vars are linear; mod2: all continuous vars have 3 knots; … modX: all continuous vars have X knots).
It's my understanding that both those methods (assignment of n/15 d.f. according to `plot(anova(fit))`, and using AIC) are acceptable within the context of hypothesis testing, and no adjustment to alpha, e.g., based on models submitted, is necessary if you use either of those methods.
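To make the AIC comparison concrete, here is a rough sketch of what I mean by giving every continuous predictor the same number of knots in each candidate model. It is only an illustration under stated assumptions: the data are simulated, the variable and function names are made up, and base R's `splines::ns()` with `glm()` stands in for `rcs()` with an rms fit, so knot placement will differ from what rms would choose.

```r
library(splines)

# Simulated data, purely for illustration (not from any real analysis)
set.seed(1)
n   <- 300
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- rbinom(n, 1, plogis(0.5 * x1 + sin(x2)))
dat <- data.frame(y, x1, x2)

# Hypothetical helper: k = 0 means all predictors linear; otherwise every
# continuous predictor gets a natural cubic spline with k knots in total
# (two boundary knots plus k - 2 interior knots), which costs k - 1 d.f. each.
fit_with_knots <- function(k) {
  if (k == 0) {
    glm(y ~ x1 + x2, family = binomial, data = dat)
  } else {
    glm(y ~ ns(x1, df = k - 1) + ns(x2, df = k - 1),
        family = binomial, data = dat)
  }
}

knots <- c(0, 3, 4, 5)
aics  <- sapply(knots, function(k) AIC(fit_with_knots(k)))
names(aics) <- paste0("k=", knots)

aics                               # one AIC per candidate model
best_k <- knots[which.min(aics)]   # number of knots with the lowest AIC
best_k
```

The point of the design is that only one tuning value (the common number of knots) is being chosen, rather than a separate value per predictor.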
The section in Ch. 4 about rules of thumb refers to a third method of assigning d.f.: the use of van Houwelingen & le Cessie's shrinkage factor heuristic, which is explained in detail later in Ch. 4. I understand that method to be:
1. Use rule of thumb to determine max available d.f.
2. Assign d.f. according to `plot(anova(fit))`, theory, and prior research
3. Fit this full model
4. Get LR chi2 of full model and calculate `(LR - d.f.)/LR`
5. If result from that calculation is much less than .9, calculate `(LR - d.f.)/9` to get a new total d.f. that is more appropriate to your data
6. Assign d.f. as in step 2. This represents your final model
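
For concreteness, the arithmetic in steps 4 and 5 as I understand it can be sketched in a few lines of base R. The LR chi2 and d.f. values here are hypothetical numbers chosen only to illustrate the calculation (in practice they would come from the fitted full model), and the function names are my own.

```r
# Hypothetical inputs, for illustration only
lr_chisq <- 50   # LR chi-square of the full model
p        <- 15   # total d.f. spent in the full model

# Step 4: heuristic shrinkage estimate, (LR - d.f.) / LR
heuristic_shrinkage <- function(lr, df) (lr - df) / lr

# Step 5: revised d.f. budget, (LR - d.f.) / 9
revised_df_budget <- function(lr, df) (lr - df) / 9

shrinkage <- heuristic_shrinkage(lr_chisq, p)
shrinkage   # 0.7 for these inputs

# Only revise the budget when the heuristic falls below 0.9;
# otherwise keep the d.f. already spent
q_max <- if (shrinkage < 0.9) revised_df_budget(lr_chisq, p) else p
floor(q_max)   # whole d.f. available under the revised budget
```

With these made-up numbers the heuristic is 0.7, similar to what I've been seeing, and the revised budget works out to fewer than 4 d.f.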
What I'm confused by is the next statement in this section, which, if I'm reading it correctly, seems to say that if you're doing hypothesis testing, you should be fitting the full model (decided upon in step 2, above), and that you shouldn't be fitting a model that's based on the method described in steps 4-6 above.
Am I reading that right? Or is the shrinkage factor heuristic method fine to use with hypothesis testing? I ask because some models I've fit using the rule of thumb method have shown relatively poor shrinkage factor heuristics (e.g., around .7), despite using fewer d.f. than the n/15 rule of thumb suggests is OK. I've taken this to mean my data have a lower signal:noise ratio than the data that the rules of thumb are based on, and I wanted to make sure I wasn't jamming more d.f. into an analysis than my data could support.
P.S. Many thanks go to you (and the other members of this community) for all the resources you devote to spreading statistical know-how. I've shared some of the techniques you discuss in the book/on hbiostat/here (particularly the use of restricted cubic splines) with some of my colleagues, and they have been genuinely amazed (as I am) at how easy it is to model complex relationships with these techniques.