Pre-specifying the distribution of AFT models according to AIC/BIC criteria and other exceptions to the use of observed effects

albertoca · January 9, 2020, 8:53pm

Hi, I have a question about the pre-specification of multivariable models. One of the things that datamethods insists on most is that observed effects should not be used to specify multivariable models. However, I believe that Professor Harrell sometimes supports some exceptions, for example, in the selection of the number of knots for restricted cubic splines. I wanted to ask what other exceptions to this rule exist, and what the justifications for them are.
For example, I recently submitted a paper with a lognormal AFT model and the reviewer asked me to evaluate the fit for all distributions using the AIC and BIC criteria (the log-logistic was the best). I have seen that some statistical guidelines recommend doing this, but in that case, I could not pre-specify the distribution of the AFT model in the study protocol. What do you think about this point? Thank you and greetings.
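
For reference, the candidate distributions differ only in the error term of the same log-linear model. A sketch in standard AFT notation (the symbols below are not from the thread):

$$
\log T_i = x_i^\top \beta + \sigma \varepsilon_i
$$

Taking $\varepsilon_i$ as standard normal gives the lognormal model, standard logistic gives the log-logistic model, and standard minimum extreme value gives the Weibull model. Choosing the distribution by AIC/BIC therefore means choosing the law of $\varepsilon_i$ after looking at the data.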

f2harrell · January 9, 2020, 11:20pm

I think it’s a difficult question and I would love to hear answers from others. The only thing I can contribute right now is these two thoughts:

1. A Bayesian approach would be the most honest. It would involve having parameters that relax model assumptions, and incorporating the proper uncertainty in those parameters.
2. If you use AIC (I don’t usually recommend BIC) in selecting from 3 or fewer models, then less harm is done to final inferences.

albertoca · January 23, 2020, 6:38pm

I agree with using a Bayesian AFT model.
In fact, after reading your work, I am convinced that Bayesian statistics is more rational and intuitive.
However, I still don’t understand what you mean by “having parameters that relax model assumptions, and incorporating the proper uncertainty in those parameters”.
For example, if I were to use a lognormal AFT model, do you mean using appropriate priors for the scale and shape parameters? In that case I don’t understand the rationale. Please explain intuitively if you have some time.
Regarding the other recommendation to use the AIC, I understand that you consider it correct to pre-specify in the protocol that I will use the AIC rule to choose one of three possible distributions (for example, log-normal, Weibull and log-logistic). Correct?
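
To make the question concrete, here is a minimal Python sketch of such a pre-specified rule. It assumes each candidate has already been fitted to the same data and that only the maximized log-likelihood and parameter count are passed in. The names `CANDIDATES`, `aic` and `select_by_aic` are hypothetical, and the log-likelihood values are made up purely for illustration.

```python
# Candidate set written into the protocol before seeing any data
# (hypothetical names, matching the three distributions mentioned above).
CANDIDATES = ("lognormal", "weibull", "loglogistic")


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*logL (smaller is better)."""
    return 2 * n_params - 2 * loglik


def select_by_aic(fits):
    """Pick the candidate with the smallest AIC.

    fits maps a distribution name to (maximized log-likelihood, n_params).
    All log-likelihoods must come from the same data on the same time scale.
    """
    unknown = set(fits) - set(CANDIDATES)
    if unknown:
        # Refuse anything that was not listed in the protocol.
        raise ValueError(f"not pre-specified: {sorted(unknown)}")
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative, made-up log-likelihoods; 5 parameters per model is an assumption.
example_fits = {
    "lognormal": (-412.3, 5),
    "weibull": (-415.0, 5),
    "loglogistic": (-411.1, 5),
}
best, scores = select_by_aic(example_fits)
```

Because the three candidates have the same number of parameters, the AIC ranking here is just the log-likelihood ranking; the penalty only matters when the candidates differ in size.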

f2harrell · January 24, 2020, 3:32am

I mean to add a 3rd parameter to the model that generalizes Weibull to something more flexible, with a prior that tilts the analysis towards Weibull. An analogy is the non-normality parameter in the Bayesian 2-sample t-test; see Chapter 5 of BBR.
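
One way to picture this, as a rough sketch only: the reply does not name a specific family, so the generalized gamma (Stacy) distribution is used here as an assumed example of a three-parameter model that contains the Weibull as a special case (extra parameter `k = 1`). A normal prior on `log(k)` centred at 0 then pulls the fit towards Weibull unless the data disagree. The sketch ignores covariates and censoring, treats the priors on `shape` and `scale` as flat, and `tau` is a hypothetical prior standard deviation.

```python
import math


def weibull_logpdf(t, shape, scale):
    """Log density of a Weibull(shape, scale) at time t > 0."""
    z = t / scale
    return math.log(shape / scale) + (shape - 1) * math.log(z) - z ** shape


def gengamma_logpdf(t, shape, scale, k):
    """Log density of the generalized gamma; k = 1 recovers the Weibull."""
    z = t / scale
    return (math.log(shape / scale) - math.lgamma(k)
            + (shape * k - 1) * math.log(z) - z ** shape)


def log_prior_log_k(log_k, tau=0.5):
    """Normal(0, tau) prior on log(k), up to a constant; peaks at Weibull."""
    return -0.5 * (log_k / tau) ** 2


def log_posterior(times, shape, scale, log_k, tau=0.5):
    """Unnormalized log-posterior in (shape, scale, log_k) for uncensored times."""
    k = math.exp(log_k)
    loglik = sum(gengamma_logpdf(t, shape, scale, k) for t in times)
    return loglik + log_prior_log_k(log_k, tau)
```

A smaller `tau` expresses stronger belief in the Weibull shape; a larger `tau` lets the data move `k` away from 1 more freely, and the resulting uncertainty about `k` carries through to the other parameters.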