Pre-specifying the distribution of AFT models according to AIC/BIC criteria and other exceptions to the use of observed effects

Hi, I have a question about the pre-specification of multivariable models. One of the points datamethods insists on most is that observed effects should not be used to specify multivariable models. However, I believe that Professor Harrell sometimes supports exceptions, for example, in selecting the number of knots for restricted cubic splines. I wanted to ask what other exceptions to this rule exist, and what the justifications for them are.
For example, I recently submitted a paper with a lognormal AFT model and the reviewer asked me to evaluate the fit of all candidate distributions using the AIC and BIC criteria (the log-logistic was the best). I have seen that some statistical guidelines recommend doing this, but in that case I could not pre-specify the distribution of the AFT model in the study protocol. What do you think about this point? Thank you and greetings.

I think it’s a difficult question and I would love to hear answers from others. The only thing I can contribute right now is these two thoughts:

  • A Bayesian approach would be the most honest. It would involve having parameters that relax model assumptions, and incorporating the proper uncertainty in those parameters.
  • If you use AIC (I don’t usually recommend BIC) to select from 3 or fewer models, then less harm is done to final inferences.
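
To make the second point concrete, here is a minimal sketch of a pre-specified AIC rule over three candidate AFT families. Everything in it is an assumption for illustration: the data are simulated, the model has no covariates (intercept and scale only), and the fit uses a crude compass search written in plain Python rather than a proper survival-analysis routine. The names `FAMILIES`, `loglik`, and `fit` are hypothetical.

```python
import math
import random

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# For each family: (log density, log survival) of the standardized error
# z = (log t - mu) / sigma. All three are location-scale models for log time.
FAMILIES = {
    "lognormal": (
        lambda z: -0.5 * z * z - 0.5 * math.log(2 * math.pi),
        lambda z: math.log(max(0.5 * math.erfc(z / math.sqrt(2)), 1e-300)),
    ),
    "weibull": (
        lambda z: z - math.exp(min(z, 700.0)),
        lambda z: -math.exp(min(z, 700.0)),
    ),
    "loglogistic": (
        lambda z: -z - 2 * softplus(-z),
        lambda z: -softplus(z),
    ),
}

def loglik(params, times, events, family):
    mu, log_sigma = params
    sigma = math.exp(log_sigma)
    logpdf, logsurv = FAMILIES[family]
    total = 0.0
    for t, observed in zip(times, events):
        z = (math.log(t) - mu) / sigma
        if observed:
            # Density of T, including the Jacobian 1 / (sigma * t).
            total += logpdf(z) - log_sigma - math.log(t)
        else:
            # Right-censored: probability of surviving beyond t.
            total += logsurv(z)
    return total

def fit(times, events, family):
    # Start at the moments of log time, then do a simple compass search
    # over (mu, log sigma), halving the step when no move improves the fit.
    logs = [math.log(t) for t in times]
    mu0 = sum(logs) / len(logs)
    sd0 = (sum((y - mu0) ** 2 for y in logs) / len(logs)) ** 0.5
    x = [mu0, math.log(sd0)]
    best = loglik(x, times, events, family)
    step = 0.5
    while step > 1e-6:
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = loglik(trial, times, events, family)
                if value > best:
                    x, best, improved = trial, value, True
        if not improved:
            step /= 2
    return x, best

# Simulated (hypothetical) data: Weibull event times, exponential censoring.
random.seed(1)
true_times = [random.weibullvariate(10.0, 1.5) for _ in range(200)]
cens_times = [random.expovariate(1 / 20.0) for _ in range(200)]
times = [min(t, c) for t, c in zip(true_times, cens_times)]
events = [t <= c for t, c in zip(true_times, cens_times)]

aic = {}
for family in FAMILIES:
    _, ll = fit(times, events, family)
    aic[family] = 2 * 2 - 2 * ll  # two parameters (mu, sigma) per family

chosen = min(aic, key=aic.get)
for family, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{family:12s} AIC = {value:.1f}")
print("pre-specified rule selects:", chosen)
```

Because all three families have the same number of parameters, AIC and BIC rank them identically here; the two criteria can only disagree when the candidates differ in parameter count. The sketch only shows the mechanics of the rule and does not account for the selection uncertainty it introduces.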

I agree with using a Bayesian AFT model.
In fact, after reading your posts, I am convinced that Bayesian statistics is more rational and intuitive.
However, I still don’t understand what you mean by “having parameters that relax model assumptions, and incorporating the proper uncertainty in those parameters”.
For example, if I were to use a lognormal AFT model, do you mean using appropriate priors for the scale and shape parameters? In that case I don’t understand the rationale. Please explain it intuitively if you have some time.
Regarding the other recommendation, to use the AIC, I understand that you consider it correct to pre-specify in the protocol that I will use the AIC rule to choose one of three possible distributions (for example, log-normal, Weibull and log-logistic). Correct?

I mean to add a 3rd parameter to the model that generalizes Weibull to something more flexible, with a prior that tilts the analysis towards Weibull. An analogy is the non-normality parameter in the Bayesian 2-sample t-test; see Chapter 5 of BBR.
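
As one possible illustration (the post does not say which generalization is meant, so the generalized gamma family and the prior below are assumptions), a third parameter k can be added to the Weibull with scale a and shape p:

```latex
f(t \mid a, p, k) \;=\; \frac{p}{a\,\Gamma(k)}
  \left(\frac{t}{a}\right)^{pk-1}
  \exp\!\left\{-\left(\frac{t}{a}\right)^{p}\right\},
  \qquad t > 0,

k = 1 \;\Longrightarrow\;
f(t \mid a, p, 1) \;=\; \frac{p}{a}\left(\frac{t}{a}\right)^{p-1}
  \exp\!\left\{-\left(\frac{t}{a}\right)^{p}\right\}
  \quad \text{(Weibull)},

\log k \;\sim\; \mathrm{Normal}(0, \tau^{2}).
```

A small value of the hypothetical prior scale τ concentrates the prior near k = 1, so the analysis leans towards the Weibull unless the data favor another shape, and the posterior for the other parameters carries the uncertainty about k instead of conditioning on a single selected distribution.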