Hi, I have a question about the pre-specification of multivariable models. One of the things that datamethods insists on most is that observed effects should not be used to specify multivariable models. However, I believe that Professor Harrell sometimes supports exceptions, for example, in selecting the number of knots for restricted cubic splines. I wanted to ask what other exceptions to this rule exist, and what justifies them.

For example, I recently submitted a paper with a lognormal AFT model, and the reviewer asked me to evaluate the fit of all distributions using the AIC and BIC criteria (the log-logistic was the best). I have seen that some statistical guidelines recommend doing this, but in that case I could not pre-specify the distribution of the AFT model in the study protocol. What do you think about this point? Thank you and greetings.

# Pre-specifying the distribution of AFT models according to AIC/BIC criteria and other exceptions to the use of observed effects

I think it's a difficult question and I would love to hear answers from others. The only thing I can contribute right now is these two thoughts:

- A Bayesian approach would be the most honest. It would involve having parameters that relax model assumptions, and incorporating the proper uncertainty in those parameters.
- If you use AIC (I don't usually recommend BIC) in selecting from 3 or fewer models, then less harm is done to final inferences.
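As a rough illustration of the second point, here is a minimal Python sketch that compares three pre-specified candidates by AIC. Everything in it is an assumption made for illustration and not part of the replies above: the data are simulated, there is no censoring and there are no covariates, and the helper names (`log_density`, `golden_min`, `fit_aic`) are hypothetical. It relies on the fact that all three models can be written as log(T) = mu + sigma * W, differing only in the distribution of W. With censored data, the censored observations would contribute log-survival terms instead of log-density terms.

```python
import math
import random

def log_density(z, dist):
    """Log density of the standardized error W in log(T) = mu + sigma * W."""
    if dist == "lognormal":      # W is standard normal
        return -0.5 * z * z - 0.5 * math.log(2 * math.pi)
    if dist == "weibull":        # W is standard minimum extreme value
        return z - math.exp(min(z, 700.0))  # cap avoids float overflow
    if dist == "loglogistic":    # W is standard logistic
        return -abs(z) - 2 * math.log1p(math.exp(-abs(z)))
    raise ValueError(dist)

def golden_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal 1-D function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def fit_aic(times, dist, rounds=20):
    """Maximum likelihood fit (uncensored data only) and AIC with 2 parameters."""
    logt = [math.log(t) for t in times]
    n, jacobian = len(logt), sum(logt)

    def nll(mu, log_sigma):
        # Negative log-likelihood on the original time scale.
        sigma = math.exp(log_sigma)
        ll = sum(log_density((y - mu) / sigma, dist) for y in logt)
        return -ll + n * log_sigma + jacobian

    mu, log_sigma = sum(logt) / n, 0.0
    for _ in range(rounds):  # simple alternating (coordinate-wise) search
        mu = golden_min(lambda m: nll(m, log_sigma), min(logt) - 1, max(logt) + 1)
        log_sigma = golden_min(lambda s: nll(mu, s), -3.0, 3.0)
    return 2 * 2 + 2 * nll(mu, log_sigma)

# Simulated, uncensored event times (illustrative only).
random.seed(1)
times = [random.weibullvariate(10.0, 1.5) for _ in range(150)]

# The candidate set is fixed in advance and kept to three models.
candidates = ["lognormal", "weibull", "loglogistic"]
aic = {dist: fit_aic(times, dist) for dist in candidates}
best = min(aic, key=aic.get)

for dist in candidates:
    print(f"{dist:12s} AIC = {aic[dist]:.2f}")
print("Lowest AIC:", best)
```

The point of the sketch is only the workflow: the candidate list and the selection rule are written down before looking at the data, and the list is short. It does not account for the uncertainty introduced by the selection step itself.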

I agree with using a Bayesian AFT model.

In fact, after reading your writing, I am convinced that Bayesian statistics is more rational and intuitive.

However, I still don't understand what you mean by "having parameters that relax model assumptions, and incorporating the proper uncertainty in those parameters".

For example, if I were to use a lognormal AFT model, do you mean using appropriate priors for the scale and shape parameters? In that case I don't understand the rationale. Please explain intuitively if you have some time.

Regarding the other recommendation, to use the AIC, I understand that you consider it acceptable to pre-specify in the protocol that I will use the AIC rule to choose one of three candidate distributions (for example, log-normal, Weibull and log-logistic). Correct?

I mean adding a 3rd parameter to the model that generalizes Weibull to something more flexible, with a prior that tilts the analysis towards Weibull. An analogy is the non-normality parameter in the Bayesian 2-sample t-test; see Chapter 5 of BBR.
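To make this concrete, one possible choice is the generalized gamma family. This is an assumption for illustration; the reply above does not name a specific family. Writing $a$ for the scale, $p$ for the Weibull-type shape, and $\kappa$ for the extra parameter, the density is

$$
f(t \mid a, p, \kappa) = \frac{p}{a^{\kappa p}\,\Gamma(\kappa)}\; t^{\kappa p - 1} \exp\!\left[-\left(\frac{t}{a}\right)^{p}\right], \qquad t > 0 .
$$

Setting $\kappa = 1$ gives back the Weibull density $\frac{p}{a^{p}}\, t^{p-1} \exp\!\left[-(t/a)^{p}\right]$. A prior that tilts the analysis towards Weibull could then be, for example,

$$
\log \kappa \sim \mathcal{N}(0, \tau^{2}),
$$

where a small $\tau$ (a hypothetical tuning choice) keeps $\kappa$ close to 1 unless the data clearly favor a different shape. The posterior for the effects of interest then carries the uncertainty about $\kappa$ along with it, rather than conditioning on a single selected distribution.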