I cannot comment on the technical aspects of Senn’s criticisms of Bayes, but I assume similar critiques could be made about frequentist methods. What I worry about is how medical investigators and clinicians interpret statistical results. Clinicians think in terms of probabilities and wrongly assume that a P value represents the probability of some event happening. This situation is worsening, as statistics education in medical school is virtually nonexistent these days. Much harm has been done by binary conclusions drawn from P < .05, and clinicians frequently misunderstand what that threshold really means.
Bayes, for all its limitations, yields a probability distribution that is consistent with how clinicians think about clinical problems.
Given the respective limitations of frequentist and Bayesian methods, I think we are better off with a Bayesian framework when trying to communicate with clinicians.
I’m not suggesting using AI to design trials or replace statisticians. My concern, from a journal perspective, is what to do when a paper comes in with statistical analyses we are not comfortable with. In general, I think frequentist methods have caused much harm and we would be better off with Bayesian approaches. My initial question was: if we encourage authors to apply Bayesian methods, how would we do that?
Change will not happen overnight. I think it is reasonable to present both types of analyses until readers become familiar with Bayesian methods and start using them on their own.
Regarding clinical trials: they should be reported using the statistical analysis described in the statistical analysis plan (SAP). One can ask for an alternative analysis, but only as a sensitivity analysis or for illustrative purposes. It would be wrong to ask authors to change the statistical analysis proposed in a trial protocol or SAP.
Your comment, “yes we can actually compute the probability that a treatment works; we don’t always have to compute the probability of getting impressive data if it doesn’t”, is very important. It is true that clinicians think in terms of probabilities, as you stated. A major problem is that they are unfamiliar with the second part of what you wrote. Clinicians don’t know that the statistical analyses they read about are usually “comput[ing] the probability of getting impressive data if it doesn’t”. They are unaware of that subtlety, so they conclude that if a P value is less than .05 the finding is true, and if it is larger than .05 it is not. Like it or not, this is how clinicians and many investigators interpret scientific findings.
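To make the distinction concrete, here is a minimal sketch contrasting the two quantities for one hypothetical trial summary. The effect estimate, its standard error, the normal approximation to the likelihood, and the skeptical normal prior (mean 0, SD 0.5 on the log hazard ratio scale) are all illustrative assumptions, not taken from any study. The P value answers “how surprising are data this impressive if the treatment does nothing?”, while the posterior probability answers “how probable is it that the treatment helps, given these data and this prior?”

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical trial summary (illustrative numbers only):
# estimated log hazard ratio and its standard error; negative = benefit.
est, se = -0.20, 0.12

# Frequentist: one-sided P value = Prob(estimate this low or lower | true effect = 0).
p_one_sided = norm_cdf(est / se)

# Bayesian: assumed skeptical normal prior on the true log hazard ratio,
# combined with a normal likelihood (conjugate normal-normal update).
prior_mean, prior_sd = 0.0, 0.5
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + est / se**2)

# Posterior probability of any benefit: Prob(true log HR < 0 | data, prior).
p_benefit = norm_cdf((0.0 - post_mean) / math.sqrt(post_var))

print(f"one-sided P value:           {p_one_sided:.3f}")
print(f"P(treatment benefit | data): {p_benefit:.3f}")
```

With these particular inputs the one-sided P value falls just under .05 and the posterior probability of benefit is roughly 0.95, but the two numbers are not complements of each other: the second one changes if the assumed prior changes, which is exactly why the prior has to be stated openly.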
This is one of the educational areas where we need to redouble our efforts and never give up. I start with chapter 22, “Controlling α vs. Probability of a Decision Error”, in Biostatistics for Biomedical Research, by explaining that putting the word ‘error’ in the description of α is itself an error, and I work hard to distinguish α from a decision error. There are still better ways to say all this, and I’m always looking for suggestions.
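A small simulation can show why α is not the probability of a decision error. This is a sketch under hypothetical assumptions: 10% of tested treatments truly work, and each trial is a z test with about 80% power at two-sided α = 0.05 (the test statistic is shifted by 2.8 when the treatment works). α stays fixed by design, while the probability that a “significant” result comes from an ineffective treatment depends on those assumptions.

```python
import random

random.seed(2024)  # fixed seed so the simulation is reproducible

N_TRIALS = 20_000
PROP_EFFECTIVE = 0.10        # assumption: fraction of tested treatments that truly work
MEAN_Z_IF_EFFECTIVE = 2.80   # assumption: gives ~80% power for a two-sided test at alpha = 0.05
Z_CRIT = 1.96                # two-sided critical value for alpha = 0.05

null_trials = null_rejections = rejections = 0
for _ in range(N_TRIALS):
    effective = random.random() < PROP_EFFECTIVE
    # Test statistic: standard normal under H0, shifted when the treatment works.
    z = random.gauss(MEAN_Z_IF_EFFECTIVE if effective else 0.0, 1.0)
    reject = abs(z) > Z_CRIT
    if not effective:
        null_trials += 1
        null_rejections += int(reject)
    rejections += int(reject)

# alpha: Prob(reject | H0 true), which is what the test controls.
type1_rate = null_rejections / null_trials
# Prob(H0 true | rejected): the chance that a "significant" finding is a false positive.
false_positive_share = null_rejections / rejections

print(f"Prob(reject | no effect) = {type1_rate:.3f}")
print(f"Prob(no effect | reject) = {false_positive_share:.3f}")
```

Under these assumptions roughly a third of the rejections come from ineffective treatments even though α is 0.05; changing `PROP_EFFECTIVE` or the power moves that share while α does not.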
In terms of not requesting that a study change its analysis at the last minute: I have reviewed RCTs for medical journals where the RCT had a fatal design flaw. It’s up to top-level journals to reject such submissions. I’ve also reviewed designs for journals devoted to publishing designs, where the design was seriously flawed and the editor said it was too late to change. Why even have peer review of designs then?
Personally, my R code has no elegance, so one can tell that I ran something through an LLM when it is well organized and has a sense of style.
This brings up the point that LLMs can be a useful cognitive extension. As long as authors disclose their use of LLMs or other AI tools, it is appropriate to take advantage of them. But then it does become the journal’s (along with the authors’) responsibility to verify correctness, reproducibility, and safety. We all can make mistakes, with or without LLMs. Authors disclosing LLM use could name an author or analyst who attests that they reviewed and understood the LLM-generated code and validated the results.
A few red flags I have noticed that are typical of LLM-generated code:
- Inconsistent variable names.
- Comments that don’t match the code.
- Variable misspecifications, such as treating categorical variables as numeric or vice versa.
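As an illustration of the last red flag, here is a minimal sketch with made-up data (the site codes and outcome values are hypothetical). Treating a categorical code as a number forces a single linear trend across categories whose ordering is arbitrary; treating it as categorical lets each category have its own mean. Plain Python is used only to keep the sketch self-contained.

```python
# Hypothetical data: outcome measured at three study sites coded 1, 2, 3.
# The codes are labels (a categorical variable), not quantities.
site = [1, 1, 2, 2, 3, 3]
y = [10, 10, 20, 20, 12, 12]

# Misspecified: treat the site code as numeric and fit a least-squares line.
x_bar = sum(site) / len(site)
y_bar = sum(y) / len(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(site, y)) / sum(
    (xi - x_bar) ** 2 for xi in site
)
intercept = y_bar - slope * x_bar
fitted_numeric = [intercept + slope * xi for xi in site]

# Correct: treat site as categorical, i.e., estimate one mean per site.
group_means = {
    s: sum(yi for si, yi in zip(site, y) if si == s) / site.count(s)
    for s in sorted(set(site))
}
fitted_categorical = [group_means[s] for s in site]


def rss(fitted):
    """Residual sum of squares of the outcome y around the fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


print("numeric coding:     slope", slope, "RSS", rss(fitted_numeric))
print("categorical coding: means", group_means, "RSS", rss(fitted_categorical))
```

Here the numeric coding implies the outcome rises steadily from site 1 to site 3, which the site means do not support; relabeling the sites would change the numeric fit but leave the categorical fit unchanged.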
The literature and books at the beginning of the post are very useful. Is there a summary post here that collects Bayesian learning materials and literature, from beginner to advanced? If so, it would be very helpful.
Perhaps this would be true if the author had also trained the model themselves. But the way these algorithms seem to be used only pushes the trust problem up a level, to the LLM as a source.
IMHO, LLMs are going to create a crisis of nihilism in scientific activity that current institutions don’t seem to be prepared to deal with.