Language for communicating frequentist results about treatment effects

> I would never limit a vague term like “evidence” to model comparisons, especially when the bulk of Fisherian literature (including Fisher) refers to small p as evidence against its generative model - and it is evidence by any ordinary-language meaning of “evidence”, with no single comparator.

While I find Richard Royall’s arguments for the likelihood as a measure of evidence fairly strong, the distinction between “surprise” and “evidence” is also made by others who work with Fisher’s methods.

The term “evidence” seems overloaded when it is applied to both p-values and likelihoods. This overloading leads to unproductive debates about how p-values “overstate” the evidence. If a distinction is made between surprise and direct measures of evidence via likelihood ratios or Bayes factors, the p-value cannot be blamed for a simple mistake in interpretation.

> Finally, a point needing clarification: Why do you want to convert a P-value in the way you indicated?

Kulinskaya, Morgenthaler, and Staudte wrote a text on meta-analysis that emphasizes transforming the p-value (or t-statistic) to the probit scale via the relation \Phi^{-1}(1 - p), where \Phi is the standard normal cumulative distribution function. Also writing in the Fisher tradition, they distinguish between p-values as measures of surprise and “evidence”, which they measure in probits, connecting Fisher and the Neyman-Pearson schools of thought.
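To make that transformation concrete, here is a minimal sketch in Python (the function name `probit_evidence` is my own, and it assumes SciPy is available):

```python
from scipy.stats import norm

def probit_evidence(p: float) -> float:
    """Map a one-sided p-value to the probit scale.

    Applies the inverse standard normal CDF, Phi^{-1}(1 - p):
    p = 0.5 maps to 0; smaller p-values map to larger positive probits.
    """
    return norm.ppf(1 - p)

print(probit_evidence(0.05))   # ~1.645
print(probit_evidence(0.005))  # ~2.576
```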

On another thread, you posted a link to a video where you, Jim Berger, and Robert Matthews discussed p-values. I’ve found all of the suggestions valuable: the -e \times p \times \ln(p) bound, the Bayesian Rejection Ratio, the S-value, and the Reverse-Bayes method.
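As a rough sketch of how two of those quantities can be computed from a reported p-value (the function names are mine; the -e \times p \times \ln(p) quantity is the Sellke-Bayarri-Berger lower bound on the Bayes factor for the null, valid for p < 1/e, and the S-value is the base-2 surprisal):

```python
import math

def bayes_factor_bound(p: float) -> float:
    """Lower bound on the Bayes factor in favor of H0: -e * p * ln(p).

    Valid only for 0 < p < 1/e; the data can favor the
    alternative over the null by at most 1/bound.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("bound requires 0 < p < 1/e")
    return -math.e * p * math.log(p)

def s_value(p: float) -> float:
    """Surprisal of the p-value in bits: s = -log2(p)."""
    return -math.log2(p)

p = 0.05
print(bayes_factor_bound(p))  # ~0.41: at best ~2.5-to-1 odds against H0
print(s_value(p))             # ~4.3 bits: like ~4 heads in a row from a fair coin
```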

I’ve given much thought to how to use these techniques when reading not only primary studies, but also meta-analyses where aggregate effect sizes are naively computed. I posted a re-examination of a recent meta-analysis on predicting which athletes are at higher risk of ACL injury in this thread: