What are credible priors and what are skeptical priors?

Sander · November 23, 2019, 5:54pm

I should hope (perhaps overoptimistically) that my posts above show my views and answers clearly enough, but again: To some extent the MBx debate reflects the never-ending problems with stat testing, especially overinterpretation without regard to the full uncertainty in the situation or the costs and benefits of decisions under different scenarios.

To repeat what I said far above, I suspect the problem MBI attempted to address could have been solved more clearly and uncontroversially by graphing the upper and lower P-value functions with the interval boundaries marked, or by tabulating the upper and lower P-values at those boundaries (including the null), and then explaining the rationale for any decisions based on the resulting graph or table. In parallel one could graph or tabulate likelihood functions at the interval boundaries so one could obtain posterior odds based on prior odds, for use in decisions.

I think the whole of research statistics on pre-specified treatments or exposures should be upgraded by moving to continuous descriptive presentations, emphasizing and illustrating uncertainties and dangers of each decision before making any decision. That is how I have seen competent expert panels operate. But that requires a fundamental change in how basic statistics is taught and used, a change that has been promoted for decades yet is still not adopted as the standard.

For more of my views about statistics reform see my presentation at the NISS webinar, https://www.youtube.com/watch?time_continue=1962&v=_p0MRqSlYec&feature=emb_logo
with others at
https://www.niss.org/news/digging-deeper-radical-reasoned-p-value-alternatives-offered-experts-niss-webinar

Beyond those generalities, I am not a sports scientist nor do I have any stake either way in MBx past or future. So at this point I think we’re overdue to hear from others on this matter including Frank and also principals in the public MBI dispute (Sainani, Batterham, etc.) to whom I sent the present blog link.

f2harrell · November 23, 2019, 8:36pm

I’ll not add anything because you have covered this in my depth that I could. I appreciate all that you’ve written here.

Alan_Batterham · November 24, 2019, 3:09pm

Many thanks to Sander and all for the constructive comments and discussion on MBI. I have provided substantial feedback to those preparing the manuscript that Sander mentioned on situating MBI within an interval hypothesis testing framework, with new ‘decision labels’. Regarding previous studies featuring MBI, Sander made the following statement, which I welcome:

“An important point for going forward when looking back: Earlier applications of MBI did not necessarily reach incorrect conclusions. Instead when a report that used MBI is seen as supplying data relevant to a topic, that report needs to be scrutinized to see if its conclusions hold up or fail in light of critical insights about what MBI was actually producing (as opposed to the original MBI descriptions). Note that this advice involves study-specific examinations, and thus is a far cry from simply discarding everything that was published using MBI. Being able to follow it in a particular case hinges on how well the report described its design, execution and data, and if the needed summary statistics were given or computable from the report; if not, the report authors would need to supply those items.”

One of the examples used in the most recent critique of MBI was an intervention study with n=8 and no control arm. If the authors made definitive effectiveness/ efficacy conclusions from such a study, then irrespective of the statistical approach implemented this is obviously unwarranted. Regarding Sander’s advice for study-specific scrutiny, there’s no need to go any further than the design when examining such a study – definitive conclusions of effectiveness are unjustified. The design is simply deficient for evaluating effectiveness/ making definitive decisions and the sample size is even well short of a reasonable pilot trial, leave alone a ‘definitive’ study. In contrast, in previous studies using MBI that are well designed, executed, and reported, the conclusions may be examined for any hype based on inappropriate over-interpretation of ‘possibly’, ‘likely’ etc. All previous MBI studies should have reported the compatibility (confidence) interval for the effect (or it should be computable from results presented). One simple approach for study-specific examination is inspection of the point estimate and lower and upper limits of this interval in relation to the smallest effect size of interest. This examination permits both simple description and interval testing (at the level implied by the X% compatibility interval reported) to see ‘if the original conclusions hold up or fail’. If P values are reported then S (surprisal) values could also be easily derived. And, of course, ‘when a report that used MBI is seen as supplying data relevant to a topic’, if the relevant summary statistics can be extracted then these papers may also contribute to a meta-analysis. I mention the latter, because some commentators have gone as far as to say that all papers featuring MBI should be discarded/ regarded as ineligible for meta analyses. But, as Sander says, previous studies using MBI should be examined case-by-case for exaggerated conclusions, and their design, conduct, and reporting examined to assess quality/ risk of bias, as with any study, irrespective of the statistics employed. Indeed, a stronger focus on design, methods, and measurement rigour, as well as on statistics, would be welcome moving forward. Thanks again to everyone for their input - it is very much appreciated.

romain · November 24, 2019, 9:58pm

Thank you for your comments @Alan_Batterham would you agree with @Sander earlier comments about salvaging MBI, do you agree with the general consensus that it seems as though everyone acknowledges that MBI has numerous issues, and that MBD (while it is yet to be fully formed) after taking on feedback will move to be within an interval hypothesis testing framework.

Regarding previous studies maybe invalid is too strong, but if the statistical community consensus is that MBI as formed and used previously had issues, then surely studies that reached conclusions that only used MBI have issues. Not saying they are wrong but they should at the very least be revisited using standard accepted statistical methods?

Alan_Batterham · November 25, 2019, 2:03pm

Thanks @romain . Yes, I agree with Sander’s comments. As I mentioned, the paper being worked up currently situates MBI/MBD within a frequentist interval hypothesis testing framework, with new ‘decision labels’. So, it could be described as a nuanced fusion of equivalence testing and minimal effects testing. Sander has written that one needn’t be constrained by ‘either-or’ analyses and that an in-parallel presentation of frequentist interval testing plus Bayes or semi-Bayes is fine, providing known error control plus posterior probabilities, if desired/ required. In a couple of recent papers, for example, I’ve used a normal prior - to set reasonable bounds on the effect size and achieve the desired shrinkage - combined with the observed data via information-weighted averaging (semi-Bayes) or with a binary outcome using Stata’s penlogit program (augmentation method), both of which Sander has published on. As an aside, I realise that the semi-Bayes approach doesn’t set a prior for all free model parameters (hence ‘semi’ or ‘partial’ Bayes) - just a prior for the overall model effect size (e.g., adjusted treatment effect) but it is a logically consistent and robust method. Full Bayesian analyses are also appropriate, of course. Regarding your second point, I reiterate Sander’s comment that conclusions in previous studies presenting results using MBI are not necessarily flawed. In most cases, as I described, the necessary summaries should be in the Results (e.g. the compatibility – confidence – interval for the effect) to permit readers to make their own judgement. In other words, for the most part these studies do not need to be re-analysed to make a decision regarding their worth – and design and measurement issues are critical here, too, as they are for any study report.

romain · November 25, 2019, 8:29pm

Thank you for your comment @Alan_Batterham, I don’t understand however if you are changing what MBI is current (which you agree needs to be done) with @Sander help. Then why wouldn’t studies that used a method that you are no longer willing to stand by not need to be revisited? If the method used to analyse the data isn’t being stood by, then shouldn’t the studies be revisited with a framework that everyone is standing behind. I think revisited is key word here because likely they are asking interesting questions.

Maybe @f2harrell might know of examples but it seems a bit off to if a method is being moved away from(as everyone seems to be saying it not usable) to not revisit the results seems ridiculous.

I should clarify, in case I seem harsh. I think this is good for science and shows how science should work. Science is an updating process.

Alan_Batterham · November 25, 2019, 9:08pm

Thanks @romain - apologies if I’ve not made myself clear. I agreed fully with Sander’s point, when he said (bold added):

"Instead when a report that used MBI is seen as supplying data relevant to a topic, that report needs to be scrutinized to see if its conclusions hold up or fail … Note that this advice involves study-specific examinations …… Being able to follow it in a particular case hinges on how well the report described its design, execution and data, and if the needed summary statistics were given or computable from the report; if not, the report authors would need to supply those items.”

So, what I was trying to say is that the ‘needed summary statistics’ are typically in the report. If the confidence (compatibility) interval is there, for example, as it should be in most cases, then the reader has everything they need to judge the merits of the conclusions based on the analysis (as described in my first post). In addition, the design and measurement issues must be scrutinised, as for any study. So, I am not at all saying don’t revisit the studies - I’m agreeing with Sander’s suggestion for ‘study-specific examinations’ on a case-by-case basis, as “Earlier applications of MBI did not necessarily reach incorrect conclusions.” In my last post I was just saying that the raw data didn’t need to be re-analysed - the appropriate summaries will be there in most if not all cases. I used an example of an intervention study featuring MBI with n=8 and no control - if that study made definitive conclusions of effectiveness then that’s obviously wrong. But the conclusions from other studies featuring MBI (larger N, better designed and executed, for example) might be fine when one examines the compatibility interval presented. I hope I’ve clarified my point and sorry for confusing you or anyone else. In short, I was simply reinforcing Sander’s comments from my perspective.

romain · November 25, 2019, 9:23pm

thank you for the clarification @Alan_Batterham it helped a lot.

As someone who created MBI along wth Will and I assume involved in the move away from MBI towards MBD with Sander’s help. What would you suggest users of MBI do. Assuming what everyone is saying that MBI is no longer being stood by.

Secondly as I assume Sander will be a co-author on MBD, will this formulation be going into a statistics journal? I think this can only be a positive to have the community that was critical of MBI (as they should have been) accept MBD.

Thirdly being someone who tries to understand the methods I use before using them, will there be a mathematical formulation of MBD made public that wasn’t made with MBI (or if MBI maths is public feel free to correct me and link it).

Once again, I think this is great for sports science, while MBI was flawed, its great to see leaders in the statistics community like Sander Greenland help come up with methods that sports scientists can be confident has a solid statistical framework. Just wish it was available at the start of my career

Alan_Batterham · November 25, 2019, 10:40pm

Thanks @romain. No, Sander was very clear in an earlier message: “I am not a sports scientist nor do I have any stake either way in MBx past or future.” The lead authors of the paper that is being worked up are Janet Aisbett (a mathematician) and Daniel Lakens. As I stated in my first post, I have provided feedback. I am not sure which journal is being targeted but, yes, there is a mathematical formulation. As I mentioned, the lead authors are repositioning MBI within a frequentist interval testing framework, and it will be one option in the statistical toolbox, including simple descriptions of the compatibility interval (point estimate and lower and upper limits in relation to the smallest effect size of interest), Bayes and semi-Bayes, S-values, standard equivalence testing/ minimal effects testing, severe (severity) testing etc. etc. Thanks again for your input.

romain · November 26, 2019, 10:55am

thank you @Alan_Batterham, may I make a suggestion. It might be wise to take down the MBI sheets that exist on https://sportscience.sportsci.org/ (as it seems as though everyone agrees that it needs to be moved away from) and once the paper (with Aisbett and Lakens) is released to upload new spreadsheets with a “MBI that sits within a frequentist interval hypothesis testing framework”.

Once again, I hope I haven’t come across as too harsh. I hope that sports scientists see admitting faults and fixing them as a positive rather than a negative as all science should be viewed.

Alan_Batterham · November 26, 2019, 1:20pm

Thanks @romain, the spreadsheets were developed by and are owned/ maintained by Will Hopkins, so it is clearly not in my gift to take them down.

Will · November 28, 2019, 3:24pm

Here’s my spin on recent developments with MBI/MBD/MBx and my response about the spreadsheets at Sportscience.

In emails earlier this year Sander was very helpful in insisting that MBI needed a formulation in terms of hypothesis tests, and he suggested using Daniel Lackens’ approach. In June I contacted Daniel Lackens and Alan Batterham for advice. By July I had presented a seminar about it at the German Sport University in Cologne and written a first draft of a manuscript showing that MBI in its clinical and non-clinical versions is indeed equivalent to two one-sided interval hypothesis tests of substantial magnitudes with reasonable alphas. My manuscript was intended for publication in one of the sport science journals. Alan and I were then contacted out of the blue by Janet Aisbett, a retired maths prof., with the first draft of a manuscript also showing MBI’s equivalence to hypothesis tests. Janet’s manuscript is intended for a stats journal. Since then, and at Sander’s instigation, there have been regular email exchanges among a group of six–Sander, Janet, Daniel, Alan, Kristin Sainani, and me–in summary, to address the three points that Sander mentioned in this thread on Nov 21: the alpha levels one should use, the terminology for the multiple possible results, and how the results should be linked to Bayesian interpretations.

I have not been in a hurry to modify the spreadsheets at Sportscience, because the equivalence of MBI with hypothesis tests with reasonable alphas in my mind means that MBI is not fundamentally flawed, and therefore that no harm was done or is being done with the use of MBI, pending a rigorous frequentist formulation. Also, we have not finished the conversation on the alphas or the link to Bayesian interpretation. We have pretty-much finished the conversation on the terminology: a month ago I had updated the spreadsheet for converting p values to MBD, and I had written an update in the article accompanying the spreadsheet. Strangely, I got no feedback from the group about the spreadsheet or the update, so I will pursue this further before publishing it and moving on to the other spreadsheets. I think my spreadsheets and spreadsheets in general are useful for straightforward analyses, and I think they are also valuable teaching tools for research students. Black boxes they are not! Whether they are more or less susceptible to misuse than packages such as SPSS, Matlab, R, or SAS is an issue to be resolved preferably by evidence rather than opinion.

By the way, my own manuscript got an administrative rejection by the editor of Sports Medicine in August, but the rejection message went to a defunct email address, which I discovered only two weeks ago. I then approached International Journal of Sports Physiology and Performance about publishing it, since that journal had published the non-clinical MBI method in its first issue in 2006, but the editor was not interested. I am not sure what to do next about the manuscript. One possibility is to ask the members of our group to review it for Sportscience. Their input would give the article and the method trustworthiness. I think there is a place for an accessible plain-language description, or at least my version of it, since I think there is a place for the Bayesian interpretation (please don’t jump on me yet), whereas Janet and Daniel’s version is exclusively frequentist.

leonardof · November 28, 2019, 3:44pm

Sharing research data might obviate the need to provide frequentist estimates.

Sander · November 28, 2019, 4:22pm

“Sharing research data might obviate the need to provide frequentist estimates.”
I must be misunderstanding you, as I see sharing data oppositely: It reduces the need for informative-prior Bayes estimates, since key external information is then available in its actual data format instead of having been reduced (and likely distorted) into some probabilistic summary called a “prior distribution.”

That follows from a general view of information as a more intuitively helpful concept for statistical foundations than is probability, although the two concepts can be mapped into each other. In particular, “Bayesian statistics” and “frequentist statistics” are just two complementary branches of information statistics, ones in which the information formulas and summaries have been dominated by probabilistic representations.

Unfortunately, even though information representations were worked out in detail in the 1950s, they did not break out of the high-level literature - unsurprising given that basic stat analyses had already been built on formal probabilistic representations since the 18th century, whereas (as far as I’ve read) formal information representations did not even appear until the 20th.

leonardof · November 28, 2019, 6:26pm

First of all, sorry for resurrecting the thread. I mistook it for being from this year. Thanks for indulging me with a reply.

I agree using actual prior data is much preferable to making up prior distributions.

I was considering what one could to with one’s own data. If one makes the data available for future meta-analyses and other uses, maybe it is not so important anymore for one to provide frequentist estimates besides Bayesian (as in with priors) estimates.

Sander · November 28, 2019, 6:55pm

I’m glad you have resurrected the thread and clarified what you meant, as I think the topic remains timely as ever. I would just warn again against drawing sharp boundaries across statistical approaches as if they were “philosophies” (inferential religions). They are instead toolkits with their proper uses and improper misuses; hence the most skilled analyst understands them from a variety of positive and critical viewpoints.

romain · November 29, 2019, 12:29am

Hi @Will Thank you for your response on this forum, it’s very much appreciated.

Many thanks for the tools that you and @Alan_Batterham have provided.

I think there is confusion over essentially three ‘‘states’’ of MBI/MBD and perhaps you can help qualify.

What is MBI/MBD currently (as implemented in the spreadsheets) vs what is described as MBI/MBD online vs What will MBI/MBD be in the future.

From reading above, and the online commentary from people like Lakens

It looks as though MBI isn’t on formal footing (as described or implemented) and that work is being done to get it to a formalisation.

It seems a bit weird, if MBI (as implemented in the spreadsheets) and as described on your site is fine, that there would then need to be moved towards a formalised version with the help of Sander, Janet, Daniel and Kristin.

Do you agree with the critique of MBI by Kristin, Daniel online?

Will · November 30, 2019, 8:34pm

As I pointed out above, Romain, MBI has turned out to be equivalent to two one-sided interval hypothesis tests, with not unreasonable alphas, so MBI/MBD has a solid frequentist footing. The hypotheses for clinical MBI are that the effect has harm and benefit, or substantial positive and negative values for non-clinical MBI. Rejection of both of these hypotheses, which is what you want to do in the worst-case scenario of the true effect being trivial, sets the minimum desirable sample size. If only one of the hypotheses is rejected, the evidence for the other hypothesis, which in MBI is expressed in a Bayesian fashion, is also equivalent to testing an hypothesis (that the effect does not have that magnitude). MBI became MBD when Sander objected to the use of the word inference. I provide an argument for the Bayesian interpretation at the end of this message.

MBx is the version of MBI/MBD that Janet and Daniel are working on. They have used frequentist terms to describe the outcomes of the hypothesis tests, which I am happy to incorporate in frequentist versions of my spreadsheets. A point of contention at the moment is justification of sample size. They have tied sample-size estimation to an error rate or power defined by minimum-effects testing (MET), which involves an hypothesis about a non-substantial effect with a threshold for substantial that is somewhat larger than (e.g., 2x) the smallest important magnitude. I disagree with this approach, for several reasons. First, the somewhat larger magnitude is arbitrary, so it seems to me that it’s just shifting the goal-posts. Secondly, when a non-clinical true effect is the smallest important and the alpha for the test of the non-substantial hypothesis is 0.05 (substantial now referring to the smallest important, not 2x the smallest important), you will reject the hypothesis and thereby “discover” the effect only 5% of the time, whatever the sample size; that is, your power will be only 5%. (This low power does not apply to clinical MBI/MBD, since failure to reject the hypothesis of benefit is sufficient evidence to consider the effect as potentially implementable, if you want a reasonably low rate of failure to implement small beneficial effects.) Finally, the error rates in MBI/MBD are reasonable anyway, as Alan and I showed in our Sports Medicine paper. They have been misrepresented previously.

We have not yet had a decisive discussion of the Bayesian interpretation. I don’t think anyone apart from Daniel will object to use of a full informative-prior Bayesian analysis of the effect when one of the substantial hypotheses has not been rejected. That then raises an interesting question: can the posterior probabilities of magnitude be interpreted qualitatively, as I have done in MBI, with terms such as possibly, likely, and so on? The Intergovernmental Panel on Climate Change thinks it’s a good idea to use such terms to describe probabilities, and my scale is actually a bit more conservative than theirs. If we can use such terms, that raises a further crucial issue. Data from a study with a large-enough sample size overwhelm any informative prior, in which case the sampling distribution of the effect is the same as the posterior distribution of the true effect, and therefore the “inverse-probability” mantra should not be uttered to dismiss interpretation of the sampling distribution as the distribution of the true effect. Hence, if a researcher with only a small sample size opts for a realistic weakly informative prior that makes no practical difference to the posterior, the researcher should be entitled to use the terms possibly, likely, and so on. I have shown, using Sander’s semi-Bayes approach, that realistic weakly informative priors make no practical difference with the usual small sample sizes in sport and exercise science, so the original Bayesian interpretations of MBI hold up. That said, I do need to make researchers aware of the importance of checking that the weakly informative prior does indeed make no difference with their data before they use the probabilistic terms to describe the effect, and if the prior does make a difference, then they should use probabilistic terms provided by the posterior. My realistic weakly informative priors are normally distributed with 90% confidence or credibility intervals equal to thresholds for extremely large effects: 0.1 < hazard ratio < 10; -4.0 < Cohen’s d < 4.0; -0.9 < Pearson correlation < 0.9.

romain · December 1, 2019, 12:06am

Hi @Will would I be correct with the following interpretation of your response

1 - Your view is that MBI (as implemented and is available online currently and the formulation that was criticised by Kristin, Daniel etc is fine)
2 - The MBx you refer to with Janet and Daniel your point of contention what will this mean for implementation when MBx spreadsheets get released?
3- Why should researchers go towards MBx (the one with Janet and Daniel) if MBI (yours and Alans) that is implemented has been used for years is fine?

Will · December 2, 2019, 9:52pm

Hi again, Romain. Thanks for your persistence in asking those questions. In answering them, I have made several hopefully useful assertions. Note that I am @WillTheKiwi, but I seldom use twitter, since it seems to me to be a place where people too often voice their prejudices destructively.

I guess I have to say yes, MBI as currently implemented qualifies as another tool in the statistical toolbox. Sander and others would prefer to see frequentist terms describing the uncertainty in the magnitude, which I will provide. If there is a convincing argument for not using the Bayesian terms (e.g., see Point 3 below), then I will remove those terms.
As far as I can tell, the only remaining point of contention is sample size. I have no problem with anyone who wants to provide a reasonable rationale for a larger minimum desirable sample size. I’m not convinced that Janet and Daniel’s rationale is reasonable, because it involves detecting an effect that is arbitrarily larger than the smallest important, and I can see no justification for the arbitrariness. If others think it’s OK, then fine, put it in the toolbox, with one caution. Studies will continue to be performed with less than the desirable sample size, and some of those studies should be published. What is the criterion for publication in relation to precision of estimation? If Janet and Daniel’s version of MBx has an implicit stricter criterion than that of MBI/MBD, then there is the potential for publication bias: the smaller the relevant p value(s), the greater the bias arising from sampling variation. Adequate precision of estimation for publication of studies with sub-minimal sample size in MBI/MBD would not be associated with substantial publication bias. (BTW, I misrepresented the rationale for sample-size estimation in MBI/MBD in my previous message. The minimum desirable sample size is that which avoids failure to reject both hypotheses, since that means the effect “could be” beneficial and harmful, or substantially positive and negative, which in MBI/MBD is expressed as “unclear”.)
There are two reasons for researchers to opt for MBI/MBD rather than MBx, as currently proposed.

First, if the Bayesian terms hold up for MBI/MBD, it’s easier for people to understand the uncertainty. Sander has argued (and I hope I am not misrepresenting him–he’ll soon put me in my place otherwise) that “the data and the model are ambiguously compatible with benefit” or “the data and the model are weakly compatible with benefit”, and so on, are perhaps more cautious and more in keeping with Popperian philosophy than the Bayesian versions “the data and the model are consistent with possible benefit” or “the data and the model are consistent with likely benefit”, and so on. Researchers will probably shorten the frequentist statements to “the evidence for benefit is ambiguous”, “the evidence for benefit is weak” (unfortunately these sound similar), and so on, while the Bayesian terms will become simply “the effect is possibly beneficial”, 'the effect is likely beneficial", and so on. Obviously it is important for researchers to be up-front with the limitations of their data and model, whichever way they express the outcome.

Secondly, MBI/MBD has a smaller minimum desirable sample size than MBx. At this point I get accused of promoting small-sample studies, but I hasten to point out that studies with the kind of sample size estimated for MBI/MBD would be regarded as acceptable, if not large, in exercise and sport science, and probably in psychology and other empirical disciplines. For a simple comparison of means using standardization for the smallest effect (Cohen’s d = 0.20), the sample size is 135 in each of the two groups. For a correlation, where 0.10 is the smallest, the sample size is 270. For a crossover of competitive athletic performance it’s 62, and for a controlled trial it’s 118 in each of the two groups (unheard of!). For a prospective study of injury with a smallest important hazard ratio of 1.11 (or 0.90 for reduction in risk), it’s 2020 (also rare!), assuming 50% incidence in the unexposed group and 50% exposure, and much more than 2020 for lower incidence and exposure. Of course, researchers will do studies with smaller sample sizes than these, so now you have to decide at what point the study is publishable. I have yet to see another inferential method that works better than MBI/MBD for decision-making and publishing of small-sample studies, but that doesn’t mean I promote such studies. Give me a break, please, guys.