What are credible priors and what are skeptical priors?

Hi @Sander, I came across this post while reading up on magnitude-based decisions (MBD), which has been described as a statistical cult.

I went to the author of the method, and he is using this Q&A as a sort of justification for MBD.

After reading both, maybe I am misunderstanding, but I don’t see how the comments here justify MBD.


I will have to look at Magnitude Based Inference (aka Magnitude Based Decisions) a bit more carefully, but Jeffrey Blume has developed an approach that is superficially similar called Second Generation P values. His approach is derived from likelihood theory, and has attractive properties I don’t think the MBI/MBD people are cognizant of yet.

I have not yet proven to myself Blume’s Second Generation P values and MBI are equivalent. It may be the case that these scholars stumbled upon Blume’s good idea, but do not have enough background statistical knowledge to demonstrate correctness.

I’m adding for reference an openly published paper that examines the method in light of known statistical principles by Welsh and Knight (referenced by Sander below).

Magnitude-based Inference: A Statistical Review (link)

The Vindication of Magnitude-Based Inference (link)

The statistical reviewer was Kristin L. Sainani, PhD from Stanford, who is critical of MBI/MBD:

Her critique is here. It looks like a thorough analysis. I managed to find a freely available copy below:

The Problem with “Magnitude-based Inference” (link)

I am somewhat concerned that there are no references to the statistical literature in my scan of the PDF you linked to. For example:

I forgot to add Dr. Blume’s American Statistician paper on Second Generation P values, where he goes into a bit more theory.

An Introduction to Second-Generation p-Values (link)


Yeah, one of the weird things about MBI is that there isn’t really a peer-reviewed mathematical formulation anywhere other than the MBI founders’ website.

Regarding MBx and SGPV, I see them both as rediscovering (in my view, in less clear form than the original sources) the ideas of testing interval hypotheses, as set forth in Hodges & Lehmann 1954:
and extended in many sources since, including books. Lakens et al. provide a basic review of standard interval-testing methods here:
and Lakens and Delacre can be viewed as showing that a basic “equivalence” test procedure (TOST, which actually tests nonequivalence) is preferable to the SGPV: https://osf.io/7k6ay/download
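For readers who have not seen TOST written out, here is a minimal sketch in Python. It assumes a normally distributed estimate with a known standard error, which is a simplification; the function name `tost_normal` and the example numbers are hypothetical and are not taken from Lakens et al. or from the MBD spreadsheets.

```python
from statistics import NormalDist


def tost_normal(estimate, se, lower, upper):
    """Two one-sided tests (TOST) under a normal approximation.

    Returns (p_lower, p_upper, p_tost). The effect is declared to lie
    inside (lower, upper) at level alpha only if p_tost <= alpha.
    """
    std_normal = NormalDist()
    # H0: true effect <= lower. A small P-value suggests the effect is above `lower`.
    p_lower = 1.0 - std_normal.cdf((estimate - lower) / se)
    # H0: true effect >= upper. A small P-value suggests the effect is below `upper`.
    p_upper = std_normal.cdf((estimate - upper) / se)
    # Both one-sided nulls must be rejected, so the larger P-value governs.
    return p_lower, p_upper, max(p_lower, p_upper)


# Hypothetical example: estimate 0.0, standard error 0.1, interval (-0.2, 0.2).
p_lo, p_hi, p_tost = tost_normal(0.0, 0.1, -0.2, 0.2)
print(round(p_tost, 4))  # about 0.0228
```

Note that the procedure rejects nonequivalence only when both one-sided P-values fall at or below the chosen alpha, which is why the alpha setting carries so much of the dispute.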

As for MB “inference”, one way it can be salvaged is by recasting it as MB decisions, taking the intervals it uses and applying standard interval-testing methods (like TOST) to them. The chief disputes then seem to come down to

  1. the alpha levels one should use,
  2. the terminology for the multiple possible results, and
  3. how the results should be linked to Bayesian interpretations.

These are problems shared by any NP test, including the infamous point-null hypothesis tests. There is currently some discussion toward producing a consensus manuscript with opposing parties that explains and settles all this, but I cannot say when or even whether it will emerge.

I suspect the problem MBI attempted to address could have been solved more clearly and uncontroversially by graphing the upper and lower P-value functions with the interval boundaries marked, or tabulating the upper and lower P-values at the boundaries, and then explaining the decision rules based on the resulting graph or table. One could also graph the maximum-likelihood ratios (likelihoods divided by the likelihood maximum) or include them in the table of boundary results.
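As a rough illustration of the tabulation idea, the sketch below computes the two one-sided P-values and the maximum-likelihood ratio at each interval boundary. It assumes a normal estimate with known standard error; the function `boundary_table`, the column names `p_le` and `p_ge`, and the example numbers are all hypothetical choices made for this sketch.

```python
from math import exp
from statistics import NormalDist


def boundary_table(estimate, se, boundaries):
    """Tabulate one-sided P-values and likelihood ratios at each boundary.

    For a boundary value b (normal approximation):
      p_le      = P(estimate <= observed | true effect = b)
      p_ge      = P(estimate >= observed | true effect = b)
      lr_vs_max = likelihood at b divided by the maximum likelihood
    """
    std_normal = NormalDist()
    rows = []
    for b in boundaries:
        z = (estimate - b) / se
        rows.append({
            "boundary": b,
            "p_le": std_normal.cdf(z),
            "p_ge": 1.0 - std_normal.cdf(z),
            # Normal likelihood ratio relative to its maximum at the estimate.
            "lr_vs_max": exp(-0.5 * z * z),
        })
    return rows


# Hypothetical example: estimate 0.3, SE 0.15, interval boundaries at -0.2 and 0.2.
for row in boundary_table(0.3, 0.15, [-0.2, 0.2]):
    print(row)
```

Decision rules can then be stated directly in terms of the table, for example by comparing each boundary P-value with a pre-specified alpha.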


Hi @Sander Thank you for your answer.

When you say “salvage”, are you saying that MBD, as currently implemented in the Excel spreadsheets available here,


is not doing what it says it is doing? More specifically, would you recommend using MBD as it’s currently framed and implemented, or would you encourage people to use something like Hodges & Lehmann instead?

As a broader question (maybe @f2harrell can help too): should people be using MBD in its current form and its current implementation?

I should emphasize again the problem is not the math of MBD - as far as I can see, that’s just doing a set of hypothesis tests on intervals as per H&L 1954 and many sources since. The debate instead concerns where to set its default Type-I error ceilings (alpha levels) and how to label its outputs.

That’s important but really no different than the endless debate around the same issues for basic statistical hypothesis testing. For example, should we label failure to reject “accept”? No, because people will inevitably misinterpret that label as “accept as true”, not as shorthand for “decide for now to continue using the hypothesis” or something similar. And should 0.05 or 0.005 or anything else be mandated as a universal default for alpha (as pseudoscientific journals like JAMA do)? No, because any default will be a disaster in some situations, namely those in which power and cost (risk) considerations would tell us to use a very different alpha. Thus forced alpha defaults are like forcing all speed limits to a default of 50 mph whether in a school zone or on a straight 4-lane desert interstate highway. Efforts to do better are what need to be mandated.

Really, the only issue with MBD is how to describe it accurately but succinctly, and how to deal carefully with its settings. Arguably it may be best to simply translate it into existing interval-testing theory and describe it that way. Any such rethinking ought to be reflected in spreadsheet revisions.

As with all methods, however, even the most careful revisions will not stop abuse of MBD. But then, if abuse were a sufficient reason to ban a method then all familiar frequentist and Bayesian methods should be banned, and the life expectancy for any replacement will be very, very short.


Thank you for your reply @Sander

I was reading this


My takeaway from it is that the actual math behind MBD/MBI is the problem.

For example, the article includes this quote:

“If I was ever to peer review a paper using magnitude-based inference then I would reject it and tell the authors to completely redo their analysis,” says Professor Adrian Barnett, president of the Statistical Society of Australia.

We also have these slides from Welsh

I think this is a nice example of why it’s so hard for people outside of statistics to even pick and choose which methods to use, and why I am so thankful to @f2harrell for providing this forum so people like myself without a statistics background can ask questions of experts.


For example, @f2harrell has called it complete BS.

Again, thank you for your comments @Sander; they are really helping me understand the issues at hand.


When I look closely at the criticisms of MBI, the main substance of what I see is complaint about unusually high alpha-level settings. Now high settings are not unheard of in regulatory contexts, e.g. see the ICH document link below, which explains the choice of 0.25; that however appears to be 2-sided and implicitly predicated on the idea that false negatives may be more costly than false positives in their context.

Regarding the Welsh-Knight critique: It seems the current dispute arose because the MBI originators promoted their entire package without attention to these details, and apparently without knowledge of the existing statistical literature on the problem they were addressing. They then supplied no power-cost rationale for their alpha settings in the contexts at issue (or at least supplied none acceptable to their critics). They also offered questionable Bayesian interpretations of their test results and doubled down on some contestable claims, eliciting hard counterattacks.

My view is that conversion from MBI to MBD (a change in description) and its absorption into standard testing theory should follow the advice (which can be found in Neyman’s own writings) to make alpha user-supplied. That calls for careful instructions to users to provide a rationale for what is chosen in terms of power (or expected false-negative rates) and the costs or risks of each decision, specifying that this is best done and documented at the design stage of the study. And then editors and reviewers should require the rationale and documentation from authors. Again, this is a general problem that testers should address; e.g., see https://www.researchgate.net/publication/319880949_Justify_your_alpha
There should also be explication of how any Bayesian interpretation requires a further rationale for the prior implicit in the inversion (which many would argue should not be part of the standard output).
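To make the idea of a power-cost rationale for alpha concrete, here is a toy sketch. It assumes a one-sided z-test, a single alternative effect expressed in standard-error units, and a simple weighted sum of false-positive and false-negative costs. The weight `prob_null`, the cost values, and the function names are hypothetical inputs for illustration only; they would need their own justification in any real study, and this is not the procedure used in the MBD spreadsheets.

```python
from statistics import NormalDist


def expected_loss(alpha, effect_over_se, cost_fp, cost_fn, prob_null=0.5):
    """Weighted cost of errors for a one-sided z-test at level alpha."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)
    # Type-II error rate (beta) at the assumed alternative effect.
    beta = std_normal.cdf(critical - effect_over_se)
    return prob_null * cost_fp * alpha + (1.0 - prob_null) * cost_fn * beta


def best_alpha(effect_over_se, cost_fp, cost_fn, prob_null=0.5):
    """Grid search over alpha in (0, 0.5) for the smallest expected loss."""
    grid = [i / 1000 for i in range(1, 500)]
    return min(
        grid,
        key=lambda a: expected_loss(a, effect_over_se, cost_fp, cost_fn, prob_null),
    )


# Hypothetical comparison: a low-powered design versus a well-powered one,
# with equal costs for the two kinds of error.
print(best_alpha(1.0, cost_fp=1.0, cost_fn=1.0))
print(best_alpha(3.0, cost_fp=1.0, cost_fn=1.0))
```

Under these assumptions the low-powered design ends up with a much larger alpha than the well-powered one, which is the same logic the ICH passage quoted below gives for its 0.25 level.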

ICH Q1E Guideline links to the International Conference on Harmonization (ICH) document Q1E; see paragraph B.2.2.1 on p. 10:

“Before pooling the data from several batches to estimate a retest period or shelf life, a preliminary statistical test should be performed to determine whether the regression lines from different batches have a common slope and a common time-zero intercept. Analysis of covariance (ANCOVA) can be employed, where time is considered the covariate, to test the differences in slopes and intercepts of the regression lines among batches. Each of these tests should be conducted using a significance level of 0.25 to compensate for the expected low power of the design due to the relatively limited sample size in a typical formal stability study.”


Once again, thank you @Sander for your explanation. Is it your view that the conversion from MBI to MBD was done in such a way that users should feel comfortable with the tools provided at the moment?

Once again thank you for your time answering - I know a few people within sports science have already messaged me to read these posts.

I think these changes are improvements; but then I would, since I gave my views and the author considered and used them as he saw fit.

Here’s an experiment for which the outcome is likely to be very informative no matter what it is: Ask Welsh, Knight, and Sainani to read this blogpost series, especially my comments below; then look at the updates; and then post their view about whether the MBD idea is moving in the right direction away from MBI toward standard decision theory, and what more would help it do so.

As I see it, the critics assumed “conservative” preferences (preferring the null or anchored around it), which I call nullism, and traditional defaults, as shown by their orientation toward CIs (which fix traditional alpha levels). In contrast, the originators assumed “liberal” (wide open to moving far off the null) preferences, which I call agnosticism, and are not wedded to traditional alpha defaults either; but their justification for breaking from tradition was not apparent to the critics.

The justification might be apparent in a fully developed MBD theory based on a decision function, in which alpha is a function of explicit utilities and data distributions under competing models (instead of an input parameter). In this formulation, Bayes rules are admissible, and MBD yields a limit on Bayes rules that is calibrated against the sampling-model frequencies.

This places the theoretical substance of the dispute in the loss functions implicit in the different preferences for alpha levels. These preferences are the irreducible subjective inputs to decision rules, and need to be made explicit because decisions are driven by them. They will vary across stakeholders (those who experience measurable losses or gains from decisions); so, unless a compromise is reached, stakeholders will often disagree on the preferred (“optimal”) decision. This point is recognized in both frequentist and Bayesian decision theories, which parallel one another far more than they diverge.

Whatever one thinks of MBI, it was proposed to solve a real-world statistical problem whose solutions had not been taught enough for its developers to know about. I find it interesting the developers got as close as they did to the standard solutions in form if not in description. I think their effort demonstrates a need for interval testing, and so the theory for that needs to be settled enough to introduce at an elementary level. Unfortunately, as with all things statistical, there is no consensus on all aspects of interval testing - in part because of differences in hidden assumptions about loss or utility functions.


Once again, @Sander, thank you for your answer.

Would I be interpreting you correctly if I took away the message that MBD as it’s currently formed (and hence what has been in use) is not fully developed?

That might be why there are a lot of “problems with MBI/MBD”.

Hence it would be best for practicing sports scientists to stay away until a fully developed MBD theory is released in spreadsheet form.
Maybe I am reading too much into things, but from an applied standpoint it seems that using a not fully developed theory in practice is doomed to produce misleading findings.

Does @f2harrell have any thoughts? I see quite a bit of similarity between sports science and medicine (small samples, etc.).

Once again thank you, this is great!


@Sander has thought deeply about this and has written clearly, so the only remaining thoughts I have are to wonder why the MBI inventors did not just use existing paradigms with possibly transparently relaxed criteria, and I always push displays of entire posterior distributions.


As has been pointed out by Sander in this thread, the most charitable interpretation of MBI/MBD is that it is interval hypothesis testing with a non-traditional $\alpha$ level. I greatly appreciate his insight into this (and saving me lots of study time!).

There is nothing wrong with breaking the irrational tradition of a fixed level $\alpha$ (it is almost certainly a welcome development), but those who do should be able to provide the appropriate calculations to justify the decision.

I can empathize with the scholars you mentioned in this thread. It takes an extraordinary amount of study time for a statistical outsider to study all of the relevant materials someone educated in statistics formally takes for granted. That has been made much easier by the internet, but if you don’t know the right terminology and prior work done, you will have embarrassing gaps in your knowledge when you talk to actual statisticians. You can’t even benefit from their legitimately earned expertise without some real study, and an intro stats class is not enough.

I can also empathize with the critics. A lot of excellent scholarship is buried in the old journals that should have been referenced in their papers. With a bit more research, they would not have made some indefensible claims that undermined their credibility.

Example: I was aware of the Hodges-Lehmann interval hypothesis testing paper before it was mentioned in this thread. I had known of (but not studied) the review of SGPV that Sander also referenced here. I am more optimistic about SGPV than others in this thread might be, but I haven’t worked out a full argument yet.

Outsiders who propose a new technique should understand the basic methods of justification of statistical procedures in order to present a rational argument regarding the benefits of the proposal.

Statisticians of whatever philosophy are very smart, and if they tell you something is not correct, you had better go back and check your work and/or your comprehension.


Thank you @r_cubed. I guess my concern is that @Sander seems to be saying (I could be wrong) that MBI as it was wasn’t correct, and that MBD (not as currently formed and hence used) might in future be on more solid statistical ground, but currently that is not the case.

The issue is that hundreds of studies were done, and are continuing to be done, using MBI (as formulated and readily downloaded from the creator’s website), while what it needs to move towards (MBD) isn’t fully formulated in a way that makes experts comfortable.

Without submission to any peer-reviewed statistics journal, it’s very hard to tell from the above commentary what the takeaway should be for sports scientists.

Should we be using what is available online as is? If we have used what was available online (MBI), converting p-values to MBD, should we stand by those findings?

Once again, I greatly appreciate everyone’s time.

Are the primary data available for re-analysis?

If not, I’d remain skeptical of the conclusions as presented by the primary researchers in the narrative section unless I could see the actual data to do my own calculations. But some useful things might still be learned for future experiments by examining the reported descriptive statistics.

This is what I’m trying to understand here. On one hand, I am reading comments like:

"That review, Professor Welsh says, was damning: MBI did not work when compared with accepted principles of statistics.

“They are claiming to have found effects that very likely are not real,” says Professor Welsh.

“It’s increasing the chances of finding an effect that is not there,” says Dr Knight."

“If I was ever to peer review a paper using magnitude-based inference then I would reject it and tell the authors to completely redo their analysis,” says Professor Adrian Barnett, president of the Statistical Society of Australia.

From those comments, the statistics community’s position on using MBI seems like almost a no-brainer, and any findings should at best be redone with more data or in another statistical framework.

My greatest confusion relates to MBD as it’s currently formed: is it simply a small rebranding of the flawed MBI, or has it been improved to such a state that sports scientists should feel comfortable using the tools provided? Should there be a “salvaging” of MBD, or should sports scientists simply be pointed towards fully developed theories and frameworks until such time as MBD is fully developed?

In response to the remaining queries, I can only emphasize again that

  1. The core issues for MBx are to justify its inputs and accurately describe its outputs based on the application context - just as with any statistical method! The critics explained how MBI failed in these regards (unsurprising as it was not reviewed by statisticians).
  2. MBD is a constructive response to the criticisms which is under review and further development, so is a work in progress. It may resolve soon: I am told a manuscript for this purpose (by parties separated from the MBI originators) is well along. I expect the final form of MBD will locate it clearly within the interval-testing branch of statistical decision theory.

An important point for going forward when looking back: Earlier applications of MBI did not necessarily reach incorrect conclusions. Instead when a report that used MBI is seen as supplying data relevant to a topic, that report needs to be scrutinized to see if its conclusions hold up or fail in light of critical insights about what MBI was actually producing (as opposed to the original MBI descriptions). Note that this advice involves study-specific examinations, and thus is a far cry from simply discarding everything that was published using MBI. Being able to follow it in a particular case hinges on how well the report described its design, execution and data, and if the needed summary statistics were given or computable from the report; if not, the report authors would need to supply those items.


Do you see any issues with MBI being used after this commentary? Apart from just not using standard paradigms.

At its core, did the MBI method as originally done meet the criteria to make the method fit for its common usage in sports science?
Second, does MBD as it is formed now (as in, the spreadsheets that can be downloaded from the website) meet them, or are @Sander’s comments more along the lines of:

MBI/MBD doesn’t stand up to standard statistical rationale, but it can be “salvaged” only after the following changes are applied (which are yet to be done)?

There are hundreds of currently cited studies using MBI, and possibly hundreds more that used the method to reach their conclusions but didn’t cite MBI out of fear of something, for example

It seems that sports scientists (those that use MBI/MBD) are using @Sander’s comments here to suggest that everything is fine with MBI/MBD, which is a little bit weird, because I read his comments as saying MBD could be OK but only after changes are made (which at this stage is yet to happen).

Thank you @sander for your commentary.

The issue for me, as someone untrained in statistics, is that I read comments like yours and then see the online sports science community take them as suggesting that MBI/MBD, as used currently and previously (in the case of MBI), is perfectly fine.

That goes against the stronger statements below:

"That review, Professor Welsh says, was damning: MBI did not work when compared with accepted principles of statistics.

“They are claiming to have found effects that very likely are not real,” says Professor Welsh.

“It’s increasing the chances of finding an effect that is not there,” says Dr Knight."

“If I was ever to peer review a paper using magnitude-based inference then I would reject it and tell the authors to completely redo their analysis,” says Professor Adrian Barnett, president of the Statistical Society of Australia.

Or @f2harrell’s comments.

Do you think that the current implementation, as used today and as communicated here
sportscience.sportsci meets standard statistical practice? Or are you saying that only once Hopkins and Batterham make the changes suggested by yourself and Lakens, and MBD is “salvaged” and in its “final form”, should sports scientists (typically untrained in statistics) use MBD?

Once again thank you,

I am sure this seems like a lot of questions, but honestly it’s such a big topic that the community now seems to be getting one set of answers versus another.

Although I am not a statistician (I am a sport scientist and strength and conditioning coach currently pursuing a PhD), I wanted to share some of my viewpoints regarding the issue of MBI, as well as to provide a ‘big picture’ overview of statistical modelling (descriptive, predictive, and causal inference) in the following pre-print:

Statistical Modelling for Sports Scientists: Practical Introduction Using R (Part 1)

I would be more than happy to receive criticism of the paper, as well as hopefully contribute on the topic of MBI.

In short, I am leaning more ‘against’ using MBI for multiple reasons:

  • As Sainani pointed out, it does not control Type-I error rates (although I do not see any issue with this if it is justified)
  • It describes a confidence interval as a Bayesian credible interval (although in very simple statistical analyses with a uniform prior they are equivalent; correct me if I said something stupid here)
  • The probabilities of the magnitude of the effect (harmful, trivial, beneficial) tend to be pretty much useless to practitioners IMHO - they tend to be confused with the probability that a random subject (or the proportion of subjects) shows a harmful, trivial, or beneficial effect, which is often the question coaches ask, e.g. “How likely is a random athlete with these characteristics to experience a beneficial response?” or “What is the expected proportion of athletes experiencing harmful effects?”
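The last two points can be illustrated with a small sketch. It assumes a normal estimate with known standard error and a flat (uniform) prior on the mean effect, in which case the posterior is normal, centred on the estimate, and the 95% credible interval matches the 95% confidence interval numerically. The function names, the thresholds, and the individual-response standard deviation are hypothetical values chosen only for this sketch.

```python
from statistics import NormalDist


def flat_prior_summary(estimate, se, lower, upper, level=0.95):
    """Posterior summary for the MEAN effect under a flat prior (normal model)."""
    posterior = NormalDist(mu=estimate, sigma=se)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_below = posterior.cdf(lower)         # mean effect below the lower threshold
    p_above = 1.0 - posterior.cdf(upper)   # mean effect above the upper threshold
    return {
        # Numerically the same as the usual confidence interval.
        "interval": (estimate - z * se, estimate + z * se),
        "p_below": p_below,
        "p_trivial": 1.0 - p_below - p_above,
        "p_above": p_above,
    }


def individual_proportion_above(mean_effect, sd_individual, upper):
    """Expected proportion of INDIVIDUALS whose response exceeds `upper`,
    assuming individual responses are normal with the given SD."""
    return 1.0 - NormalDist(mu=mean_effect, sigma=sd_individual).cdf(upper)


# Hypothetical numbers: estimate 0.3, SE 0.15, thresholds at -0.2 and 0.2,
# and an assumed SD of individual responses of 0.5.
print(flat_prior_summary(0.3, 0.15, -0.2, 0.2))
print(individual_proportion_above(0.3, 0.5, 0.2))
```

The two quantities answer different questions: the first is about the mean effect and depends on the standard error, while the second is about individual athletes and depends on the spread of individual responses, which is usually much harder to estimate.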

Hopefully, the paper might contribute to the discussion. Happy to receive any feedback regarding it.