What are credible priors and what are skeptical priors?

Are the primary data available for re-analysis?

If not, I’d remain skeptical of the conclusions as presented by the primary researchers in the narrative section unless I could see the actual data to do my own calculations. But some useful things might still be learned for future experiments by examining the reported descriptive statistics.

This is what I'm trying to understand. On the one hand, I am reading comments like:

"That review, Professor Welsh says, was damning: MBI did not work when compared with accepted principles of statistics.

“They are claiming to have found effects that very likely are not real,” says Professor Welsh.

“It’s increasing the chances of finding an effect that is not there,” says Dr Knight."

“If I was ever to peer review a paper using magnitude-based inference then I would reject it and tell the authors to completely redo their analysis,” says Professor Adrian Barnett, president of the Statistical Society of Australia.

From those comments it seems like almost a no-brainer from the statistics community that MBI should not be used, and that any findings should at best be redone with more data or within another statistical framework.

My greatest confusion relates to MBD as it is currently formed: is it simply a small rebranding of the flawed MBI, or has it been improved to such a state that sports scientists should feel comfortable using the tools provided? Should there be a 'salvaging' of MBD, or should sports scientists simply be pointed towards fully developed theories and frameworks until such time as MBD is fully developed?

In response to the remaining queries, I can only emphasize again that

  1. The core issues for MBx are to justify its inputs and accurately describe its outputs based on the application context - just as with any statistical method! The critics explained how MBI failed in these regards (unsurprising as it was not reviewed by statisticians).
  2. MBD is a constructive response to the criticisms which is under review and further development, so is a work in progress. It may resolve soon: I am told a manuscript for this purpose (by parties separated from the MBI originators) is well along. I expect the final form of MBD will locate it clearly within the interval-testing branch of statistical decision theory.

An important point for going forward when looking back: Earlier applications of MBI did not necessarily reach incorrect conclusions. Instead when a report that used MBI is seen as supplying data relevant to a topic, that report needs to be scrutinized to see if its conclusions hold up or fail in light of critical insights about what MBI was actually producing (as opposed to the original MBI descriptions). Note that this advice involves study-specific examinations, and thus is a far cry from simply discarding everything that was published using MBI. Being able to follow it in a particular case hinges on how well the report described its design, execution and data, and if the needed summary statistics were given or computable from the report; if not, the report authors would need to supply those items.


Do you see any issues with MBI being used after this commentary, apart from it just not using standard paradigms?

At its core, did the MBI method as originally formulated meet the criteria to make it fit for its common usage in sports science?
Second to this, does MBD as it is formed now (as in, the spreadsheets that can be downloaded from the website) meet those criteria, or are @Sander's comments more along the lines of:

MBI/MBD doesn't stand up to standard statistical rationale, but it can be 'salvaged' only after the following changes are applied (which are yet to be done).

There are hundreds of currently cited studies using MBI, and possibly hundreds more that did not cite MBI out of fear of something (but used the method to reach their conclusions), for example.

It seems that sports scientists (those that use MBI/MBD) are using @Sander's comments here to suggest that everything is fine with MBI/MBD, which is a little bit weird, because I read his comments as saying that MBD could be OK, but only after changes are made (which at this stage is yet to happen).

Thank you @sander for your commentary.

The issue for me, being an untrained statistician, is that I read comments like yours and then see the online sports science community take them on board as suggesting that MBI/MBD, as used currently and beforehand (with regards to MBI), is perfectly fine.

Which is to say, it goes against the stronger statements below:

"That review, Professor Welsh says, was damning: MBI did not work when compared with accepted principles of statistics.

“They are claiming to have found effects that very likely are not real,” says Professor Welsh.

“It’s increasing the chances of finding an effect that is not there,” says Dr Knight."

“If I was ever to peer review a paper using magnitude-based inference then I would reject it and tell the authors to completely redo their analysis,” says Professor Adrian Barnett, president of the Statistical Society of Australia.

Or @f2harrell's comments.

Do you think that the current implementation, as used and communicated today at sportscience.sportsci.org, meets standard statistical practice? Or are you saying that once Hopkins and Batterham make the changes suggested by yourself and Lakens, and once MBD is 'salvaged' and in its 'final form', then sports scientists (typically untrained in statistics) should use MBD?

Once again thank you,

I am sure this seems like a lot of questions, but honestly it's such a big topic, and the community now seems to be getting one set of answers vs another.

Although I am not a statistician (I am a sports scientist and strength and conditioning coach currently pursuing a PhD), I wanted to share some of my viewpoints regarding the issue of MBI, as well as to provide a 'big picture' overview of statistical modelling (descriptive, predictive, and causal inference) in the following pre-print:

Statistical Modelling for Sports Scientists: Practical Introduction Using R (Part 1)

I would be more than happy to receive criticism of the paper, as well as hopefully contribute on the topic of MBI.

In short, I am leaning more ‘against’ using MBI for multiple reasons:

  • As Sainani pointed out, not controlling for Type-I error rates (although I do not see any issue with this if it is justified)
  • Describing the confidence interval as a Bayesian credible interval (although in a very simple statistical analysis with a uniform prior they are numerically equivalent; correct me if I said something stupid here)
  • The probabilities of the magnitude of the effect (harmful, trivial, beneficial) tend to be pretty much useless to practitioners IMHO - they tend to be confused with the probability of a random subject (or the proportion of subjects) demonstrating a harmful, trivial, or beneficial effect, which is often the question coaches ask, e.g. "How likely is a random athlete with these characteristics to experience a beneficial response?" or "What is the expected proportion of athletes experiencing harmful effects?" (see the sketch after this list)
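To make the last point concrete, here is a minimal sketch of the kind of magnitude probabilities MBI reports, assuming a normal sampling distribution for the effect and, implicitly, a flat prior; all numbers are made up. Note that these probabilities refer to the average (true) effect, not to the proportion of individual athletes who would respond beneficially or harmfully.

```python
# Hypothetical sketch (not MBI's actual spreadsheet code) of the magnitude
# probabilities MBI reports, assuming a normal sampling distribution for the
# effect and an implicitly flat prior. All numbers are made up.
from scipy.stats import norm

estimate = 1.2   # observed mean effect (e.g., % change in performance)
se = 0.8         # standard error of that estimate
swc = 0.5        # smallest worthwhile change (magnitude threshold)

p_beneficial = norm.sf(swc, loc=estimate, scale=se)   # P(true mean effect > +SWC)
p_harmful = norm.cdf(-swc, loc=estimate, scale=se)    # P(true mean effect < -SWC)
p_trivial = 1 - p_beneficial - p_harmful              # P(true mean effect within +/-SWC)

print(f"beneficial: {p_beneficial:.2f}, trivial: {p_trivial:.2f}, harmful: {p_harmful:.2f}")
```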

Hopefully, the paper might contribute to the discussion. Happy to receive any feedback regarding it.

I should hope (perhaps overoptimistically) that my posts above show my views and answers clearly enough, but again: To some extent the MBx debate reflects the never-ending problems with stat testing, especially overinterpretation without regard to the full uncertainty in the situation or the costs and benefits of decisions under different scenarios.

To repeat what I said far above, I suspect the problem MBI attempted to address could have been solved more clearly and uncontroversially by graphing the upper and lower P-value functions with the interval boundaries marked, or by tabulating the upper and lower P-values at those boundaries (including the null), and then explaining the rationale for any decisions based on the resulting graph or table. In parallel one could graph or tabulate likelihood functions at the interval boundaries so one could obtain posterior odds based on prior odds, for use in decisions.
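A minimal sketch of such a tabulation, under a normal approximation and with hypothetical numbers (a paraphrase in code, not the original proposal's own implementation): compute the lower and upper one-sided P-values at the null and at each smallest-effect boundary; graphing them over a grid of boundary values gives the two P-value functions.

```python
# Minimal sketch of tabulating lower and upper one-sided P-values at the null
# and at the smallest-effect boundaries, under a normal approximation.
# The estimate, standard error, and boundary values are hypothetical.
from scipy.stats import norm

estimate, se = 1.2, 0.8                               # hypothetical point estimate and SE
boundaries = {"null": 0.0, "-SWC": -0.5, "+SWC": 0.5}

for label, delta in boundaries.items():
    z = (estimate - delta) / se
    p_lower = norm.cdf(z)   # P(an estimate <= the observed one | true effect = delta)
    p_upper = norm.sf(z)    # P(an estimate >= the observed one | true effect = delta)
    print(f"{label:>5}: p_lower = {p_lower:.3f}, p_upper = {p_upper:.3f}")
```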

I think the whole of research statistics on pre-specified treatments or exposures should be upgraded by moving to continuous descriptive presentations, emphasizing and illustrating uncertainties and dangers of each decision before making any decision. That is how I have seen competent expert panels operate. But that requires a fundamental change in how basic statistics is taught and used, a change that has been promoted for decades yet is still not adopted as the standard.

For more of my views about statistics reform see my presentation at the NISS webinar, https://www.youtube.com/watch?time_continue=1962&v=_p0MRqSlYec&feature=emb_logo
with others at
https://www.niss.org/news/digging-deeper-radical-reasoned-p-value-alternatives-offered-experts-niss-webinar

Beyond those generalities, I am not a sports scientist nor do I have any stake either way in MBx past or future. So at this point I think we’re overdue to hear from others on this matter including Frank and also principals in the public MBI dispute (Sainani, Batterham, etc.) to whom I sent the present blog link.


I'll not add anything because you have covered this in more depth than I could. I appreciate all that you've written here.


Many thanks to Sander and all for the constructive comments and discussion on MBI. I have provided substantial feedback to those preparing the manuscript that Sander mentioned on situating MBI within an interval hypothesis testing framework, with new ‘decision labels’. Regarding previous studies featuring MBI, Sander made the following statement, which I welcome:

“An important point for going forward when looking back: Earlier applications of MBI did not necessarily reach incorrect conclusions. Instead when a report that used MBI is seen as supplying data relevant to a topic, that report needs to be scrutinized to see if its conclusions hold up or fail in light of critical insights about what MBI was actually producing (as opposed to the original MBI descriptions). Note that this advice involves study-specific examinations, and thus is a far cry from simply discarding everything that was published using MBI. Being able to follow it in a particular case hinges on how well the report described its design, execution and data, and if the needed summary statistics were given or computable from the report; if not, the report authors would need to supply those items.”

One of the examples used in the most recent critique of MBI was an intervention study with n=8 and no control arm. If the authors made definitive effectiveness/efficacy conclusions from such a study, then irrespective of the statistical approach implemented this is obviously unwarranted. Regarding Sander's advice for study-specific scrutiny, there's no need to go any further than the design when examining such a study - definitive conclusions of effectiveness are unjustified. The design is simply deficient for evaluating effectiveness/making definitive decisions, and the sample size is well short of even a reasonable pilot trial, let alone a 'definitive' study.

In contrast, in previous studies using MBI that are well designed, executed, and reported, the conclusions may be examined for any hype based on inappropriate over-interpretation of 'possibly', 'likely', etc. All previous MBI studies should have reported the compatibility (confidence) interval for the effect (or it should be computable from the results presented). One simple approach for study-specific examination is inspection of the point estimate and the lower and upper limits of this interval in relation to the smallest effect size of interest. This examination permits both simple description and interval testing (at the level implied by the X% compatibility interval reported) to see 'if the original conclusions hold up or fail'. If P values are reported, then S (surprisal) values could also be easily derived. And, of course, 'when a report that used MBI is seen as supplying data relevant to a topic', if the relevant summary statistics can be extracted then these papers may also contribute to a meta-analysis.

I mention the latter because some commentators have gone as far as to say that all papers featuring MBI should be discarded/regarded as ineligible for meta-analyses. But, as Sander says, previous studies using MBI should be examined case by case for exaggerated conclusions, and their design, conduct, and reporting examined to assess quality/risk of bias, as with any study, irrespective of the statistics employed. Indeed, a stronger focus on design, methods, and measurement rigour, as well as on statistics, would be welcome moving forward. Thanks again to everyone for their input - it is very much appreciated.
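As a concrete illustration of these study-specific checks, here is a minimal sketch using only a reported compatibility interval and P value; all numbers, and the smallest effect size of interest, are hypothetical.

```python
# Hypothetical sketch of simple study-specific checks using only a reported
# compatibility interval and P value. All numbers are made up.
import math

ci_lower, ci_upper = -0.2, 1.9   # reported 90% compatibility interval for the effect
swc = 0.5                        # smallest effect size of interest
p_reported = 0.12                # reported P value against the null

# Surprisal (S) value: bits of information against the tested hypothesis.
s_value = -math.log2(p_reported)

# Interval tests at the level implied by the reported interval:
clearly_substantial = ci_lower >= swc or ci_upper <= -swc    # minimal-effects test
clearly_trivial = ci_lower >= -swc and ci_upper <= swc       # equivalence test

print(f"S-value ~ {s_value:.1f} bits")
print(f"interval excludes trivial effects: {clearly_substantial}")
print(f"interval lies entirely within the trivial range: {clearly_trivial}")
```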


Thank you for your comments @Alan_Batterham. Would you agree with @Sander's earlier comments about salvaging MBI? Do you agree with the general consensus that everyone acknowledges MBI has numerous issues, and that MBD (while yet to be fully formed), after taking on feedback, will move to sit within an interval hypothesis testing framework?

Regarding previous studies, maybe 'invalid' is too strong, but if the statistical community's consensus is that MBI as previously formed and used had issues, then surely studies that reached conclusions using only MBI have issues too. I'm not saying they are wrong, but shouldn't they at the very least be revisited using standard, accepted statistical methods?


Thanks @romain. Yes, I agree with Sander's comments. As I mentioned, the paper being worked up currently situates MBI/MBD within a frequentist interval hypothesis testing framework, with new 'decision labels'. So, it could be described as a nuanced fusion of equivalence testing and minimal effects testing. Sander has written that one needn't be constrained by 'either-or' analyses and that an in-parallel presentation of frequentist interval testing plus Bayes or semi-Bayes is fine, providing known error control plus posterior probabilities, if desired/required.

In a couple of recent papers, for example, I've used a normal prior - to set reasonable bounds on the effect size and achieve the desired shrinkage - combined with the observed data via information-weighted averaging (semi-Bayes) or, with a binary outcome, using Stata's penlogit program (augmentation method), both of which Sander has published on. As an aside, I realise that the semi-Bayes approach doesn't set a prior for all free model parameters (hence 'semi' or 'partial' Bayes) - just a prior for the overall model effect size (e.g., adjusted treatment effect) - but it is a logically consistent and robust method. Full Bayesian analyses are also appropriate, of course.

Regarding your second point, I reiterate Sander's comment that conclusions in previous studies presenting results using MBI are not necessarily flawed. In most cases, as I described, the necessary summaries should be in the Results (e.g. the compatibility - confidence - interval for the effect) to permit readers to make their own judgement. In other words, for the most part these studies do not need to be re-analysed to make a decision regarding their worth - and design and measurement issues are critical here, too, as they are for any study report.
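For readers who have not seen it, here is a minimal sketch of information-weighted (inverse-variance) averaging of a normal prior with an observed estimate - the kind of semi-Bayes calculation referred to above, not the code from those papers. The prior and data values are made up.

```python
# Minimal sketch of information-weighted (inverse-variance) averaging of a
# normal prior with an observed estimate (approximate semi-Bayes posterior).
# Prior and data values are hypothetical.
from scipy.stats import norm

prior_mean, prior_sd = 0.0, 1.0   # normal prior setting reasonable bounds on the effect size
est, se = 1.2, 0.8                # observed estimate and its standard error

w_prior, w_data = 1 / prior_sd**2, 1 / se**2   # information (precision) weights
post_mean = (w_prior * prior_mean + w_data * est) / (w_prior + w_data)
post_se = (w_prior + w_data) ** -0.5

lo, hi = norm.interval(0.95, loc=post_mean, scale=post_se)  # approximate 95% posterior interval
print(f"shrunken estimate {post_mean:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```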

Thank you for your comment @Alan_Batterham. I don't understand, however: if you are changing what MBI currently is (which you agree needs to be done) with @Sander's help, then why wouldn't studies that used a method you are no longer willing to stand by need to be revisited? If the method used to analyse the data isn't being stood by, then shouldn't the studies be revisited with a framework that everyone is standing behind? I think 'revisited' is the key word here, because they are likely asking interesting questions.

Maybe @f2harrell might know of examples, but it seems a bit off: if a method is being moved away from (as everyone seems to be saying it is not usable), then not revisiting the results seems ridiculous.

I should clarify, in case I seem harsh. I think this is good for science and shows how science should work. Science is an updating process.

Thanks @romain - apologies if I’ve not made myself clear. I agreed fully with Sander’s point, when he said (bold added):

"Instead when a report that used MBI is seen as supplying data relevant to a topic, that report needs to be scrutinized to see if its conclusions hold up or fail … Note that this advice involves study-specific examinations …… Being able to follow it in a particular case hinges on how well the report described its design, execution and data, and if the needed summary statistics were given or computable from the report; if not, the report authors would need to supply those items.”

So, what I was trying to say is that the ‘needed summary statistics’ are typically in the report. If the confidence (compatibility) interval is there, for example, as it should be in most cases, then the reader has everything they need to judge the merits of the conclusions based on the analysis (as described in my first post). In addition, the design and measurement issues must be scrutinised, as for any study. So, I am not at all saying don’t revisit the studies - I’m agreeing with Sander’s suggestion for ‘study-specific examinations’ on a case-by-case basis, as “Earlier applications of MBI did not necessarily reach incorrect conclusions.” In my last post I was just saying that the raw data didn’t need to be re-analysed - the appropriate summaries will be there in most if not all cases. I used an example of an intervention study featuring MBI with n=8 and no control - if that study made definitive conclusions of effectiveness then that’s obviously wrong. But the conclusions from other studies featuring MBI (larger N, better designed and executed, for example) might be fine when one examines the compatibility interval presented. I hope I’ve clarified my point and sorry for confusing you or anyone else. In short, I was simply reinforcing Sander’s comments from my perspective.

Thank you for the clarification @Alan_Batterham, it helped a lot.

As someone who created MBI along with Will, and who I assume is involved in the move away from MBI towards MBD with Sander's help: what would you suggest users of MBI do, assuming that, as everyone is saying, MBI is no longer being stood by?

Secondly, as I assume Sander will be a co-author on MBD, will this formulation be going into a statistics journal? I think it can only be a positive to have the community that was critical of MBI (as they should have been) accept MBD.

Thirdly, being someone who tries to understand the methods I use before using them: will there be a public mathematical formulation of MBD, which there wasn't for MBI (or, if the MBI maths is public, feel free to correct me and link it)?

Once again, I think this is great for sports science. While MBI was flawed, it's great to see leaders in the statistics community like Sander Greenland helping to come up with methods that sports scientists can be confident have a solid statistical framework. I just wish it had been available at the start of my career :slight_smile:

Thanks @romain. No, Sander was very clear in an earlier message: “I am not a sports scientist nor do I have any stake either way in MBx past or future.” The lead authors of the paper that is being worked up are Janet Aisbett (a mathematician) and Daniel Lakens. As I stated in my first post, I have provided feedback. I am not sure which journal is being targeted but, yes, there is a mathematical formulation. As I mentioned, the lead authors are repositioning MBI within a frequentist interval testing framework, and it will be one option in the statistical toolbox, including simple descriptions of the compatibility interval (point estimate and lower and upper limits in relation to the smallest effect size of interest), Bayes and semi-Bayes, S-values, standard equivalence testing/ minimal effects testing, severe (severity) testing etc. etc. Thanks again for your input.

Thank you @Alan_Batterham, may I make a suggestion. It might be wise to take down the MBI spreadsheets that exist on https://sportscience.sportsci.org/ (as it seems everyone agrees MBI needs to be moved away from) and, once the paper (with Aisbett and Lakens) is released, to upload new spreadsheets for an 'MBI that sits within a frequentist interval hypothesis testing framework'.

Once again, I hope I haven’t come across as too harsh. I hope that sports scientists see admitting faults and fixing them as a positive rather than a negative as all science should be viewed.

Thanks @romain, the spreadsheets were developed by and are owned/ maintained by Will Hopkins, so it is clearly not in my gift to take them down.

Here’s my spin on recent developments with MBI/MBD/MBx and my response about the spreadsheets at Sportscience.

In emails earlier this year Sander was very helpful in insisting that MBI needed a formulation in terms of hypothesis tests, and he suggested using Daniel Lakens' approach. In June I contacted Daniel Lakens and Alan Batterham for advice. By July I had presented a seminar about it at the German Sport University in Cologne and written a first draft of a manuscript showing that MBI in its clinical and non-clinical versions is indeed equivalent to two one-sided interval hypothesis tests of substantial magnitudes with reasonable alphas. My manuscript was intended for publication in one of the sport science journals. Alan and I were then contacted out of the blue by Janet Aisbett, a retired maths professor, with the first draft of a manuscript also showing MBI's equivalence to hypothesis tests. Janet's manuscript is intended for a stats journal. Since then, and at Sander's instigation, there have been regular email exchanges among a group of six - Sander, Janet, Daniel, Alan, Kristin Sainani, and me - in summary, to address the three points that Sander mentioned in this thread on Nov 21: the alpha levels one should use, the terminology for the multiple possible results, and how the results should be linked to Bayesian interpretations.

I have not been in a hurry to modify the spreadsheets at Sportscience, because the equivalence of MBI with hypothesis tests with reasonable alphas in my mind means that MBI is not fundamentally flawed, and therefore that no harm was done or is being done with the use of MBI, pending a rigorous frequentist formulation. Also, we have not finished the conversation on the alphas or the link to the Bayesian interpretation. We have pretty much finished the conversation on the terminology: a month ago I updated the spreadsheet for converting p values to MBD, and I wrote an update in the article accompanying the spreadsheet. Strangely, I got no feedback from the group about the spreadsheet or the update, so I will pursue this further before publishing it and moving on to the other spreadsheets. I think my spreadsheets, and spreadsheets in general, are useful for straightforward analyses, and I think they are also valuable teaching tools for research students. Black boxes they are not! Whether they are more or less susceptible to misuse than packages such as SPSS, Matlab, R, or SAS is an issue to be resolved preferably by evidence rather than opinion.
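For readers curious what converting p values into a magnitude-based decision involves, here is a rough sketch of the arithmetic under a normal approximation - not the Sportscience spreadsheet's actual formulas, and with placeholder decision labels rather than the new labels still under discussion. It recovers the standard error from a reported two-sided P value and point estimate, forms a 90% compatibility interval, and compares it with the smallest worthwhile change.

```python
# Rough sketch (not the Sportscience spreadsheet's actual formulas) of converting
# a reported two-sided P value and point estimate into a compatibility interval
# and a simple interval-test decision. All inputs and labels are placeholders.
from scipy.stats import norm

estimate = 1.2      # reported point estimate of the effect
p_two_sided = 0.14  # reported two-sided P value against the null
swc = 0.5           # smallest worthwhile change

se = abs(estimate) / norm.isf(p_two_sided / 2)   # back out the standard error
z90 = norm.isf(0.05)                             # ~1.645 for a 90% interval
ci_lower, ci_upper = estimate - z90 * se, estimate + z90 * se

if ci_lower >= swc:
    decision = "clearly substantial (positive)"
elif ci_upper <= -swc:
    decision = "clearly substantial (negative)"
elif ci_lower >= -swc and ci_upper <= swc:
    decision = "clearly trivial"
else:
    decision = "unclear - more data needed"

print(f"90% CI: {ci_lower:.2f} to {ci_upper:.2f}; decision: {decision}")
```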

By the way, my own manuscript got an administrative rejection by the editor of Sports Medicine in August, but the rejection message went to a defunct email address, which I discovered only two weeks ago. I then approached International Journal of Sports Physiology and Performance about publishing it, since that journal had published the non-clinical MBI method in its first issue in 2006, but the editor was not interested. I am not sure what to do next about the manuscript. One possibility is to ask the members of our group to review it for Sportscience. Their input would give the article and the method trustworthiness. I think there is a place for an accessible plain-language description, or at least my version of it, since I think there is a place for the Bayesian interpretation (please don’t jump on me yet), whereas Janet and Daniel’s version is exclusively frequentist.


Sharing research data might obviate the need to provide frequentist estimates.

“Sharing research data might obviate the need to provide frequentist estimates.”
I must be misunderstanding you, as I see sharing data oppositely: It reduces the need for informative-prior Bayes estimates, since key external information is then available in its actual data format instead of having been reduced (and likely distorted) into some probabilistic summary called a “prior distribution.”

That follows from a general view of information as a more intuitively helpful concept for statistical foundations than is probability, although the two concepts can be mapped into each other. In particular, “Bayesian statistics” and “frequentist statistics” are just two complementary branches of information statistics, ones in which the information formulas and summaries have been dominated by probabilistic representations.

Unfortunately, even though information representations were worked out in detail in the 1950s, they did not break out of the high-level literature - unsurprising given that basic stat analyses had already been built on formal probabilistic representations since the 18th century, whereas (as far as I’ve read) formal information representations did not even appear until the 20th.


First of all, sorry for resurrecting the thread. I mistook it for being from this year. Thanks for indulging me with a reply.

I agree using actual prior data is much preferable to making up prior distributions.

I was considering what one could do with one's own data. If one makes the data available for future meta-analyses and other uses, maybe it is no longer so important to provide frequentist estimates in addition to Bayesian (as in, with priors) estimates.