You have a point. However, our difficulties with the Bayesian approach also have a lot to do with its absence from our training.

# Fragility Index

Conceptually the FI seems to resemble the E-value for observational studies. Do the objections and responses to the E-value help you in this discussion? E.g., regarding flipping of P-values.

Not sure about that. I think that analyzing data using Bayesian methods actually requires less math and simpler concepts. It does require more computation, with less availability of off-the-shelf software (which is rapidly changing thanks to things like the R `brms` package). The complex multidimensional integrals have been almost completely replaced with numerical solutions that the user doesn’t see, except that they need to run diagnostics on the simulated posterior distributions to check convergence.

Do you think Bayesian methods can be taught to students who have not taken calculus? In the allied health fields, the vast majority of people who take a statistics or research methods course do not take any significant amount of mathematics. So the textbooks remain at the level of elementary algebra and table look-up, or the use of “point and click” analyses.

Doctoral level students may take a few research methods classes taught by psychologists or other practitioners.

A few books attempt to teach calculus and stats side by side (e.g., Hamming’s *Methods of Mathematics*, or Gemignani’s *Calculus and Statistics*).

An interesting approach was Edward Nelson’s *Radically Elementary Probability Theory* – written from the perspective of nonstandard analysis.

Absolutely. The only integration that is central to Bayes is the one that normalizes the simple product of prior and data likelihood, to get a posterior that integrates to one. MCMC made this unnecessary since one can draw samples from the posterior without computing the messy integral that is the denominator just described. The key to understanding Bayes is understanding conditional probability.
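To make the point about the normalizing integral concrete, here is a minimal sketch of a random-walk Metropolis sampler for a binomial proportion with a flat prior. The data (7 successes, 3 failures), the step size, and the function names are all hypothetical choices for illustration, not anything from a particular package. The sampler only ever evaluates prior times likelihood; the denominator cancels in the acceptance ratio.

```python
import math
import random


def log_unnormalized_posterior(theta, successes, failures):
    """Log of (flat prior x binomial likelihood); the normalizing integral is never computed."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + failures * math.log(1.0 - theta)


def metropolis(successes, failures, n_draws=20000, step=0.1, seed=1):
    """Random-walk Metropolis draws from the posterior of a proportion."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalized_posterior(theta, successes, failures)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, successes, failures)
        # Accept with probability min(1, posterior ratio); the unknown
        # normalizing constant appears in both terms and cancels.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


# Hypothetical data: 7 successes and 3 failures.
draws = metropolis(successes=7, failures=3)
kept = draws[2000:]  # discard early draws as burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 3))
```

For this toy case the exact posterior is Beta(8, 4), so the simulated mean can be checked against 8/12. In real use one would lean on tested software and run the convergence diagnostics mentioned earlier in the thread.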

Thanks all for this great discussion.

This recent article estimated the Fragility Index for phase 3 trials supporting drug approvals in oncology. In addition to all the concerns about the use of FI that were mentioned in this discussion, I think that computing FI with Fisher’s exact test (as initially done in Walsh et al.) will bias the fragility index towards 0, since by the end of the trial almost all patients will have experienced the outcome event for endpoints such as progression-free survival (PFS). Hence the difference in the percentages of events between arms will be small.
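For readers who have not seen the calculation, here is a rough sketch of the FI for a dichotomous outcome as I understand the Walsh et al. procedure: in the arm with fewer events, convert non-events to events one patient at a time until the two-sided Fisher’s exact p-value reaches the threshold. The function names and the example counts are hypothetical, and the Fisher test is hand-rolled with the standard library only to keep the sketch self-contained.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of non-event -> event flips in the arm with fewer events
    needed to push p to alpha or above; None if that never happens
    (including when the trial is not significant to begin with)."""
    if events_a <= events_b:
        low_e, low_n, high_e, high_n = events_a, n_a, events_b, n_b
    else:
        low_e, low_n, high_e, high_n = events_b, n_b, events_a, n_a
    p = fisher_exact_two_sided(low_e, low_n - low_e, high_e, high_n - high_e)
    if p >= alpha:
        return None
    flips = 0
    while p < alpha and low_e < low_n:
        low_e += 1
        flips += 1
        p = fisher_exact_two_sided(low_e, low_n - low_e, high_e, high_n - high_e)
    return flips if p >= alpha else None


# Hypothetical trial: 1/100 events in one arm versus 15/100 in the other.
fi = fragility_index(1, 100, 15, 100)
print(fi)
```

Note that this uses only the final event counts and ignores when the events happened, which is exactly why it sits so awkwardly with endpoints like PFS.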

I share some of your concerns about the use of the fragility index, but if one still wanted to use it in the context of time-to-event outcomes, do you have ideas on how we could compute it? I guess the fact that the data are censored makes things far more complicated than for dichotomous outcomes.

yes, when you do it for survival times it becomes apparent just how tenuous it is, i.e. when you’re forced to decide: which pts who are alive shall we treat as deceased? i don’t mind scepticism, it’s crucial, but the FI will promote scepticism beyond anything that is reasonable:

“Many phase 3 randomised controlled trials supporting FDA-approved anticancer drugs have a low fragility index, challenging confidence for concluding their superiority over control treatments.”

I would reiterate, again, that there are only MDs on that oncology paper. You don’t even need to look at the author list, it’s apparent as soon as you begin reading the findings, and i think this fact has some relevance re the spread of this idea. What could be simpler than calculating FI for a bunch of studies in some disease area and concluding: we should all be pessimistic.

Right. It’s become a cheap way to get papers (similar to how people that are looking for a quick / easy paper often get the idea that they can just pump out a meta-analysis by finding a recent meta-analysis and updating it with the 1 study that was published since that one came out).

I hesitate to entertain this discussion at all for reasons discussed above, but if one *were* trying to create an equivalent to the fragility index that would better accommodate time-to-event data, I think one possible option would be *something like* the following: (note: haven’t thought this all the way through)

Randomly re-assign the value of “treatment” to every combination of participants, compute the p-value for the time-to-event analysis in every scenario, then average the number of participants who were re-assigned from treatment -> control in only the scenarios where the “significant” result became “non-significant” … now that I think about this it can’t be *quite* right. Maybe someone else can pick up the idea and finish this train of thought…but I’m not sure it’s worth the time and effort, since I’m not really a fan of FI in the first place.

sounds a bit like a permutation test. I guess a worst-case scenario analysis might make sense in some circumstances, e.g. the ones lost to follow-up early could be assumed to have died at the time of withdrawal, and it’s reminiscent of the FI.

I’d like to propose a measure for those fond of FI: take away their 3 highest cited papers, re-calculate their h-index, and thence conclude that they are unproductive. Then we can dismiss them along with all the drugs they have insinuated are inefficacious according to FI

Yeah, the ‘permutation test’ was the parallel that came to mind & made me think something like that is potentially a solution to “how do we get an equivalent to the FI using time to event data”

I’ll repeat myself from earlier in the thread: if people have a problem with “fragile” trial results, what they’re really saying (whether they realize it or not) is that a p=0.05ish result isn’t good enough for them, and rather than bitching about “fragility” they should be calling for more stringent alpha levels (or an equivalently stringent threshold for adoption from Bayesian trials, e.g. very high posterior probability of benefit > some clinically relevant threshold). Of course, this would turn trials that currently require a few hundred patients into trials that require a few thousand patients (or trials that require a few thousand patients into trials that require tens of thousands of patients) and may become cost prohibitive, so they may be inadvertently depriving future patients of potentially promising treatments that simply can’t make it through the development pipeline.

We had a recent journal club review of FI in oncology RCTs. Not surprisingly, I was very skeptical of the FI, its use and its interpretation.

As I understand the frequentist interpretation, it does not allow moving patients between outcome groups to arrive at a different interpretation of the trial, and I had problems communicating this in the journal club. It felt as if they were mixing effect sizes with p-values.

I also found this article in the JCE (1) where the authors extended the FI to meta-analysis. What I found particularly interesting was the evaluation of non-significant meta-analyses.

“*The median fragility index of statistically nonsignificant meta-analyses was 7 (Q1-Q3: 4-14, range 1-102). More than one-third (36.0%) of statistically nonsignificant meta-analyses had a fragility index of 5 or less…*”

Later they state:

“*In particular, nonsignificant meta-analyses with more than 1,000 events would need more than six event-status modifications in one or more specific trials to become statistically significant.*”

It would follow that non-significant results are also fragile, even more so than significant trials, with only six event changes needed. I still have my doubts whether the FI can be carried over to meta-analysis, given the heterogeneity of trials (patients, outcomes, measurement etc.).

This would make the FI next to useless and I would not be comfortable with using it in any circumstances to draw any conclusions.

that’s an interesting point. There is another discussion on here about the potential for discarding useful drugs: Which error is more beneficial to the society?

the FI is fascinating to me because of how rapidly it has migrated across disease areas, i.e. within 5 years? Often statistical fashions are peculiar, i.e. confined to a given disease area. I wonder if this is because MDs are pushing the FI and they’re more widely read than statos who have a penchant for habits, or maybe ideas like the FI are hitchhiking on the reproducibility theme …