We are writing a position paper on the use of *p* values and confidence intervals on behalf of the M-CPHR RNG group (Methods in Clinical and Population Health Research - Research Network Group). We have come to the conclusion that there is a radical need for a change in terminology, rather than a change in the scale of the *p* value as suggested by @Sander and others, or a replacement for the *p* value as some have suggested. We agree with @Stephen that the real problem is making an idol of the *p* value. In addition, a radical rethink of confidence intervals is required, and we suggest a non-divergent interval (e.g. 95% nDI), illustrated below with a reconceptualization in terms of percentile location on a model. We considered the compatibility and uncertainty intervals suggested by Sander and Andrew but finally decided to offer the nDI as the simplest way forward in terms of a language change. I know that a while ago @f2harrell asked the question *‘Challenge of the day: suppose a randomized trial yielded a 0.95 confidence interval for the treatment odds ratio of [0.72, 0.91]. Can you provide an exact interpretation of THIS interval?’*

Perhaps the 95% nDI answers this question, as illustrated in the graph below (sample data with mean 10 and SE 1):

I would need to see an interpretation of this that a non-statistician would understand before I favored this terminology. And I’m not seeing what’s wrong with compatibility interval.

The 95% nDI of [8, 12] is interpreted as indicating the range of effect models on which the location of the observed test statistic is **less extreme** than the 2.5%tile value, for any symmetrical pair of models in that interval.

There is nothing wrong with compatibility intervals (probably the best recommendation thus far), but we could not reconcile a common language across *p* values and confidence intervals using ‘compatibility’, and therefore one term had to change. We decided to drop compatibility and adopt nDI, as it is perhaps the easiest linguistically across both. Note that the aim here is to get us out of this interpretational misuse; we have no problem with the core concept, except that in hindsight I think Neyman and Pearson creating the decision-theoretic framework did more harm than good for interpretation in common use.

I’m not sure I grasp this - neither the explanation nor the graph is intuitive to me.

For the physicians I work with, who are used to confidence intervals (even though they often misinterpret them), the question would arise: “divergent from what?” (just as they would also ask “compatible with what?”)

This is exactly the question this terminology was meant to elicit - divergent from what? *Divergent from the hypothesized effect model. If the hypothesized model was the null model, then the data collected are divergent from it.*

The graph shows data from which a mean of 10 and an SE of 1 have been obtained. Say this is the mean weight loss of a selected group of people after a specific treatment. The horizontal capped lines are the nDIs (previously known as CIs) at different percentage levels (previously known as confidence levels); e.g. the one corresponding to the 95% nDI is also what was previously known as the 95% CI. The scale next to the percent-nDI scale shows the location of the data on the model (at the limit of the interval) in percentiles; for the limit of the 95% nDI this is the 2.5%tile. The solid curve is the model whose parameter value equals the mean from the data.
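The limits and percentile locations described above can be sketched numerically. This is only a minimal illustration, assuming a normal sampling model with mean 10 and SE 1 as in the graph:

```python
# Sketch of nDI limits (formerly CI limits) for a normal model with
# sample mean 10 and SE 1, using only the Python standard library.
from statistics import NormalDist

mean, se = 10.0, 1.0
model = NormalDist(mu=mean, sigma=se)

for level in (0.95, 0.80, 0.50):
    tail = (1 - level) / 2
    lo, hi = model.inv_cdf(tail), model.inv_cdf(1 - tail)
    # Percentile location of the observed mean on the upper-limit model;
    # for the 95% nDI this is the 2.5%tile.
    pct = NormalDist(mu=hi, sigma=se).cdf(mean)
    print(f"{level:.0%} nDI = [{lo:.2f}, {hi:.2f}]; "
          f"mean at the {pct:.1%}tile of the limit model")
```

For the 95% level this prints an interval of approximately [8.04, 11.96] with the mean at the 2.5%tile of the limit model, matching the graph.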

Now that the graph is explained, the interpretation: the 95% nDI is approximately [8, 12] (from the graph). Therefore models whose parameter values are between 8 and 12 are statistically non-divergent with the data (at the 5% divergence level, formerly significance level). The classical interpretation is that “if we were to replicate a study of this size multiple times and get multiple CIs, then 95% of such intervals would be expected to contain the ‘true’ parameter”. That explanation is about SUCH intervals, and we have no idea whether THIS interval captured the population parameter. The %nDI is about this interval, and therefore we can infer that since [8, 12] extends to the 2.5%tile location on the model, this is the likely range of models ‘compatible’ or ‘non-divergent’ with the data. We could still be wrong, given that this is a 2.5%tile location. This also means that if our null hypothesis is one of no weight loss, then since 0 is not in this interval, a model with mean zero is divergent from the data; hence this is a *statistically divergent* result. In the past we would have said that this is a *statistically significant* result. The problem is that since ‘significance’ has a clear English-language meaning of importance, the statistical implication was lost (readers were much less likely to ask the important question - *“significant from what?”* - and simply assumed importance).
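That last step can also be checked directly. Again a hedged sketch assuming the same normal model (sample mean 10, SE 1): a null model outside the 95% nDI is divergent from the data, one inside it is not.

```python
# Check divergence of hypothesized models against the observed data
# (sample mean 10, SE 1), via a two-sided p value.
from statistics import NormalDist

mean, se = 10.0, 1.0
std = NormalDist()  # standard normal

def two_sided_p(null_mean):
    """Two-sided p value of the observed mean under a given null model."""
    z = (mean - null_mean) / se
    return 2 * (1 - std.cdf(abs(z)))

# Null model of no weight loss (mean 0) lies outside [8, 12]:
print(two_sided_p(0.0) < 0.05)   # True  -> statistically divergent
# A model inside the 95% nDI, e.g. mean 9:
print(two_sided_p(9.0) < 0.05)   # False -> non-divergent
```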

As the percent on the nDI declines, the divergence threshold (formerly significance threshold) is made less extreme, and therefore the width of the interval declines; when the % is zero, the interval has *no width*, [10, 10], because the percentile location is the 50%tile and there is no test statistic less extreme than this. In other words, if the divergence level (formerly significance level) is set to 100%, there can be only one model (the model shown as a curve, where the test statistic sits on its location boundary) and no others.
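That shrinkage is easy to demonstrate numerically. A small sketch under the same assumed normal model (mean 10, SE 1), showing the interval width collapsing to zero as the percent level falls:

```python
# The nDI narrows as the percent level declines, collapsing to
# the zero-width interval [10, 10] at the 0% level.
from statistics import NormalDist

mean, se = 10.0, 1.0
model = NormalDist(mu=mean, sigma=se)

for level in (0.95, 0.50, 0.10, 0.0):
    tail = (1 - level) / 2
    lo, hi = model.inv_cdf(tail), model.inv_cdf(1 - tail)
    print(f"{level:.0%} nDI = [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
```

At the 0% level both limits are the 50%tile point, i.e. the sample mean itself, so the width is exactly zero.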

This graph might clarify things: if we imagine the models as the dashed lines depicting sampling distributions, the range of model means non-divergent with the data runs from the model in the left panel (at the lower limit of the nDI) and, as the sampling distribution slides to the right (its mean increasing until it hits the upper limit of the nDI), across all the models in the nDI. The location of the sample mean on the left model is at its 2.5%tile location; as the model slides across, this moves to the 50%tile location and then back to the 2.5%tile location on the right. We no longer need to worry about probabilities: the models on which the data are located at the 2.5%tile or a less extreme location are the models non-divergent with the data. Models with means less than 8, e.g. 0, would be interpreted as divergent from the data, meaning that at our divergence level we should begin to question that model as the data-generating mechanism and perhaps consider an alternative to it. If we were to select as the null hypothesis a model positioned further to the left than the model in the left panel, the computed *p* value would be <0.05, meaning the sample data are statistically divergent at the 5% divergence level with respect to our chosen null hypothesis. There are several advantages to this terminology:

a) Statistically divergent and nDI are unified in terminology

b) We keep doing things the same way; e.g. a statistically divergent result resonates with people used to a statistically significant result

c) We therefore need to change only one term, not how it is used
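The sliding-model picture above can be sketched numerically as well. A hedged illustration, again assuming a normal model with SE 1 and sample mean 10: as the hypothesized model mean slides from the lower to the upper 95% nDI limit, the tail-percentile location of the observed mean moves from the 2.5%tile up to the 50%tile and back down.

```python
# Track the tail-percentile location of the observed sample mean as a
# hypothesized model mean slides across the 95% nDI.
from statistics import NormalDist

mean, se = 10.0, 1.0
std = NormalDist()

lo = mean + se * std.inv_cdf(0.025)   # lower 95% nDI limit, ~8.04
hi = mean + se * std.inv_cdf(0.975)   # upper 95% nDI limit, ~11.96

for mu in (lo, 9.0, mean, 11.0, hi):
    cdf = NormalDist(mu=mu, sigma=se).cdf(mean)
    tail = min(cdf, 1 - cdf)  # how extreme the data sit on this model
    print(f"model mean {mu:5.2f}: data at the {tail:.1%}tile (tail)")
```

At both limits the tail location is 2.5%; at the centre it is 50%, matching the left-to-right slide described in the text.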