In medical literature we seem to have 95% confidence intervals locked in. This forum has some interesting discussion about changing the language (compatibility intervals, non-divergent intervals, etc.). That aside, I’d like to suggest that we encourage the (additional) reporting of how confident we are that the true value lies within a range that exceeds the null.
E.g., for the difference between two means for a biomarker, where the point estimate of the difference is 15 ng/L and the 95% confidence interval is -4.6 ng/L to 34.6 ng/L, we can state we are 95% confident the population difference lies between -4.6 and 34.6 ng/L (if all the assumptions used to compute the interval are correct).
We could also state we are 87% confident it lies between 0 and 30 ng/L. Or ~93% confident it is >0. I think this is more intuitive for most people reading the medical literature.
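Those extra percentages can be back-calculated from the reported summary under a normal approximation. A sketch (assuming the interval is symmetric and normal-based, with the point estimate and interval from the example above):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Reported summary from the example above
est = 15.0            # point estimate, ng/L
lo, hi = -4.6, 34.6   # 95% confidence interval, ng/L
se = (hi - lo) / (2 * 1.96)  # implied standard error: 10 ng/L

def confidence_between(a, b):
    """Approximate confidence that the parameter lies in (a, b)."""
    return norm_cdf((b - est) / se) - norm_cdf((a - est) / se)

print(round(confidence_between(0.0, 30.0), 2))      # -> 0.87
print(round(1.0 - norm_cdf((0.0 - est) / se), 2))   # -> 0.93
```

The same caveat applies as for the original interval: these figures inherit every assumption used to compute the 95% CI.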
That said, I may have got this wrong (again!) and I’m not sure that someone would be impressed if they asked me how confident I was that the population difference was exactly 15 ng/L and I answered “zero.”
I have seen this kind of statement in descriptions of results from Bayesian methods. Can a confidence interval from a frequentist hypothesis test also be expressed in this way?
Just to add: sorry if my tone sounded doubtful; that wasn’t my intention. I’m currently learning Bayesian methods, and my first thought was that Bayesian approaches should be able to express what you mentioned: how confident we are that the true value lies within a range that exceeds the null. It seems to me that hypothesis testing has a hard time doing this. I just wanted to check whether my understanding is correct.
We cannot state this. The probability that the population difference lies between -4.6 and 34.6 is either 0 or 1 and not 0.95. Interestingly we do not know if it is 0 or 1.
It didn’t seem to me that @kiwiskiNZ was talking about the probability of a parameter lying within an interval. Confidence statements, such as the ones suggested here, sound compatible with the paper below. Personally, I like the idea for the same reason I like to see interpretable posterior probabilities if the analysis is Bayesian.
I agree with the premise that this phrasing aids interpretation and is probably more likely to lead to improved decision-making than the standard terminology around effect sizes. Perhaps a good reason to switch to a Bayesian approach if this is your goal.
While not fully Bayesian, we do something like this in health economics (in a manner of speaking) by stating that, e.g., 80% of the simulations demonstrated cost-effectiveness. This refers to regular old Monte Carlo draws though, not MCMC.
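For concreteness, a minimal sketch of that kind of probabilistic-sensitivity-analysis summary (all distributions and the willingness-to-pay threshold below are hypothetical, not from any real evaluation):

```python
import random

random.seed(0)
WTP = 20_000       # hypothetical willingness-to-pay per unit of effect
n_draws = 10_000   # plain Monte Carlo draws, not MCMC

cost_effective = 0
for _ in range(n_draws):
    # hypothetical uncertainty distributions for incremental cost and effect
    d_cost = random.gauss(1500, 800)
    d_effect = random.gauss(0.13, 0.05)
    # net monetary benefit: positive means cost-effective at this threshold
    if WTP * d_effect - d_cost > 0:
        cost_effective += 1

print(f"{cost_effective / n_draws:.0%} of simulations demonstrated cost-effectiveness")
```

The reported percentage is just the fraction of draws with positive net monetary benefit at the chosen threshold.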
Thank you for the comments… the subtlety of the language is apparent here.
Where I’ve seen the “95% confident” language has been in explanations of confidence intervals and repeat testing (not to express chance). Eg “The correct interpretation of a 95% confidence interval is that we are 95% confident that the true value (also known as the population parameter) is contained within the interval. What we mean by this is that if we repeat the sampling in an identical way many times and produce confidence intervals using each sample, 95% of these intervals would include the true (unknown) population parameter.” [Cameron C, Turner R, Samaranayaka A. Understanding confidence intervals and why they are so important. NZ Med Student J. 2021;33, 42-3.]
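That repeated-sampling meaning is easy to demonstrate by simulation. A sketch with made-up population values (real coverage also depends on the model assumptions actually holding):

```python
import random
import statistics

random.seed(42)
TRUE_MEAN, SD, N, Z = 15.0, 10.0, 50, 1.96  # made-up population and sample size

covered = 0
trials = 5000
for _ in range(trials):
    # draw one study's worth of data and build its 95% CI for the mean
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    # does this realized interval contain the true mean?
    if m - Z * se <= TRUE_MEAN <= m + Z * se:
        covered += 1

print(f"coverage: {covered / trials:.3f}")  # close to 0.95
```

Any single realized interval either contains the true mean or it doesn’t; the 95% describes the long-run fraction across repetitions.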
Thank you @giuliano-cruz for the reference. I think you are correct that the “confident” statement is compatible with that paper.
Obviously I don’t want a full explanation of how to interpret CIs in every medical paper, but given the responses here is the “XX% confident” language likely to hinder or help interpretation? In particular, what I am keen to do is have a sentence that takes the (clinician) reader’s attention away from whether a 95%CI crosses “0” or “1” depending on the metric. Hence the “93% confident the true parameter is >0” type of statement (could also be >a beneficial value).
Is it even possible to provide a granular interpretation of any particular frequentist interval? My impression is that the answer is “no”…
How about this “high-level” interpretation instead, based on scenarios A through D?
A) Wide interval suggests few outcomes of interest were observed. Interval could move horizontally to a great degree with future replications of the study. Get more data.
B) Narrow interval suggests many outcomes of interest were observed. Interval wouldn’t likely move horizontally to a great degree with future replications of the study. Result cannot “prove” the null hypothesis but also provides little incentive for further study aimed at detecting an efficacy signal.
C) Wide interval suggests few outcomes of interest were observed. Interval could move horizontally to a great degree with future replications of the study. Get more data to gauge whether the efficacy signal is meaningful or not.
D) Narrow interval suggests many outcomes of interest were observed. Interval wouldn’t likely move horizontally to a great degree with future replications of the study. Replications of the study would likely corroborate meaningful efficacy.
I would stick with @Sander’s notion of a compatibility interval — the interval containing the values of the parameter with which the data are consistent — with a footnote defining “consistent” as compatible at the 0.05 level according to the corresponding statistical test.
It’s often really just equivalent to assuming a flat prior… and avoiding that seems, to many people, one of the major advantages of Bayesian methods. I’ll try to look at that paper and comment further.
I have to be pedantic here, but the first and second statements are not equivalent.
The first does not hold, because the limits of the CI are not limits on the population parameter.
The second may be true in the long run, but it is not useful for interpreting any realized interval of interest.
Quote from Sander’s paper:
The specific 95% confidence interval presented by a study has a 95% chance of containing the true effect size. No! A reported confidence interval is a range between two numbers. The frequency with which an observed interval (e.g., 0.72–2.88) contains the true effect is either 100% if the true effect is within the interval or 0% if not; the 95% refers only to how often 95% confidence intervals computed from very many studies would contain the true size if all the assumptions used to compute the intervals were correct. It is possible to compute an interval that can be interpreted as having 95% probability of containing the true value; nonetheless, such computations require not only the assumptions used to compute the confidence interval, but also further assumptions about the size of effects in the model. These further assumptions are summarized in what is called a prior distribution, and the resulting intervals are usually called Bayesian posterior (or credible) intervals to distinguish them from confidence intervals.
Another thing to consider is the potential asymptotic equivalence between two decision rules in terms of expected utility or Bayes risk: one based on tail posterior probabilities and another based on confidence (eg, statements of the type ‘compatibility’/‘confidence’ of some inequality against a given threshold) - or even p-values, for that matter.
In my limited experience, I have found it easier to communicate Bayesian summaries to biologists and clinicians (predictably). Often, however, I have the impression that part of that comes from taking the concept of ‘degree of belief’ as informally as the concept of ‘confidence’. When that happens, I find myself wondering whether the changes of wording by themselves bring actual benefit for the downstream decisions based on the observed data. While I appreciate and try to push for more “statistically-precise interpretations”, I am not sure what degree of precision is required for proper decision making.
Not only is it easier to communicate Bayesian thinking, but the two are far from equivalent, e.g., when doing sequential testing and the frequentist confidence interval is much wider than the corresponding Bayesian uncertainty interval.
Agree, but it would be useful to students of science (and scientist–public communication in general) to provide a link to a document that explains common statistical terms in plain language, such as for CIs.