Some thoughts on uniform prior probabilities when estimating P values and confidence intervals

You have to show me your proof first of course.

He already did this, twice. Your result conflicts with known facts from the algebra of random variables.

When this is not convenient, you appear to resort to the motte-and-bailey fallacy by pointing out “closeness” to certain empirical results.

This is not valid mathematical reasoning, although sometimes it may help in developing a proof.

You can’t call what is, at best, your conjecture a theorem without showing how to derive it from known mathematical facts in a way that holds for arbitrary values.

The following example is taken from:

Hamming, R. W. (2012). Methods of mathematics applied to calculus, probability, and statistics. Dover Publications.

Hamming gives the following example as a hypothetical conjecture asserting a universal formula to compute prime numbers:

f(n) = n^2 - n + 41

A few moments of thought should show that, while this holds for every integer n with 0 \le n \le 40, it fails when n = 41, since f(41) = 41^2 - 41 + 41 = 41^2.
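A quick computational check of Hamming’s example (a sketch in Python; the helper `is_prime` is mine, not from the book):

```python
def is_prime(k: int) -> bool:
    """Trial-division primality test; adequate for small k."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def f(n: int) -> int:
    """Hamming's hypothetical prime formula."""
    return n * n - n + 41

# f(n) is prime for every integer 0 <= n <= 40 ...
assert all(is_prime(f(n)) for n in range(41))
# ... but fails at n = 41, where f(41) = 41^2
assert f(41) == 41 * 41 and not is_prime(f(41))
```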


I have reviewed my logic and agree with you that my equivalent statement was wrong. The correct equivalent statement should not have been

P(b_\text{repl} \mid b,s,n_1,n_2) \sim \mathcal{N}\left(b,\ \frac{s^2}{n_1}+\frac{s^2}{n_2}\right)

but

P(b_\text{repl} \mid b,s,n_1,n_2)\sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}},\ 1 \right)

This expression now leads logically on to:

P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right).

I had arrived at the expression immediately above by the same reasoning process, but had not used your notation until you asked me to do so. I made a mistake in arriving at the incorrect statement by working backwards, so to speak. However, the final expression above remains the same, and all the results of applying it to data remain the same, of course.


I have reviewed my logic

Better late than never, eh?

The original statement

b_\text{repl} \mid b,s,n_1,n_2 \sim N(b, \frac{s^2}{n_1}+\frac{s^2}{n_2})

was derived from (i.e., is compatible with) an assumed model in which \beta has a uniform distribution, while

b \mid \beta,s,n_1 \sim N(\beta,s^2/n_1) \quad \text{and} \quad b_\text{repl} \mid \beta,s,n_2 \sim N(\beta,s^2/n_2)

The new statement (I’m assuming v_i=s^2/n_i)

b_\text{repl} \mid b,s,n_1,n_2 \sim N \left( \frac{b}{\sqrt{v_1 + v_2}},\ 1 \right)

comes out of nowhere. Note that it’s even impossible if v_2>1. Just for fun, try to think of a prior distribution for \beta that would lead to this. Hint: I don’t think you can, but if you could it would have to depend on v_2 which obviously disqualifies it as a prior! The uniform prior for \beta certainly doesn’t work.

PS

This expression now leads logically on to:
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right).

Not quite. It leads to the following

P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2)

=P(b_\text{repl} > 1.96\, s/\sqrt{n_2} \mid b,s,n_1,n_2)
=P\left(b_\text{repl} - \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} > 1.96\, \frac{s}{\sqrt{n_2}} - \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}}\mid b,s,n_1,n_2 \right)
=\Phi \left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\, \frac{s}{\sqrt{n_2}} \right)
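The final step uses the fact that if X \sim N(m, 1), then P(X > t) = \Phi(m - t). A quick numerical check (the values of b, s, n_1, n_2 are mine, chosen for illustration):

```python
import math
import random

def Phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Illustrative values (my choice, not from the thread)
b, s, n1, n2 = 0.5, 1.0, 50, 50
v1, v2 = s**2 / n1, s**2 / n2
m = b / math.sqrt(v1 + v2)    # mean of the claimed N(m, 1) distribution
t = 1.96 * s / math.sqrt(n2)  # threshold for b_repl

# Closed form: if b_repl ~ N(m, 1), then P(b_repl > t) = Phi(m - t)
closed = Phi(m - t)

# Monte Carlo check of the same probability
random.seed(1)
mc = sum(random.gauss(m, 1.0) > t for _ in range(200_000)) / 200_000
assert abs(mc - closed) < 0.01
```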

High school math is hard!

=\Phi \left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\, \frac{s}{\sqrt{n_2}} \right)

I have to keep working backwards to try to satisfy your demands!

Try replacing:

b_\text{repl} \mid b, s, n_1, n_2 \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}}, 1 \right)

with:

z_\text{repl} = \frac{b_\text{repl}}{s / \sqrt{n_2}} \mid b, s, n_1, n_2 \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}}, 1 \right)

to see if your high school maths leads you only to:

P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right).

z_\text{repl} = \frac{b_\text{repl}}{s / \sqrt{n_2}} \mid b, s, n_1, n_2 \sim N \left( \frac{b}{\sqrt{v_1 + v_2}}, 1 \right)

is of course the same as

b_\text{repl} \mid b, s, n_1, n_2 \sim N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, v_2 \right).

This conditional distribution of b_\text{repl} is the convolution of the posterior distribution of \beta after the first study, and N(0,v_2). This implies that the posterior distribution of \beta after the first study is N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, 0 \right). In other words, after the first study we have absolute certainty that \beta is \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}. Do you understand that that is absurd?

It gets worse. To have posterior certainty, we must also have prior certainty. So your model implies that the prior for \beta is N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, 0 \right). This is not even a valid prior because it depends on b.

If our discussion up to this point is anything to go by, I expect you’ll tell me that all this doesn’t matter because you have a special philosophy, and you have years of experience as a doctor, and your “Predictive Probability of Replication Success” is similar to my results with Goodman. I would answer:

  1. The model from my paper with Goodman is based on more than 40,000 trials, which cannot be compared to the experience of a single doctor.
  2. Our model is coherent, so what does your incoherent model add to that?
  3. We present a method for developing (coherent) models which can be applied in other contexts; see, for example, Evidence-Based Prior for Estimating the Treatment Effect of Phase III Randomized Trials in Oncology

As I’ve mentioned before, I don’t get the impression that you’re really interested in my criticism. All you seem to care about is to defend your “Predictive Probability of Replication Success” no matter what I say. I find that quite disappointing.

Of course I am interested, because I am trying to understand why we get different results. I have been probing your thought process. It seems to me that you keep invoking another Bayesian prior on top of my flat prior. Please understand that my expression:

P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b}{\sqrt{v_1 + v_2}} - 1.96 \right)

depends only on a flat prior and a consequent direct estimation of b_\text{repl} conditional on b, obtained by adding the variance from the completed study (based on s and n_1) to the estimated variance of the second study (based on the same s, but on an n_2 that may or may not differ from n_1).

Unbelievable! I’ve demonstrated so many times now that the flat prior on \beta implies

P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2)= \Phi\left( \frac{b - 1.96\frac{s}{\sqrt{n_2}}}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} \right).

And as I’ve just demonstrated to you, to get your “Predictive Probability of Replication Success”

P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right)

you would have to assume that the prior on \beta is N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, 0 \right) which is invalid / makes no sense.


OK. Let me try again. Bear with me. My predictive probability of replication success is based on:

P(z_{\text{repl}} > 1.96 \mid b, s, n_1, n_2) = \Phi\left( \frac{b}{\sqrt{ \frac{s^2}{n_1} + \frac{s^2}{n_2} }} - 1.96 \right)

This is in turn based on the predictive distribution:

b_{\text{repl}} \mid b \sim \mathcal{N}\left( b,\ \frac{s^2}{n_1} + \frac{s^2}{n_2} \right)

A Bayesian might argue that for this predictive distribution to hold, one must have assumed a posterior distribution of the form:

\beta \mid b \sim \mathcal{N}\left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}},\ 0 \right)

This implies complete certainty about the value of \beta, and that this posterior mean reflects a form of shrinkage. I think that this criticism is incorrect because I assume:

  1. A flat (non-informative) prior on \beta, i.e. \pi(\beta) \propto 1. This yields the posterior:
    \beta \mid b \sim \mathcal{N}(b, v_1)

  2. The replication estimate, conditional on the posterior, is:
    b_{\text{repl}} \mid b \sim \mathcal{N}(b,\ v_1 + v_2)

    This results from convolving:
    \beta \mid b \sim \mathcal{N}(b, v_1) \quad \text{and} \quad b_{\text{repl}} \mid \beta \sim \mathcal{N}(\beta, v_2)

The mean remains at b and the variance is (v_1 + v_2), reflecting uncertainty. A degenerate distribution (zero variance) would imply complete certainty, which contradicts the observed data-based uncertainty in b .
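The convolution just described can be checked by simulating the two stages directly (a sketch; the numeric values of b, v_1, v_2 are mine, chosen for illustration):

```python
import random
import statistics

random.seed(0)
b, v1, v2 = 0.4, 0.02, 0.02  # illustrative values, my choice

# Stage 1: beta | b ~ N(b, v1); Stage 2: b_repl | beta ~ N(beta, v2)
reps = []
for _ in range(200_000):
    beta = random.gauss(b, v1 ** 0.5)
    reps.append(random.gauss(beta, v2 ** 0.5))

# The predictive distribution is b_repl | b ~ N(b, v1 + v2):
# the mean stays at b and the variances add.
assert abs(statistics.fmean(reps) - b) < 0.005
assert abs(statistics.pvariance(reps) - (v1 + v2)) < 0.005
```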

I think that you may be incorrectly treating the predictive mean as a shrinkage-weighted posterior mean, which would require a prior dependent on the data:
\beta \sim \mathcal{N}\left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}},\ 0 \right)

Such a prior is invalid because, as you say, it depends on the observed data b, violating the rules of Bayesian inference.

So my expression does not assume a degenerate or data-dependent prior. It simply uses a flat prior, yielding a posterior of \mathcal{N}(b, v_1) , and a predictive distribution for b_{\text{repl}} of \mathcal{N}(b, v_1 + v_2) . So I think that perhaps your criticism arises from incorrectly imposing shrinkage logic and misinterpreting the predictive distribution’s mean as evidence of an informative prior.

NO, IT IS NOT. For the umpteenth time:

b_{\text{repl}} \mid b \sim \mathcal{N}\left( b,\ \frac{s^2}{n_1} + \frac{s^2}{n_2} \right)

implies (is even equivalent to)

P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2)= \Phi\left( \frac{b - 1.96\frac{s}{\sqrt{n_2}}}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} \right).

I understand that you are a retired physician with an interest in probability and statistics. Good for you! However, you are evidently not very skilled as you continue to struggle with what is actually a very simple model. So let’s make a deal. You don’t write papers about statistical methodology and when I retire, I won’t start treating patients.


I am a retired physician, medical scientist and clinical teacher with a continued interest in seeing medical data interpreted well. I am very concerned about the continuing replication crisis and about the misinterpretation of diagnostic tests, especially new ones with exaggerated claims. Both of these problems continue to cause waste and harm. I would like to see improved mutual understanding between statisticians and people like me through the exchange of ideas in dialogue and publications, including on DataMethods. I will continue to work toward that.

OK, but if you want to improve your understanding of statistics then you need to listen to me when I keep telling you the same thing over and over again.

Exactly. That sums up your approach. All you have done is to repeat assertions “over and over again” without explaining your reasoning in a calm and measured way, leaving me to try to guess what you were thinking. It is not what I am used to, especially on this site, which supports helpful dialog. It would have been helpful if you had used my post 209 to clarify matters step by step but you chose not to.

There are a number of gaps in your thinking, which is why I cannot make heads or tails of what your assumptions are.

  1. You seem to believe that even though your first estimate came from a normal distribution, you need to assume a uniform prior before the second experiment.
  2. You confuse what is assumed or known vs. what is computed. We don’t assume posterior distributions (in typical cases); posteriors are computed from a combination of prior \times likelihood. Assumptions come in the form of a prior distribution and a data generation model before the data are seen. The order in which things are known makes a difference that isn’t explicitly reflected in elementary algebraic manipulations of statistical quantities. If you take care to order your computations correctly, elementary algebra will still yield correct results.

If you are going to use an improper uniform distribution, you need to assert it as part of your model for the first experiment. After that first experiment, the computed Bayesian posterior does indeed act as an informative prior for the replication.

That is why many find Bayesian analysis a beautiful model for learning from data.

There is no “shrinkage” in the sense of pulling the observed mean towards zero, as EvZ pointed out to me. But the Bayesian result does add a term to the variance, compared to a naive estimate of b_{obs} = SNR, so this does reduce replication probabilities when we convert back to the probability scale.

If you don’t compute your prediction from the observed initial estimate that accounts for uncertainty, what exactly are you assuming?


That is very unfair, because I’ve spent so much time writing all the steps out in great detail. I just hope there are at least some followers who do find this helpful.

The first assertion of your post 209 is that

b_{\text{repl}} \mid b \sim \mathcal{N}\left( b,\ v_1 + v_2 \right)

implies that

P(z_{\text{repl}} > 1.96 \mid b, v_1, v_2) = \Phi\left( \frac{b}{\sqrt{ v_1+v_2 }} - 1.96 \right)

This is false as I’ve demonstrated many times. I’ll demonstrate it one more time, but now I’ll number the steps so that you can point out to me where you disagree. I’ll follow your post 209 as closely as I can.

  1. Assume the flat prior on \beta . Since b \mid \beta,v_1 \sim N(\beta, v_1), it follows that
    \beta \mid b, v_1 \sim N(b, v_1).

  2. Since b_{\text{repl}} \mid \beta,v_2 \sim N(\beta, v_2) it follows that
    b_{\text{repl}} \mid b,v_1,v_2 \sim N(b, v_1 + v_2)

  3. Since z_\text{repl} = b_\text{repl}/\sqrt{v_2} it follows that
    P(z_\text{repl} > 1.96 \mid b,v_1,v_2) =P(b_\text{repl} > 1.96\, \sqrt{v_2} \mid b,v_1,v_2)

  4. Standardize the (conditional) distribution of b_\text{repl} by first subtracting the (conditional) mean (which is b) and then dividing by the (conditional) standard deviation (which is \sqrt{v_1+v_2}).
    P(b_\text{repl} > 1.96\, \sqrt{v_2} \mid b,v_1,v_2) = P\left(\frac{b_\text{repl} - b}{\sqrt{v_1+v_2}} > \frac{1.96\, \sqrt{v_2} - b}{\sqrt{v_1+v_2}} \mid b,v_1,v_2\right)

  5. Conditionally on b,v_1 and v_2, (b_\text{repl} - b)/\sqrt{v_1+v_2} has the standard normal distribution. So, we conclude that
    P(z_\text{repl} > 1.96 \mid b,v_1,v_2) = \Phi\left(\frac{b - 1.96\, \sqrt{v_2}}{\sqrt{v_1+v_2}} \right)

  6. We conclude that the flat prior on \beta does not imply
    P(z_{\text{repl}} > 1.96 \mid b, v_1, v_2) = \Phi\left( \frac{b}{\sqrt{ v_1+ v_2}} - 1.96 \right)
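Steps 1 to 5 can be verified by direct simulation (a sketch; the values of b, v_1, v_2 are mine, chosen for illustration):

```python
import math
import random

def Phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(0)
b, v1, v2 = 0.4, 0.02, 0.02  # illustrative values, my choice
N = 200_000

# Simulate the hierarchy under the flat prior: beta | b ~ N(b, v1),
# b_repl | beta ~ N(beta, v2), and z_repl = b_repl / sqrt(v2)
hits = 0
for _ in range(N):
    beta = random.gauss(b, v1 ** 0.5)
    b_repl = random.gauss(beta, v2 ** 0.5)
    hits += b_repl / math.sqrt(v2) > 1.96
mc = hits / N

step5 = Phi((b - 1.96 * math.sqrt(v2)) / math.sqrt(v1 + v2))  # step 5
other = Phi(b / math.sqrt(v1 + v2) - 1.96)                    # disputed formula

# The simulation agrees with step 5, not with the disputed formula
assert abs(mc - step5) < 0.01
assert abs(mc - other) > 0.05
```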


The problems appear to stem from this post:

  1. After the first study, he fails to condition on the observed mean, and introduces a new variable m
  2. In the derivation of m, he confuses his measurement scale (which has sample sizes as part of the calculation) with the standardized statistical scale. By substituting 1 for n, his weight for the z score drops out of the equation, which isn’t correct algebra in the general case.

Thank you. That’s clear.

So to recap and then to follow my train of thought:

I define the predictive probability of replication success as:
P(z_{\text{repl}} > 1.96 \mid b, s, n_1, n_2) = \Phi\left( \frac{b}{\sqrt{ \frac{s^2}{n_1} + \frac{s^2}{n_2} }} - 1.96 \right)
This is derived using:

  1. A flat prior on \beta , so that \beta \mid b \sim \mathcal{N}(b, v_1) , where v_1 = s^2 / n_1
  2. The sampling model for replication: b_{\text{repl}} \mid \beta \sim \mathcal{N}(\beta, v_2) , where v_2 = s^2 / n_2

The resulting predictive distribution is:
b_{\text{repl}} \mid b \sim \mathcal{N}(b, v_1 + v_2)

Standardizing with the replication statistic as:

z_{\text{repl}} = \frac{b_{\text{repl}}}{\sqrt{v_1 + v_2}} \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}},\ 1 \right)

The probability of replication success is:

P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b}{\sqrt{v_1 + v_2}} - 1.96 \right)

However, your interpretation defines the replication z-statistic as:

z_{\text{repl}} = \frac{b_{\text{repl}}}{\sqrt{v_2}}

Then:

P(z_{\text{repl}} > 1.96 \mid b) = P\left( b_{\text{repl}} > 1.96 \sqrt{v_2} \mid b \right)

Given:

b_{\text{repl}} \mid b \sim \mathcal{N}(b, v_1 + v_2)

you standardize:

P\left( b_{\text{repl}} > 1.96 \sqrt{v_2} \right) = P\left( \frac{b_{\text{repl}} - b}{\sqrt{v_1 + v_2}} > \frac{1.96 \sqrt{v_2} - b}{\sqrt{v_1 + v_2}} \right) = \Phi\left( \frac{b - 1.96 \sqrt{v_2}}{\sqrt{v_1 + v_2}} \right)

All this suggests that both expressions are mathematically valid, but they are answering different questions:

  1. My approach defines the replication test statistic using the total standard error \sqrt{v_1 + v_2} , reflecting all sources of uncertainty. This yields:

    P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b}{\sqrt{v_1 + v_2}} - 1.96 \right)

  2. Your approach defines z_{\text{repl}} as the replication statistic using the replication variance v_2 , which gives:

    P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b - 1.96 \sqrt{v_2}}{\sqrt{v_1 + v_2}} \right)

So, the flat prior does lead to my expression, provided the z-statistic is defined in terms of the total uncertainty (as I intend). Your formulation assumes a different test definition but does not invalidate the logic behind my approach.

Thank you. Perhaps my post 217 in response to @EvZ answers your points.

Wrong. Your formulation does not account for prior uncertainty: you are missing a term, on what I call the measurement scale, that incorporates the sample sizes for the prior.

You missed this because of your insistence on deriving things on the measurement scale.

The way I understand EvZ’s R code for the Bayesian credible interval for the effect is to think of the Bayesian computation as a data-augmented combination of standardized z scores. Sander Greenland describes using data augmentation to perform Bayesian computations with frequentist software in this post:

In this setup, on the standardized scale, information is represented as a shift from our sampling model N(0,1) in units known as probits. The uniform, improper prior amounts to assuming a previously conducted study with a result of N(0,1). Probits are additive, and our normality assumption allows us to treat our prior and observed data as normally distributed random variables.

By the rule of summation of random variables, our Bayesian posterior after the first study is the sum of two normal random variables N(\theta, \sigma^2):
Prior: N(0,1)
Data: N(z,1)
Posterior: N(z,2)

To get back to the standardized normal distribution scale with a variance of 1, we have to divide by the square root of the variance. So our credible interval after the first study is:

Standardized scale: N(z,\sqrt{2})

Our Bayesian prediction interval on the standard normal scale, assuming a uniform prior, conditions on the credible interval, but adds a variance term.

Prior (for replication and after first study): N(z,2)
(Pseudo) Data: N(0,1)
Posterior: N(z,3)

Posterior Predictive Distribution: N(z,\sqrt{3}) by rule of addition of normally distributed random variables then standardizing.
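The addition rule invoked above (means and variances of independent normal random variables add under summation) can be sketched numerically; z is chosen for illustration, and this checks only the convolution arithmetic, not the inferential interpretation:

```python
import random
import statistics

random.seed(0)
z = 1.0  # illustrative value, my choice

# Sum of independent normals: N(0, 1) + N(z, 1) has mean z and variance 2
draws = [random.gauss(0.0, 1.0) + random.gauss(z, 1.0) for _ in range(200_000)]
assert abs(statistics.fmean(draws) - z) < 0.02
assert abs(statistics.pvariance(draws) - 2.0) < 0.05
```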

See also:

Cross Validated: Prediction Interval = Credible Interval? – The first answer explains the distinction in Bayesian analysis well.


In your post 169 and various other posts (even as recent as 205!) you defined z_\text{repl} = \frac{b_\text{repl}}{s/\sqrt{n_2}} = b_\text{repl}/\sqrt{v_2}. This makes sense; it’s the z-statistic of the replication study which is usually taken to determine “replication success” when it exceeds 1.96.

Now, all of a sudden, you claim that this definition is “my interpretation” and change the definition to z_\text{repl} = b_\text{repl}/\sqrt{v_1 + v_2}.

Despite the fact that b_\text{repl} \sim N(\beta,v_2), you claim that \sqrt{v_1+v_2} is actually the “total standard error reflecting all sources of uncertainty.” Ridiculous.

As you know, z-statistics are commonly understood to have the standard normal distribution when there is no effect, i.e. \beta=0. Your newly defined “z-statistic” does not have this defining property.

Sorry – this is total BS. What a waste of time.