Some thoughts on uniform prior probabilities when estimating P values and confidence intervals

HuwLlewelyn · July 4, 2025, 10:18pm

I stated that my approach defines the replication test statistic by using the total standard error \sqrt{v_1 + v_2}, reflecting all sources of uncertainty, so it was a part of my assumptions, not a claim. You do not appear to have read what I wrote carefully, or you are misrepresenting me in error.

Again, you also appear to misunderstand something else - the purpose of my statistic this time. It is not intended as a hypothesis test under β=0, but as a predictive standardization under observed data.

I have set out my assumptions, reasoning and conclusions as you asked me to do after finally understanding your objections. Your response has been interesting.

HuwLlewelyn · July 4, 2025, 10:25pm

I have set out my assumptions, reasoning and conclusions in post 217. The mathematics is sound. You are entitled to disagree with my assumptions. However, I suggest that you read post 175, where I apply the resulting expression to data and it succeeds in predicting frequencies of replication very well.

R_cubed · July 4, 2025, 10:27pm

Your mathematics are not sound, as your derivation of the variance for the prediction of study 2 is missing a term, which is what EvZ (a mathematician) is trying to tell you. Repeating falsehood does not make something mathematically true.

You committed an algebraic error by confusing the standardized Z score with your measurement score, which overlaps with it; the assignment of n=1 causes the variance term to drop out in this particular case, but that isn’t valid in general. What if n=5?

We have demonstrated 2 ways that you are in error: via EvZ’s proof by contradiction (your result conflicts with known mathematical facts), and my alternative calculation using known facts about standardized Z scores and their algebraic combination, which is used in meta-analysis.

HuwLlewelyn · July 4, 2025, 10:44pm

I suggest that you read post 175, where I apply the expression resulting from my assumptions and reasoning to data and it succeeds in predicting frequencies of replication perfectly well, in contrast to the alternative approach favoured by you and @EvZ.

R_cubed · July 4, 2025, 10:48pm

Your “total standard error” has only 2 variance terms. How do you distinguish the frequentist computation from the Bayesian one? Where did your prior variance for the Bayesian computation go?

Both of us understand perfectly well what you are trying to do; but it conflicts with known results in probability, as you tried to formulate it mathematically.

I rebutted this “argument” in 204:


EvZ · July 5, 2025, 7:48am

Surely you can understand my exasperation at you changing the definition of z_\text{repl} between posts 205 and 217. However, I do think it helps me to finally understand your reasoning.

Within the context of my post 215, I think we both agree on these two statements:

b_\text{repl} \mid \beta,v_2 \sim N(\beta,v_2)
b_\text{repl} \mid b,v_1,v_2 \sim N(b,v_1+v_2)

It follows that

b_\text{repl}/\sqrt{v_2} \mid \beta,v_2 \sim N(\beta,1)
b_\text{repl}/\sqrt{v_1+v_2} \mid b,v_1,v_2 \sim N(b,1)

Therefore, we have a valid level 5% test of the null hypothesis H_0 : \beta=0 versus the alternative A : \beta \neq 0 if we reject when |b_\text{repl}/\sqrt{v_2}|>1.96.

Similarly, we have a valid level 5% test of the null hypothesis H_0 : b=0 versus the alternative A : b \neq 0 if we reject when |b_\text{repl}/\sqrt{v_1+v_2}|>1.96.

Now, it seems to me that you are confused about the null hypothesis of the second test, maybe thinking that it’s testing H_0 : \beta=0 versus A : \beta \neq 0. And if you don’t want to test anything, then why are you interested in the event |b_\text{repl}/\sqrt{v_1+v_2}|>1.96? What does it have to do with replication?

Of course testing H_0 : b=0 versus A : b \neq 0 is not useful because in practice we observe b.

EvZ · July 5, 2025, 7:52am

You are predicting frequencies of the event |b_\text{repl}/\sqrt{v_1+v_2}| > 1.96 while we are predicting frequencies of the different event |b_\text{repl}/\sqrt{v_2}| > 1.96. So the agreement is not as convincing as you think.

HuwLlewelyn · July 6, 2025, 9:57am

My expression states that
(1) P(Z_{\text{repln}} > 1.96 \mid b, s, n_1, n_2) = \Phi\left( \frac{b}{\sqrt{\left( \frac{s}{\sqrt{n_1}} \right)^2 + \left( \frac{s}{\sqrt{n_2}} \right)^2}} - 1.96 \right)

when P(Z_{\text{repln}} > 1.96) corresponds to P ≤ 0.05 two sided.

However your expression really states that

P(z_\text{repl} > 1.96\frac{s}{\sqrt{n_2}} \mid b,s,n_1,n_2) = \Phi\left( \frac{b - 1.96\frac{s}{\sqrt{n_2}}}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} \right).

so that, for example, when s = 10 and n2 = 200, we have P(Z_{\text{repln}} > 1.386), which corresponds to P ≤ 0.166 two sided (compared to mine of P ≤ 0.05 two sided).

So do you think that our two expressions do not agree partly because we estimate the probability of getting different P values in the second replicating study?

I also note that Goodman’s 1992 expression uses 1.96\frac{s}{\sqrt{n_1}} instead of your 1.96\frac{s}{\sqrt{n_2}}. This means that unless n1 = n2, the two expressions give different results, partly because they estimate replication for different P values. Their estimation of the variance is also different when n1 ≠ n2 and therefore 1.96\frac{s}{\sqrt{n_1}} ≠ 1.96\frac{s}{\sqrt{n_2}}. Do you agree?
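The arithmetic behind the 1.386 and 0.166 figures quoted in this post can be reproduced with a short sketch. It assumes s = 10 and n2 = 200 as stated in the post; the variable names are illustrative only. Note that 1.386 is computed here as 1.96·s/√n2, a cut-off for b_repl in raw units, and the tail area is what results if that number is read as a standard normal quantile, which is the reading under dispute in this thread.

```python
from math import sqrt
from statistics import NormalDist

s, n2 = 10.0, 200            # values quoted in the post
se2 = s / sqrt(n2)           # standard error of the replication estimate
cutoff = 1.96 * se2          # cut-off for b_repl in raw units

std_normal = NormalDist()    # standard normal: mean 0, sd 1
# Two-sided tail area IF the cut-off is read as a standard normal quantile
# (an assumption made only to reproduce the quoted 0.166)
two_sided = 2 * (1 - std_normal.cdf(cutoff))

print(f"{cutoff:.3f} {two_sided:.3f}")
```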

EvZ · July 6, 2025, 11:38am

Within the context of my post 215 (especially assuming the uniform prior on \beta), I think we both agree on these two statements:

b_\text{repl} \mid \beta,v_2 \sim N(\beta,v_2)
b_\text{repl} \mid b,v_1,v_2 \sim N(b,v_1+v_2)

Since your post 217 there are two definitions of the z-statistic of the replication study in play. We need to distinguish them in the notation, so let’s define

z_\text{repl} = b_\text{repl} / \sqrt{v_2} \quad \text{and}\quad z^\text{Huw}_\text{repl} = b_\text{repl} / \sqrt{v_1+v_2}

Then

P(z_\text{repl} > 1.96 \mid b,v_1,v_2) = \Phi\left( \frac{b - 1.96\sqrt{v_2}}{\sqrt{v_1+v_2}} \right)

and

P(z^\text{Huw}_\text{repl} > 1.96 \mid b,v_1,v_2) = \Phi\left( \frac{b}{\sqrt{v_1+v_2}} - 1.96\right).

I think we can finally agree on the math. Yes?

Now, I do have some questions and comments about your interest in the event |z^\text{Huw}_\text{repl}|>1.96.

1. “Replication success” is commonly understood to mean |z_\text{repl}|>1.96. Are you proposing a different criterion, namely |z^\text{Huw}_\text{repl}|>1.96? And if so, why?
2. Your new criterion is unfamiliar to me. Is it your own invention, or do you have references to other work?
3. You didn’t mention your new criterion before your post 217. It is also not explicitly discussed in your pre-print. Failing to distinguish z^\text{Huw}_\text{repl} from z_\text{repl} is confusing, to say the least.
4. In your previous post you state that the event |z^\text{Huw}_\text{repl}|>1.96 corresponds to P<0.05 (two-sided). As you know, p-values are computed assuming that some null hypothesis is true. Which null hypothesis are you testing?

To be quite honest, I don’t see any reason why anyone (except you) should care about the event |z^\text{Huw}_\text{repl}|>1.96, let alone predict its probability on the basis of an earlier study.
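The two closed-form probabilities in this post can be checked by simulation from the predictive distribution b_repl | b ~ N(b, v1+v2) that both sides accept. The sketch below is only an illustration: the function names `simulate` and `closed_form`, the seed, the number of draws, and the inputs b = 2, v1 = v2 = 0.5 are all assumptions chosen for the example, not values taken from this post.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate(b, v1, v2, draws=200_000, seed=1):
    """Estimate both one-sided exceedance probabilities by simulation."""
    rng = random.Random(seed)
    sd_pred = sqrt(v1 + v2)   # sd of b_repl given b under the flat prior
    hits_std = hits_huw = 0
    for _ in range(draws):
        b_repl = rng.gauss(b, sd_pred)
        hits_std += b_repl / sqrt(v2) > 1.96   # event z_repl > 1.96
        hits_huw += b_repl / sd_pred > 1.96    # event z^Huw_repl > 1.96
    return hits_std / draws, hits_huw / draws

def closed_form(b, v1, v2):
    """Evaluate the two closed-form expressions for the same events."""
    phi = NormalDist().cdf
    p_std = phi((b - 1.96 * sqrt(v2)) / sqrt(v1 + v2))
    p_huw = phi(b / sqrt(v1 + v2) - 1.96)
    return p_std, p_huw

sim = simulate(b=2.0, v1=0.5, v2=0.5)
exact = closed_form(b=2.0, v1=0.5, v2=0.5)
print(sim, exact)
```

Agreement between the simulated and closed-form values only confirms the algebra for each event; it says nothing about which event deserves to be called replication.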

HuwLlewelyn · July 6, 2025, 12:22pm

I am keeping to the generally accepted convention that the probability of replication (for Huw and everyone) is the probability of getting P ≤ 0.05 two sided again in a replicating study. Your expression moves the P value goal posts (e.g. in the above example to P ≤ 0.166 two sided), so that Erik’s expression gives a higher probability of replication than mine, which is based on P ≤ 0.05 two sided again.

EvZ · July 6, 2025, 12:32pm

Getting P ≤ 0.05 (two sided) in a replicating study obviously means |z_\text{repl}|>1.96 and not |z^\text{Huw}_\text{repl}|>1.96. Are you seriously going to disagree with that?!

Also, please respond to my 4 comments/questions. In particular, indicate which null hypothesis you are testing when you claim that the event |z^\text{Huw}_\text{repl}|>1.96 corresponds to P ≤ 0.05 (two sided). You must specify the null hypothesis when you want to talk about p-values!

HuwLlewelyn · July 6, 2025, 12:40pm

Of course not.

For example my (Huw’s) expression gives (for P ≤ 0.05 two-sided):

p(Z_repl>1.96 | b=2, s=10, n1=200, n2=200) = 0.516.

However, your (Erik’s) expression gives (for P ≤ 0.166 two sided):

p(Z_repl>1.386 | b=2, s=10, n1=200, n2=200) = 0.730
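Both numbers can be reproduced from the two expressions under discussion, taking the standard deviation as 10 with b = 2 and n1 = n2 = 200. This is a minimal sketch; the function name `replication_probs` and its argument names are hypothetical and not taken from either participant.

```python
from math import sqrt
from statistics import NormalDist

def replication_probs(b, s, n1, n2, z_crit=1.96):
    """Return (Huw's expression, Erik's expression) for the given inputs."""
    v1, v2 = s**2 / n1, s**2 / n2      # squared standard errors of the two studies
    phi = NormalDist().cdf             # standard normal CDF
    p_huw = phi(b / sqrt(v1 + v2) - z_crit)
    p_evz = phi((b - z_crit * sqrt(v2)) / sqrt(v1 + v2))
    return p_huw, p_evz

p_huw, p_evz = replication_probs(b=2.0, s=10.0, n1=200, n2=200)
print(f"{p_huw:.3f} {p_evz:.3f}")  # prints 0.516 0.730
```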

EvZ · July 6, 2025, 12:50pm

You must distinguish the two definitions of the z-statistic of the replication study or I can’t be sure what you’re talking about. So please refer to z_\text{repl} and z^\text{Huw}_\text{repl} to make it clear what you mean.

Also, can you please confirm that you agree with the math I wrote in post 229? If we can agree on that, then we don’t need to go over any numerical examples.

Also, please respond to my 4 comments/questions. In particular, indicate which null hypothesis you are testing when you claim that the event |z^\text{Huw}_\text{repl}|>1.96 corresponds to P ≤ 0.05 (two sided). You must specify the null hypothesis when you want to talk about p-values!

HuwLlewelyn · July 6, 2025, 1:29pm

My Z statistic is b/\sqrt{v_1+v_2}

I have no issues with your maths, only your assumptions.

When the completed study is repeated, for each of the possible repeat study results, there will be a different ϐ_i but the same s and specified n2. Therefore for each of these possible replicating study results, there will be different individual P-values. In my expression, the probability of replication is equal to the proportion of these P-values that will be ≤ 0.05 two sided.

EvZ · July 6, 2025, 2:03pm

The first thing you learn in math is to use different symbols for different quantities. Your blunt refusal to use clear notation like z_\text{repl}=b_\text{repl}/\sqrt{v_2} and z^\text{Huw}_\text{repl}=b_\text{repl}/\sqrt{v_1+v_2} is the reason why after more than a hundred posts we still cannot agree on a few simple mathematical statements.

In fact, the whole discussion is about the fact that you are interested in the event |z^\text{Huw}_\text{repl}|>1.96 while everybody else is interested in |z_\text{repl}|>1.96.

“I have no issues with your maths, only your assumptions.”

We are assuming the uniform distribution on \beta and

b \mid \beta,v_1 \sim N(\beta,v_1) \quad \text{and}\quad b_\text{repl} \mid \beta,v_2 \sim N(\beta,v_2).

Moreover, b and b_\text{repl} are conditionally independent given \beta. These are all the assumptions. Which of them do you not agree with?

“Therefore for each of these possible replicating study results, there will be different individual P-values.”

I’ll just ignore any talk about p-values as long as you cannot specify the null hypothesis.
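For readers following along, the step from the assumptions listed in this post to the predictive distribution b_repl | b ~ N(b, v1+v2) can be spelled out as below. This is a sketch of the standard argument under the flat prior, added for clarity; the error term \varepsilon is notation introduced here and does not appear elsewhere in the thread.

```latex
\begin{align*}
\beta \mid b, v_1 &\sim N(b, v_1)
  && \text{(flat prior on } \beta \text{ combined with } b \mid \beta, v_1 \sim N(\beta, v_1)\text{)} \\
b_\text{repl} &= \beta + \varepsilon, \quad \varepsilon \mid \beta, v_2 \sim N(0, v_2)
  && \text{(sampling model of the replication study)} \\
\operatorname{E}[b_\text{repl} \mid b, v_1, v_2] &= \operatorname{E}[\beta \mid b, v_1] = b \\
\operatorname{Var}[b_\text{repl} \mid b, v_1, v_2] &= \operatorname{Var}[\beta \mid b, v_1] + v_2 = v_1 + v_2
  && \text{(} \varepsilon \text{ independent of } b \text{ given } \beta \text{)} \\
b_\text{repl} \mid b, v_1, v_2 &\sim N(b, v_1 + v_2)
  && \text{(sum of independent normal terms)}
\end{align*}
```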

R_cubed · July 6, 2025, 2:29pm

You can’t compute any replication probabilities until you make explicitly clear what you are assuming is true (i.e. conditioning on).

It looks like you want to condition on the null reference of 0, but wish to incorporate the prior variance of study 1, without conditioning on its Z score. I have no idea why you would want to do that.

HuwLlewelyn · July 6, 2025, 2:35pm

I thought that I had made it clear, my z_\text{repl}=b_\text{repl}/\sqrt{v_1 +v_2}

and my |z^\text{Huw}_\text{repl}|>1.96 = |z_\text{repl}|>1.96.

I agree with them all.

For each possible repeat study result, there will be a different ϐ_i based on s and specified n1, and for each ϐ_i there will be a range of possible b_{\text{repl},i,j} with the same s and specified n2. For each of these possible b_{\text{repl},i,j} there will be a P-value each with a null hypothesis of zero. In my expression, the probability of replication is equal to the proportion of these P-values that will be ≤ 0.05 two sided.

EvZ · July 6, 2025, 3:31pm

Sorry, we can’t have a grown-up discussion if you keep using “my” z_\text{repl} and “your” z_\text{repl}. Also, this discussion is a public resource, so we should make sure others can follow what we mean. So, the two definitions must be properly distinguished. Please use

z_\text{repl} = b_\text{repl} / \sqrt{v_2} \quad \text{and}\quad z^\text{Huw}_\text{repl} = b_\text{repl} / \sqrt{v_1+v_2}.

Sorry, this doesn’t make any sense at all. You’re claiming 1.96 > 1.96?

“For each possible repeat study result”

Yeah, yeah. I think we all know what a probability distribution is…

“there will be a P-value each with a null hypothesis of zero.”

But what is zero? Since you can’t or won’t be clear, let’s assume you mean H_0 : \beta=0 versus A : \beta \neq 0. Then, assuming this null hypothesis is true,

P(z_\text{repl} > 1.96 \mid \beta=0,b,v_1,v_2)
= P(z_\text{repl} > 1.96 \mid \beta=0)
= \Phi(-1.96)=0.025.

The first equality follows from the conditional independence of b and b_\text{repl} given \beta. The second equality follows from the fact that b_\text{repl} \mid \beta,v_2 \sim N(\beta,v_2). Notice how I’m using the assumptions we have agreed on to derive true statements.

Now, since z^\text{Huw}_\text{repl} = z_\text{repl} \frac{\sqrt{v_2}}{\sqrt{v_1+v_2}}

P(z^\text{Huw}_\text{repl} > 1.96 \mid \beta=0,b,v_1,v_2)
=P(z_\text{repl} \frac{\sqrt{v_2}}{\sqrt{v_1+v_2}} > 1.96 \mid \beta=0)
= P(z_\text{repl} > 1.96\frac{\sqrt{v_1+v_2}}{\sqrt{v_2}} \mid \beta=0)
=\Phi(-1.96\frac{\sqrt{v_1+v_2}}{\sqrt{v_2}}) \neq 0.025

The third equality follows from the fact that b_\text{repl} \mid \beta,v_2 \sim N(\beta,v_2). Again, notice how I’m using the assumptions we have agreed on to derive true statements.
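To put numbers on the two tail probabilities derived in this post, the sketch below evaluates them under \beta = 0 for one illustrative case. The choice v1 = v2 = 0.5 is an assumption made only for the example, and the function name `tail_probs_under_null` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def tail_probs_under_null(v1, v2, z_crit=1.96):
    """One-sided exceedance probabilities when beta = 0."""
    phi = NormalDist().cdf
    p_std = phi(-z_crit)                               # P(z_repl > 1.96 | beta = 0)
    p_huw = phi(-z_crit * sqrt(v1 + v2) / sqrt(v2))    # P(z^Huw_repl > 1.96 | beta = 0)
    return p_std, p_huw

p_std, p_huw = tail_probs_under_null(v1=0.5, v2=0.5)
print(f"{p_std:.4f} {p_huw:.4f}")
```

With equal variances the second cut-off is 1.96·√2 on the z_repl scale, so the second probability is far below 0.025.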

R_cubed · July 6, 2025, 5:16pm

This sounds like you are confusing the probability of b_1 and b_\text{repl} under the reference hypothesis N(0,1) – i.e. our sampling model – with your prior, which you specified as uniform.

HuwLlewelyn · July 6, 2025, 5:26pm

You are rescaling the distribution with sqrt(v2)/sqrt(v1+v2), which I have made very clear that I do NOT do. Without doing this, P=0.025 one sided.

It is this rescaling in the van Zwet expression (Expr vZ) that produces different results to the Llewelyn Expression (Expr Ll). Goodman 1992 used n1 in his expression (let’s call it Expr G), whereas Expr vZ uses n2. They give different probabilities of replication results when n1 ≠ n2.

This rescaling that I do NOT do seems to me to be inappropriate and appears to be the source of a serious problem by changing the goal posts from P = 0.05 two sided to something inappropriate and also applying the wrong variance.