Some thoughts on uniform prior probabilities when estimating P values and confidence intervals

HuwLlewelyn · June 24, 2025, 10:50am

Yes. But instead of b_repl, I am interested in the possible outcomes of all studies with the same s but sample size n2 (which can be, and usually is, the same as n1 in replication studies where the planned study has the same sample size, but which can be very different, e.g. theoretically infinite), conditional on all the possible values of beta. How does your b_repl relate to beta?

R_cubed · June 24, 2025, 1:39pm

@HuwLlewelyn: It looks like you are trying to derive the formula for a prediction interval. Study the following and tell me if this is helpful.

EvZ · June 24, 2025, 1:53pm

Just to be sure, do we agree on the following set-up?

We start with beta which is the unknown true effect. We have two studies; the “original” and the “replication” that both target beta. Let’s assume they have the same sample size n and standard deviation s. Suppose they yield estimates b and b_repl which are unbiased with the same standard error se=s/sqrt(n). In other words, conditionally on beta, b and b_repl are independent, normally distributed with mean beta and standard deviation se.

You asked:

How does your b_repl relate to beta?

As I just wrote, conditionally on beta, b and b_repl are independent, normally distributed with mean beta and standard deviation se.

Now, in an actual field of research (such as the clinical trials in the Cochrane database) there is an association between the true effects (beta’s) and the standard errors (se’s). This is due to the practice of sample size calculations. For that reason, I prefer to divide everything by se and work with SNR=beta/se, z=b/se and z_repl=b_repl/se instead of beta, b and b_repl. If you find this confusing, you can also assume that se is always 1. Is that OK with you?

HuwLlewelyn · June 24, 2025, 2:17pm

I’m not sure where you are going with this, but se = 1 only when n = s^2.

HuwLlewelyn · June 24, 2025, 2:20pm

Yes in principle, but this discussion so far focuses on the interval z_repln > 1.96.

EvZ · June 24, 2025, 2:37pm

I’m not sure where you are going with this

We’ll get there!

We start with beta which is the unknown true effect. We have two studies; the “original” and the “replication” that both target beta. Let’s assume they have the same sample size n and standard deviation s. Suppose they yield estimates b and b_repl which are unbiased with the same standard error se=s/sqrt(n). In other words, conditionally on beta, b and b_repl are independent, normally distributed with mean beta and standard deviation se.

Now define SNR=beta/se, z=b/se and z_repl=b_repl/se. It follows that conditionally on SNR, z and z_repl are independent, normally distributed with mean SNR and standard deviation 1.
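A quick simulation sketch of this set-up, for readers who want to check it numerically. The values beta = 2, s = 10 and n = 100 (so se = 1 and SNR = 2), the seed, and the function name are illustrative assumptions and do not come from the thread.

```python
import random
import statistics

def simulate_z_pairs(beta, s, n, n_sims=100_000, seed=1):
    """Draw (z, z_repl) pairs for one fixed true effect beta.

    Hypothetical helper: se = s / sqrt(n), as in the set-up above.
    """
    rng = random.Random(seed)
    se = s / n ** 0.5
    # Conditionally on beta, b and b_repl are independent N(beta, se^2);
    # dividing each draw by se gives z and z_repl.
    z = [rng.gauss(beta, se) / se for _ in range(n_sims)]
    z_repl = [rng.gauss(beta, se) / se for _ in range(n_sims)]
    return z, z_repl

# Illustrative numbers only: se = 10 / sqrt(100) = 1, so SNR = beta / se = 2.
z, z_repl = simulate_z_pairs(beta=2.0, s=10.0, n=100)

mean_z, mean_repl = statistics.fmean(z), statistics.fmean(z_repl)
sd_z, sd_repl = statistics.stdev(z), statistics.stdev(z_repl)
# Sample covariance computed by hand to avoid depending on newer stdlib helpers.
cov = statistics.fmean([(a - mean_z) * (b - mean_repl) for a, b in zip(z, z_repl)])

print(mean_z, mean_repl)  # both should be close to SNR = 2
print(sd_z, sd_repl)      # both should be close to 1
print(cov)                # should be close to 0 (independence given SNR)
```

The simulation holds beta fixed, so it only illustrates the conditional statement; it says nothing about the distribution of z_repl given z once a prior is placed on beta or SNR.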

All agreed?

HuwLlewelyn · June 24, 2025, 3:19pm

OK. So what are the distributions of b and b_repl conditional on beta?

EvZ · June 24, 2025, 3:21pm

I’ve said a few times: Conditionally on beta, b and b_repl are independent, normally distributed with mean beta and standard deviation se.

Also, conditionally on SNR, z and z_repl are independent, normally distributed with mean SNR and standard deviation 1.

What’s not clear?

HuwLlewelyn · June 24, 2025, 3:28pm

I thought that you were about to give some example values (e.g. b = 2, se = 1, SNR, etc.), since you suggested using se = 1 in your numerical examples. Please carry on with numerical examples that make it easier for me to understand the differences from my approach.

EvZ · June 24, 2025, 4:08pm

We now agree on the following set up: We start with beta which is the unknown true effect. We have two studies; the “original” and the “replication” that both target beta. Let’s assume they have the same sample size n and standard deviation s. Suppose they yield estimates b and b_repl which are unbiased with the same standard error se=s/sqrt(n). In other words, conditionally on beta, b and b_repl are independent, normally distributed with mean beta and standard deviation se. Now define SNR=beta/se, z=b/se and z_repl=b_repl/se. It follows that conditionally on SNR, z and z_repl are independent, normally distributed with mean SNR and standard deviation 1.

We are interested in the conditional probability of a statistically significant replication, given the result of the first study. So, that’s P(z_repl > 1.96 | b,se).

If I understand correctly, you claim that if we assume the (improper) uniform (or “flat”) prior for beta, then the conditional distribution of z_repl given b and se is normal with mean z/sqrt(2) and standard deviation 1? In other words

z_repl | b,se ~ N(z/sqrt(2),1).

Note that this conditional distribution depends on b and se only through z.

Check: If I use R to calculate P(z_repl>1.96 | z) according to your formula for a few values of z, then I get your numerical results:

z=c(0.67,1.04,1.64,1.96,2.17,2.58,2.81,3.29)
1 - pnorm(1.96,z/sqrt(2),1)

0.07 0.11 0.21 0.28 0.34 0.45 0.51 0.64

So, is this indeed what you claim?

HuwLlewelyn · June 24, 2025, 4:26pm

No. For z=1.96 that expression, if I read it correctly as 1-pnorm((1.96-1.96)/sqrt(2)), gives 0.5, not 0.28 (sorry, I inserted 2 instead of 1.96 the first time and got 0.5113).

EvZ · June 24, 2025, 4:47pm

1 - pnorm(1.96,1.99/sqrt(2),1) = 0.29.

or, if you prefer,

1 - pnorm(1.96 -1.99/sqrt(2),0,1) = 0.29.

Make sure you mind the brackets!

HuwLlewelyn · June 24, 2025, 5:03pm

But 1-pnorm((1.96-1.96)/sqrt(2)) = 0.5 and 1-pnorm((1.96-1.99)/sqrt(2)) = 0.508 and 1-pnorm((1.96-2)/sqrt(2)) = 0.511. However, pnorm(1.96/sqrt(2) - 1.96) = 0.283.

EvZ · June 24, 2025, 5:07pm

But 1-pnorm(1.96-1.96/sqrt(2)) = 0.5

No,

1-pnorm(1.96-1.96/sqrt(2)) = pnorm(1.96/sqrt(2) - 1.96) = 0.28

This is silly.
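The disagreement in the last few posts comes down to where the brackets sit. A minimal sketch comparing the two readings, using Python's `statistics.NormalDist` as a stand-in for R's `pnorm`; the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF, the analogue of R's pnorm(x)

def prob_shifted_mean(z, z_crit=1.96):
    """1 - pnorm(z_crit - z/sqrt(2)): only z is divided by sqrt(2)."""
    return 1 - Phi(z_crit - z / sqrt(2))

def prob_scaled_difference(z, z_crit=1.96):
    """1 - pnorm((z_crit - z)/sqrt(2)): the whole difference is divided by sqrt(2)."""
    return 1 - Phi((z_crit - z) / sqrt(2))

for z in (1.96, 1.99, 2.0):
    print(z, round(prob_shifted_mean(z), 3), round(prob_scaled_difference(z), 3))

# By symmetry of the normal distribution, 1 - Phi(a) equals Phi(-a),
# so the first reading can also be written as pnorm(z/sqrt(2) - z_crit).
print(abs(prob_shifted_mean(1.96) - Phi(1.96 / sqrt(2) - 1.96)) < 1e-12)
```

At z = 1.96 the first reading gives about 0.28 and the second gives exactly 0.5, which matches the two sets of numbers quoted above.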

HuwLlewelyn · June 24, 2025, 6:40pm

Yes, it is what I claim. However, what I am used to seeing is p(z_repln>1.96|z) = 1-pnorm((1.96-z)/sqrt(2)). So I’m sorry that I missed the subtle difference where you replaced 1-pnorm((1.96-z)/sqrt(2)) with 1-pnorm(1.96-z/sqrt(2)). As you say, this gives the same result as my version, pnorm(1.96/sqrt(2) - 1.96), which is a simplified version of p(z_repln>1.96 | b, s, n1, n2) = pnorm(b/sqrt((s/sqrt(n1))^2 + (s/sqrt(n2))^2) - 1.96). Your modification and the results in the quote match those shown in my Table 1 earlier (which appeared to surprise you by corresponding to the result of your 2022 paper with Goodman when I first showed it). So it seems at last that we are in agreement with what I had postulated in my pre-print.

My main expression (in notation this time again for clarity) is
p(z_repln > 1.96 | b, s, n1, n2) = Φ(b/√((s/√n1)² + (s/√n2)²) - 1.96).
This allows the sample size of the planned replicating study to be varied in a ‘what if’ way. By inserting a very large number as n2, we can test @Stephen’s reply in 2002 to Stephen Goodman’s paper of 1992. It also implies that in addition to @Stephen’s suggestion about postulating an infinitely large n2, the replication crisis could be resolved by doubling the sample sizes suggested by current power calculations.
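A sketch of the "what if" calculation described here. It assumes the expression is read as Φ(b/√(s²/n1 + s²/n2) - 1.96), which is the reading that reduces to pnorm(b/(se*sqrt(2)) - 1.96) when n1 = n2; the function name and the numbers b = 1.96, s = 10, n1 = 100 are hypothetical.

```python
from math import inf, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF, the analogue of R's pnorm(x)

def prob_repl_significant(b, s, n1, n2, z_crit=1.96):
    """Phi(b / sqrt(s^2/n1 + s^2/n2) - z_crit).

    Assumed reading of the expression in the post above, not a verified
    transcription of the pre-print.
    """
    var_sum = s ** 2 / n1 + s ** 2 / n2  # s^2/inf evaluates to 0.0
    return Phi(b / sqrt(var_sum) - z_crit)

# Illustrative inputs: se1 = 10 / sqrt(100) = 1, so the original z is 1.96.
b, s, n1 = 1.96, 10.0, 100

for n2 in (100, 1_000, inf):
    print(n2, round(prob_repl_significant(b, s, n1, n2), 3))
```

With n2 = n1 the function returns about 0.283, and as n2 grows without bound it tends to Φ(z - 1.96), which is 0.5 for these inputs. The sketch only evaluates the expression as stated; whether that expression is the right replication probability is exactly what the thread goes on to dispute.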

PS. This might stop the FDA from asking for two positive trial results.

EvZ · June 24, 2025, 9:16pm

Yes it is what I claim.

Excellent! We’re making progress! We already agreed on the following set up:

We start with beta which is the unknown true effect. We have two studies; the “original” and the “replication” that both target beta. Let’s assume they have the same sample size n and standard deviation s. Suppose they yield estimates b and b_repl which are unbiased with the same standard error se=s/sqrt(n). In other words, conditionally on beta, b and b_repl are independent, normally distributed with mean beta and standard deviation se. Now define SNR=beta/se, z=b/se and z_repl=b_repl/se. It follows that conditionally on SNR, z and z_repl are independent, normally distributed with mean SNR and standard deviation 1. We are interested in the conditional probability of a statistically significant replication, given the result of the first study. So, that’s P(z_repl > 1.96 | b,se).

Within this set-up, we have now also established Llewelyn’s claim:

If we assume the (improper) uniform (or “flat”) prior for beta, then
z_repl | b,se ~ N(z/sqrt(2),1).

Now, there are several reasons why I don’t agree with this claim. The first reason is relatively minor. As I explained, the standard error se has information about beta due to the practice of sample size calculations. To account for this, we would need a joint prior for beta and se. However, we can finesse this difficulty by dividing everything by se. So, we arrive at Llewelyn’s modified claim:

If we assume the (improper) uniform (or “flat”) prior for SNR, then
z_repl | z ~ N(z/sqrt(2),1).

Is this still a fair representation of your claim?

HuwLlewelyn · June 24, 2025, 9:50pm

I need to reflect on this during the day and to understand better where you are going with it. Is this expression derived from Goodman’s 1-pnorm((z*-z)/sqrt(2)) or the latest 1-pnorm(z*-z/sqrt(2))?

EvZ · June 26, 2025, 3:06pm

Llewelyn’s modified claim: If we assume the (improper) uniform (or “flat”) prior for SNR, then
z_repl | z ~ N(z/sqrt(2),1).

This would imply

P(z_repl > 1.96 | z) = 1 - pnorm(1.96 - z/sqrt(2),0,1)

and in particular

P(z_repl > 1.96 | z=1.96) = 1 - pnorm(1.96 - 1.96/sqrt(2),0,1) = 0.28.

HuwLlewelyn · June 26, 2025, 6:53pm

Sorry for the delay in responding. I have been reflecting in much detail about the difference between Goodman’s expression and the one used by me and trying to tease out the various implications.

Firstly, the simplified version of my expression is not
P(z_repl > 1.96 | z) = 1 - pnorm(1.96 - z/sqrt(2)) when z = b/se
but my simplified version is
P(z_repl > 1.96 | b, se) = 1 - pnorm(1.96 - b/(se*sqrt(2)))
or preferably
P(z_repl > 1.96 | b, se) = pnorm(b/(se*sqrt(2)) - 1.96)
So in my case, z = b/(se*sqrt(2)) not z = b/se

Secondly, why do you state ‘modified claim’?

EvZ · June 26, 2025, 7:21pm

Firstly, the simplified version of my expression is not
P(z_repl > 1.96 | z) = 1 - pnorm(1.96 - z/sqrt(2))

Strange that you should say that, because the formula does give your numerical results:

z=c(0.67,1.04,1.64,1.96,2.17,2.58,2.81,3.29)
1 - pnorm(1.96 - z/sqrt(2),0,1)

0.07 0.11 0.21 0.28 0.34 0.45 0.51 0.64

In particular, the formula gives P(z_repl > 1.96 | z=1.96) = 0.28 as you have repeatedly claimed.

You are now introducing a different formula:

P(z_repl > 1.96 | z) = 1 - pnorm(1.96 - b/sqrt(v1+v2),sqrt(2)) when v1=v2.

For example, suppose b=1.96, v1=1 and v2=1. Then the z-statistic of the first study is z=b/sqrt(v1)=1.96, but your new formula does not yield 0.28.

Secondly, why do you state ‘modified claim’?

Please read my comment 136.