You have to show me your proof first, of course.
He already did this, twice. Your result conflicts with known facts from the algebra of random variables.
When this is not convenient, you appear to resort to the motte-and-bailey fallacy by pointing out "closeness" to certain empirical results.
This is not valid mathematical reasoning, although sometimes it may help in developing a proof.
You can't call what is, at best, your conjecture a theorem without showing how to derive it from known mathematical facts in a way that holds for arbitrary values.
The following example is taken from:
Hamming, R. W. (2012). Methods of mathematics applied to calculus, probability, and statistics. Dover Publications.
Hamming gives the following example as a hypothetical conjecture asserting a universal formula to compute prime numbers:
f(n) = n^2 - n + 41
A few moments of thought should show that, while this holds for every integer n with 0 \le n \le 40, it fails when n = 41.
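A quick script makes the failure concrete (a minimal sketch; the helper names are mine):

```python
def is_prime(k: int) -> bool:
    """Trial division; adequate for the small values involved here."""
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))

def f(n: int) -> int:
    """Hamming's hypothetical prime-generating formula."""
    return n * n - n + 41

assert all(is_prime(f(n)) for n in range(41))  # holds for n = 0, ..., 40
print(f(41), is_prime(f(41)))                  # 1681 False: f(41) = 41**2
```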
I have reviewed my logic and agree with you that my equivalent statement was wrong. The correct equivalent statement should not have been
b_\text{repl} \mid b,s,n_1,n_2 \sim N\left(b,\ \frac{s^2}{n_1}+\frac{s^2}{n_2}\right)
but
b_\text{repl} \mid b,s,n_1,n_2 \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}},\ 1 \right)
This expression now leads logically on to:
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right).
I had arrived at the expression immediately above by the same reasoning process, but had not used your notation until you asked me to do so. I made a mistake in arriving at the incorrect statement by working retrospectively, so to speak. However, the final expression above remains the same, and all the results of applying it to data remain the same, of course.
I have reviewed my logic
Better late than never, eh?
The original statement
b_\text{repl} \mid b,s,n_1,n_2 \sim N(b, \frac{s^2}{n_1}+\frac{s^2}{n_2})
was derived from (is compatible with) an assumed model where \beta has a uniform distribution, while
b \mid \beta,s,n_1 \sim N(\beta,s^2/n_1) \quad \text{and} \quad b_\text{repl} \mid \beta,s,n_2 \sim N(\beta,s^2/n_2)
The new statement (I'm assuming v_i = s^2/n_i)
b_\text{repl} \mid b,s,n_1,n_2 \sim N \left( \frac{b}{\sqrt{v_1 + v_2}},\ 1 \right)
comes out of nowhere. Note that it's even impossible if v_2 > 1. Just for fun, try to think of a prior distribution for \beta that would lead to this. Hint: I don't think you can, but if you could it would have to depend on v_2, which obviously disqualifies it as a prior! The uniform prior for \beta certainly doesn't work.
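To make the impossibility explicit: whatever the prior, the law of total variance bounds the predictive variance below by v_2, since b_\text{repl} \mid \beta \sim N(\beta, v_2):
\operatorname{Var}(b_\text{repl} \mid b) = \operatorname{E}[\operatorname{Var}(b_\text{repl} \mid \beta) \mid b] + \operatorname{Var}(\operatorname{E}[b_\text{repl} \mid \beta] \mid b) = v_2 + \operatorname{Var}(\beta \mid b) \ge v_2
So a predictive variance of 1 is unattainable whenever v_2 > 1.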
PS
This expression now leads logically on to:
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right).
Not quite. It leads to the following:
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2)
=P(b_\text{repl} > 1.96\, s/\sqrt{n_2} \mid b,s,n_1,n_2)
=P\left(b_\text{repl} - \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} > 1.96\, \frac{s}{\sqrt{n_2}} - \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} \mid b,s,n_1,n_2 \right)
=\Phi \left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\, \frac{s}{\sqrt{n_2}} \right)
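A quick Monte Carlo check of this chain (a sketch; it simulates from the claimed distribution b_\text{repl} \mid b,s,n_1,n_2 \sim N(b/\sqrt{v_1+v_2},\,1), with illustrative values of my choosing):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
b, s, n1, n2 = 2.0, 10.0, 50, 50          # illustrative values (mine)
v1, v2 = s**2 / n1, s**2 / n2
mu = b / np.sqrt(v1 + v2)

# Simulate from the claimed distribution b_repl | b,s,n1,n2 ~ N(mu, 1)
b_repl = rng.normal(mu, 1.0, size=1_000_000)
z_repl = b_repl / (s / np.sqrt(n2))

print((z_repl > 1.96).mean())                  # simulated probability, ~0.038
print(norm.cdf(mu - 1.96 * s / np.sqrt(n2)))   # the chain's final line, ~0.038
print(norm.cdf(mu - 1.96))                     # the disputed version, ~0.17
```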
High school math is hard!
=\Phi \left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\, \frac{s}{\sqrt{n_2}} \right)
I have to keep working backwards to try to satisfy your demands!
Try replacing:
b_\text{repl} \mid b, s, n_1, n_2 \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}}, 1 \right)
with:
z_\text{repl} = \frac{b_\text{repl}}{s / \sqrt{n_2}} \mid b, s, n_1, n_2 \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}}, 1 \right)
to see if your high school maths leads you only to:
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right).
z_\text{repl} = \frac{b_\text{repl}}{s / \sqrt{n_2}} \mid b, s, n_1, n_2 \sim N \left( \frac{b}{\sqrt{v_1 + v_2}}, 1 \right)
is of course the same as
b_\text{repl} \mid b, s, n_1, n_2 \sim N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, v_2 \right).
This conditional distribution of b_\text{repl} is the convolution of the posterior distribution of \beta after the first study, and N(0,v_2). This implies that the posterior distribution of \beta after the first study is N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, 0 \right). In other words, after the first study we have absolute certainty that \beta is \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}. Do you understand that that is absurd?
It gets worse. To have posterior certainty, we must also have prior certainty. So your model implies that the prior for \beta is N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, 0 \right). This is not even a valid prior because it depends on b.
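Spelled out, with b_\text{repl} = \beta + \varepsilon and \varepsilon \sim N(0, v_2) independent of \beta \mid b, matching the variance and the mean gives:
\operatorname{Var}(\beta \mid b) + v_2 = v_2 \;\Rightarrow\; \operatorname{Var}(\beta \mid b) = 0, \qquad \operatorname{E}(\beta \mid b) = \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}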
If our discussion up to this point is anything to go by, I expect you'll tell me that all this doesn't matter because you have a special philosophy, and you have years of experience as a doctor, and your "Predictive Probability of Replication Success" is similar to my results with Goodman. I would answer:
- The model from my paper with Goodman is based on more than 40,000 trials which cannot be compared to the experience of a single doctor.
- Our model is coherent, so what does your incoherent model add to that?
- We present a method for developing (coherent) models which can be applied in other contexts; see, for example, "Evidence-Based Prior for Estimating the Treatment Effect of Phase III Randomized Trials in Oncology".
As I've mentioned before, I don't get the impression that you're really interested in my criticism. All you seem to care about is defending your "Predictive Probability of Replication Success" no matter what I say. I find that quite disappointing.
Of course I am interested, because I am trying to understand why we get different results. I have been probing your thought process. It seems to me that you keep invoking another Bayesian prior on top of my flat prior. Please understand that my expression:
P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b}{\sqrt{v_1 + v_2}} - 1.96 \right)
depends only on a flat prior and a consequent direct estimation of b_\text{repl} conditional on b, obtained by adding the variance from the completed study (based on s and n_1) to the estimated variance of the second study (based on the same s but an n_2 that may or may not differ from n_1).
Unbelievable! I've demonstrated so many times now that the flat prior on \beta implies
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2)= \Phi\left( \frac{b - 1.96\frac{s}{\sqrt{n_2}}}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} \right).
And as I've just demonstrated to you, to get your "Predictive Probability of Replication Success"
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2) = \Phi\left( \frac{b}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} - 1.96\right)
you would have to assume that the prior on \beta is N \left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}}, 0 \right) which is invalid / makes no sense.
OK. Let me try again. Bear with me. My predictive probability of replication success is based on:
P(z_{\text{repl}} > 1.96 \mid b, s, n_1, n_2) = \Phi\left( \frac{b}{\sqrt{ \frac{s^2}{n_1} + \frac{s^2}{n_2} }} - 1.96 \right)
This is in turn based on the predictive distribution:
b_{\text{repl}} \mid b \sim \mathcal{N}\left( b,\ \frac{s^2}{n_1} + \frac{s^2}{n_2} \right)
A Bayesian might argue that for this predictive distribution to hold, one must have assumed a posterior distribution of the form:
\beta \mid b \sim \mathcal{N}\left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}},\ 0 \right)
This implies complete certainty about the value of \beta, and that this posterior mean reflects a form of shrinkage. I think that this criticism is incorrect because I assume:
- A flat (non-informative) prior on \beta, i.e. \pi(\beta) \propto 1. This yields the posterior:
\beta \mid b \sim \mathcal{N}(b, v_1)
- The replication estimate, conditional on the posterior, is:
b_{\text{repl}} \mid b \sim \mathcal{N}(b,\ v_1 + v_2)
This results from convolving (see the numeric check after this list):
\beta \mid b \sim \mathcal{N}(b, v_1) \quad \text{and} \quad b_{\text{repl}} \mid \beta \sim \mathcal{N}(\beta, v_2)
- The mean remains at b and the variance is (v_1 + v_2), reflecting uncertainty. A degenerate distribution (zero variance) would imply complete certainty, which contradicts the observed data-based uncertainty in b.
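The convolution step itself is easy to verify numerically (a sketch; b = 1, v_1 = 0.5 and v_2 = 0.8 are illustrative values of my choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
b, v1, v2 = 1.0, 0.5, 0.8                                # illustrative values (mine)

beta = rng.normal(b, np.sqrt(v1), size=1_000_000)        # posterior draws: beta | b ~ N(b, v1)
b_repl = beta + rng.normal(0.0, np.sqrt(v2), beta.size)  # add sampling noise N(0, v2)

print(b_repl.mean(), b_repl.var())                       # ~1.0 and ~1.3, i.e. N(b, v1 + v2)
```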
I think that you may be incorrectly treating the predictive mean as a shrinkage-weighted posterior mean, which would require a prior dependent on the data:
\beta \sim \mathcal{N}\left( \frac{b \sqrt{v_2}}{\sqrt{v_1 + v_2}},\ 0 \right)
Such a prior is invalid because, as you say, it depends on the observed data b, violating the rules of Bayesian inference.
So my expression does not assume a degenerate or data-dependent prior. It simply uses a flat prior, yielding a posterior of \mathcal{N}(b, v_1) and a predictive distribution for b_{\text{repl}} of \mathcal{N}(b, v_1 + v_2). So I think that perhaps your criticism arises from incorrectly imposing shrinkage logic and misinterpreting the predictive distribution's mean as evidence of an informative prior.
NO, IT IS NOT. For the umpteenth time:
b_{\text{repl}} \mid b \sim \mathcal{N}\left( b,\ \frac{s^2}{n_1} + \frac{s^2}{n_2} \right)
implies (is even equivalent to)
P(z_\text{repl} > 1.96 \mid b,s,n_1,n_2)= \Phi\left( \frac{b - 1.96\frac{s}{\sqrt{n_2}}}{\sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}} \right).
I understand that you are a retired physician with an interest in probability and statistics. Good for you! However, you are evidently not very skilled, as you continue to struggle with what is actually a very simple model. So let's make a deal: you don't write papers about statistical methodology, and when I retire, I won't start treating patients.
I am a retired physician, medical scientist and clinical teacher with a continued interest in seeing medical data interpreted well. I am very concerned about the continuing replication crisis and the misinterpretation of diagnostic tests, especially new ones with exaggerated claims. Both these problems are continuing to cause waste and harm. I would like to see improved mutual understanding between statisticians and people like me through the exchange of ideas in dialogue and publications, including on DataMethods, and I will continue to pursue that.
OK, but if you want to improve your understanding of statistics then you need to listen to me when I keep telling you the same thing over and over again.
Exactly. That sums up your approach. All you have done is repeat assertions "over and over again" without explaining your reasoning in a calm and measured way, leaving me to try to guess what you were thinking. It is not what I am used to, especially on this site, which supports helpful dialogue. It would have been helpful if you had used my post 209 to clarify matters step by step, but you chose not to.
There are a number of gaps in your thinking, which is why I cannot make heads or tails of what your assumptions are.
- You seem to believe that even though your first estimate came from a normal distribution, you need to assume a uniform prior before the second experiment.
- You confuse what is assumed or known vs. what is computed. We donât assume posterior distributions (in typical cases); posteriors are computed from a combination of prior \times likelihood. Assumptions come in the form of a prior distribution and a data generation model before the data are seen. The order in which things are known makes a difference that isnât explicitly reflected in elementary algebraic manipulations of statistical quantities. If you take care to order your computations correctly, elementary algebra will still yield correct results.
If you are going to use an improper uniform distribution, you need to assert it as part of your model for the first experiment. After that first experiment, the computed Bayesian posterior does indeed act as an informative prior for the replication.
That is why many find Bayesian analysis a beautiful model for learning from data.
There is no "shrinkage" in the sense of pulling the observed mean towards zero, as EvZ pointed out to me. But the Bayesian result does add a term to the variance, compared to a naive estimate that takes the observed b_\text{obs} alone as the SNR, so this does reduce replication probabilities when we convert back to the probability scale.
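One way to see the size of that effect (a sketch with illustrative values of my choosing; "naive" here means treating the first-study estimate as the true effect):

```python
import numpy as np
from scipy.stats import norm

b, v1, v2 = 3.0, 1.0, 1.0   # illustrative values (mine)

# Naive: pretend beta is known to equal b, so b_repl ~ N(b, v2)
naive = norm.cdf((b - 1.96 * np.sqrt(v2)) / np.sqrt(v2))
# Flat-prior Bayesian predictive: b_repl | b ~ N(b, v1 + v2)
bayes = norm.cdf((b - 1.96 * np.sqrt(v2)) / np.sqrt(v1 + v2))

print(naive, bayes)   # ~0.85 vs ~0.77: the extra variance term lowers the probability
```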
If you don't compute your prediction from the observed initial estimate in a way that accounts for uncertainty, what exactly are you assuming?
That is very unfair, because I've spent so much time writing all the steps out in great detail. I just hope there are at least some followers who do find this helpful.
The first assertion of your post 209 is that
b_{\text{repl}} \mid b \sim \mathcal{N}\left( b,\ v_1 + v_2 \right)
implies that
P(z_{\text{repl}} > 1.96 \mid b, v_1, v_2) = \Phi\left( \frac{b}{\sqrt{ v_1+v_2 }} - 1.96 \right)
This is false, as I've demonstrated many times. I'll demonstrate it one more time, but now I'll number the steps so that you can point out to me where you disagree. I'll follow your post 209 as closely as I can.
1. Assume the flat prior on \beta. Since b \mid \beta, v_1 \sim N(\beta, v_1), it follows that
\beta \mid b, v_1 \sim N(b, v_1).
2. Since b_{\text{repl}} \mid \beta, v_2 \sim N(\beta, v_2), it follows that
b_{\text{repl}} \mid b, v_1, v_2 \sim N(b, v_1 + v_2).
3. Since z_\text{repl} = b_\text{repl}/\sqrt{v_2}, it follows that
P(z_\text{repl} > 1.96 \mid b, v_1, v_2) = P(b_\text{repl} > 1.96\, \sqrt{v_2} \mid b, v_1, v_2).
4. Standardize the (conditional) distribution of b_\text{repl} by first subtracting the (conditional) mean (which is b) and then dividing by the (conditional) standard deviation (which is \sqrt{v_1+v_2}):
P(b_\text{repl} > 1.96\, \sqrt{v_2} \mid b, v_1, v_2) = P\left(\frac{b_\text{repl} - b}{\sqrt{v_1+v_2}} > \frac{1.96\, \sqrt{v_2} - b}{\sqrt{v_1+v_2}} \mid b, v_1, v_2\right).
5. Conditionally on b, v_1 and v_2, (b_\text{repl} - b)/\sqrt{v_1+v_2} has the standard normal distribution. So, we conclude that
P(z_\text{repl} > 1.96 \mid b, v_1, v_2) = \Phi\left(\frac{b - 1.96\, \sqrt{v_2}}{\sqrt{v_1+v_2}} \right).
6. We conclude that the flat prior on \beta does not imply
P(z_{\text{repl}} > 1.96 \mid b, v_1, v_2) = \Phi\left( \frac{b}{\sqrt{ v_1+ v_2}} - 1.96 \right).
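These steps can be checked end-to-end by simulation (a sketch; the values b = 2, v_1 = v_2 = 2 are mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
b, v1, v2 = 2.0, 2.0, 2.0                  # illustrative values (mine)

# Steps 1-2: flat prior => beta | b ~ N(b, v1); then b_repl | beta ~ N(beta, v2)
beta = rng.normal(b, np.sqrt(v1), size=1_000_000)
b_repl = rng.normal(beta, np.sqrt(v2))

# Step 3: z_repl = b_repl / sqrt(v2)
z_repl = b_repl / np.sqrt(v2)

print((z_repl > 1.96).mean())                                  # simulated, ~0.35
print(norm.cdf((b - 1.96 * np.sqrt(v2)) / np.sqrt(v1 + v2)))   # step 5, ~0.35
print(norm.cdf(b / np.sqrt(v1 + v2) - 1.96))                   # step 6's rejected formula, ~0.17
```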
The problems appear to stem from this post:
- After the first study, he fails to condition on the observed mean, and introduces a new variable m
- In the derivation of m, he confuses his measurement scale (which has sample sizes as part of the calculation) with the standardized statistical scale. By substituting 1 for n, his weight for the z-score drops out of the equation, which isn't correct algebra in the general case.
Thank you. Thatâs clear.
So to recap and then to follow my train of thought:
I define the predictive probability of replication success as:
P(z_{\text{repl}} > 1.96 \mid b, s, n_1, n_2) = \Phi\left( \frac{b}{\sqrt{ \frac{s^2}{n_1} + \frac{s^2}{n_2} }} - 1.96 \right)
This is derived using:
- A flat prior on \beta , so that \beta \mid b \sim \mathcal{N}(b, v_1) , where v_1 = s^2 / n_1
- The sampling model for replication: b_{\text{repl}} \mid \beta \sim \mathcal{N}(\beta, v_2) , where v_2 = s^2 / n_2
The resulting predictive distribution is:
b_{\text{repl}} \mid b \sim \mathcal{N}(b, v_1 + v_2)
Standardizing with the replication statistic as:
z_{\text{repl}} = \frac{b_{\text{repl}}}{\sqrt{v_1 + v_2}} \sim \mathcal{N} \left( \frac{b}{\sqrt{v_1 + v_2}},\ 1 \right)
The probability of replication success is:
P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b}{\sqrt{v_1 + v_2}} - 1.96 \right)
However, your interpretation defines the replication z-statistic as:
z_{\text{repl}} = \frac{b_{\text{repl}}}{\sqrt{v_2}}
Then:
P(z_{\text{repl}} > 1.96 \mid b) = P\left( b_{\text{repl}} > 1.96 \sqrt{v_2} \mid b \right)
Given:
b_{\text{repl}} \mid b \sim \mathcal{N}(b, v_1 + v_2)
you standardize:
P\left( b_{\text{repl}} > 1.96 \sqrt{v_2} \right) = P\left( \frac{b_{\text{repl}} - b}{\sqrt{v_1 + v_2}} > \frac{1.96 \sqrt{v_2} - b}{\sqrt{v_1 + v_2}} \right) = \Phi\left( \frac{b - 1.96 \sqrt{v_2}}{\sqrt{v_1 + v_2}} \right)
All this suggests that both expressions are mathematically valid, but they are answering different questions:
- My approach defines the replication test statistic using the total standard error \sqrt{v_1 + v_2}, reflecting all sources of uncertainty. This yields:
P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b}{\sqrt{v_1 + v_2}} - 1.96 \right)
- Your approach defines z_{\text{repl}} as the replication statistic using the replication variance v_2, which gives:
P(z_{\text{repl}} > 1.96) = \Phi\left( \frac{b - 1.96 \sqrt{v_2}}{\sqrt{v_1 + v_2}} \right)
So, the flat prior does lead to my expression, provided the z-statistic is defined in terms of the total uncertainty (as I intend). Your formulation assumes a different test definition but does not invalidate the logic behind my approach.
Thank you. Perhaps my post 217 in response to @EvZ answers your points.
Wrong. Your formulation does not account for prior uncertainty: you are missing a term on what I call the measurement scale that incorporates sample sizes for the prior.
You missed this with the insistence on deriving things on the measurement scale.
The way I understand EvZ's R code for the Bayesian credible interval for the effect is by thinking of the Bayesian computation as a data-augmented combination of standardized z-scores. Sander Greenland describes using data augmentation to perform Bayesian computations with frequentist software in this post:
In this setup, on the standardized scale, information is represented as a shift from our sampling model N(0,1) in units known as probits. The uniform, improper prior amounts to assuming a previously conducted study with a result N(0,1). Probits are additive, and our normality assumption allows us to treat our prior and observed data as normally distributed random variables.
By the rule of summation of random variables, our Bayesian posterior after the first study is the sum of two normal random variables, each of the form N(\theta, \sigma^2):
Prior: N(0,1)
Data: N(z,1)
Posterior: N(z,2)
To get back to the standardized normal distribution scale with a variance of 1, we have to divide by the square root of the variance. So our credible interval after the first study is:
Standardized scale: N(z,\sqrt{2})
Our Bayesian prediction interval on the standard normal scale, assuming a uniform prior, conditions on the credible interval, but adds a variance term.
Prior (for replication and after first study): N(z,2)
(Pseudo) Data: N(0,1)
Posterior: N(z,3)
Posterior Predictive Distribution: N(z,\sqrt{3}) by rule of addition of normally distributed random variables then standardizing.
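The summation rule invoked above is easy to confirm numerically (a generic sketch, independent of the probit bookkeeping; z = 1.2 is an arbitrary value of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
z = 1.2                                        # arbitrary observed z-score (mine)

prior = rng.normal(0.0, 1.0, size=1_000_000)   # N(0, 1)
data = rng.normal(z, 1.0, size=1_000_000)      # N(z, 1)
total = prior + data                           # sum of independent normals

print(total.mean(), total.var())               # ~z and ~2, i.e. N(z, 2)
```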
See also:
Cross Validated: Prediction Interval = Credible Interval? The first answer explains the distinction in Bayesian analysis well.
In your post 169 and various other posts (even as recent as 205!) you defined z_\text{repl} = \frac{b_\text{repl}}{s/\sqrt{n_2}} = b_\text{repl}/\sqrt{v_2}. This makes sense; it's the z-statistic of the replication study, which is usually taken to determine "replication success" when it exceeds 1.96.
Now, all of a sudden, you claim that this definition is "my interpretation" and change the definition to z_\text{repl} = b_\text{repl}/\sqrt{v_1 + v_2}.
Despite the fact that b_\text{repl} \sim N(\beta, v_2), you claim that \sqrt{v_1+v_2} is actually the "total standard error reflecting all sources of uncertainty." Ridiculous.
As you know, z-statistics are commonly understood to have the standard normal distribution when there is no effect, i.e. \beta = 0. Your newly defined "z-statistic" does not have this defining property.
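For the record, under \beta = 0 we have b_\text{repl} \sim N(0, v_2), so:
z_\text{repl} = \frac{b_\text{repl}}{\sqrt{v_2}} \sim N(0, 1), \qquad \text{whereas} \qquad \frac{b_\text{repl}}{\sqrt{v_1 + v_2}} \sim N\left(0,\ \frac{v_2}{v_1 + v_2}\right) \neq N(0,1).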
Sorry, but this is total BS. What a waste of time.