BBR Session 4: Hypothesis Testing, Branches of Statistics, P-values

Hi Aron. I still think that bringing pre-study design characteristics into the interpretation is not that helpful. As an aside, what (little) meaning confidence intervals have, they have independently of those pre-study parameters.

I don’t think the idea of Bayesian “rescues” is very strong from the standpoint of scientific rigor, except possibly when a very skeptical prior distribution still results in a meaningful posterior distribution for the question at hand. We should plan Bayesian analyses prospectively, either as the primary analysis or as a planned parallel secondary analysis, as we did for the recently reported ISCHEMIA study.
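To sketch what such a skeptical-prior check could look like: under a normal approximation, a prior centered at no effect can be combined with the likelihood by a conjugate normal-normal update. All numbers below are invented for illustration and are not from any trial.

```python
# Toy skeptical-prior check (hypothetical numbers throughout):
# combine a normal prior centered at log(HR) = 0 with a normal
# likelihood for the estimated log hazard ratio.
import math

def posterior(prior_mean, prior_sd, est, se):
    """Conjugate normal-normal update; returns posterior mean and SD."""
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior
    w_data = 1.0 / se ** 2          # precision of the data
    mean = (w_prior * prior_mean + w_data * est) / (w_prior + w_data)
    sd = math.sqrt(1.0 / (w_prior + w_data))
    return mean, sd

# Skeptical prior: centered at 0 with SD 0.15, so large effects are
# implausible a priori; data: estimated log HR -0.3 with SE 0.1.
m, s = posterior(0.0, 0.15, -0.3, 0.1)
print(round(m, 3), round(s, 3))
```

If the posterior remains clearly away from zero even under such a prior, the finding has survived a deliberately tough test; that is the sense in which a skeptical prior can make a Bayesian analysis rigorous rather than a rescue.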

Thank you again, Frank; good point about the CIs.

And thank you for pointing me to the ISCHEMIA study; it was fascinating to read the published methods paper and then see the results presentation. As far as I understand, if the parallel Bayesian analysis had not been performed, the study’s conclusion (“… did not reduce the overall rate…”) could also be called into question, i.e., for not stating “our study could not reject the null hypothesis”. And since virtually flat priors were used, I guess a similar Bayesian analysis could also be performed for all the trials that “overclaim”, and in most cases it would lead to the same conclusion.
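To sketch what I mean: with a (nearly) flat prior and a normal approximation, the posterior for a log hazard ratio is approximately Normal(estimate, SE²), so the posterior probability of benefit follows directly from the normal CDF. The estimate and SE below are made up for illustration, not taken from ISCHEMIA or any other trial.

```python
# Hypothetical sketch: posterior probability of benefit under a flat
# prior, where the posterior for the log hazard ratio is approximately
# Normal(log_hr_hat, se^2). Numbers are invented.
import math

def prob_benefit(log_hr_hat, se):
    """P(log HR < 0 | data) under a flat prior and normal approximation."""
    z = (0.0 - log_hr_hat) / se
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# e.g. an estimated HR of 0.93 with SE 0.07 on the log scale:
p = prob_benefit(math.log(0.93), 0.07)
print(round(p, 3))
```

A non-significant p-value and a posterior probability of benefit of, say, 0.85 can coexist, which is exactly why the Bayesian statement is more informative than “could not reject the null”.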

My point is that even though it is statistically incorrect to conclude anything other than “the money was spent” from non-significant p-values, physicians are still likely to draw the likely correct clinical conclusion that the treatment effect, if it exists, is probably small. Maybe the population of non-significant trials is not the best suited for demonstrating the advantages of Bayesian statistics.

ps: When reading the section on the primary outcome, your message about using time-to-event instead of binary outcomes resonated well :slight_smile:

Not quite; otherwise you could just use N=2 for clinical trials. Abandon the p-value. If not using Bayes, then emphasize the compatibility intervals.

Hi Frank & others,
Just want to clarify some thoughts. In the section on confidence intervals (or compatibility intervals), it was mentioned that if the compatibility interval includes both a large positive effect and a large negative effect, then you really cannot draw any conclusion about the treatment effect (not the exact wording as I remember it, but that’s the impression I got). What if the compatibility interval is wider on one side but includes the estimate of no effect? For example, for a blood pressure treatment A, say the compatibility interval was consistent with an increase in systolic blood pressure as large as 2 mmHg, as well as with a decrease as large as 30 mmHg. One would tend to interpret this as a useful treatment (if no other options are available). I tried to choose my words carefully so as to separate the statistical statement from the clinical interpretation. Would this kind of thinking be reasonable? Or should the interpretation be the same as in the example from the video, that we still cannot really draw any conclusions about the treatment?
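To put toy numbers on the example: a normal-approximation 95% compatibility interval for the treatment effect on systolic BP. The point estimate and standard error below are invented so that the interval roughly matches the (-30, +2) mmHg shape described above; they are not from any real trial.

```python
# Toy sketch of the blood-pressure example: a 95% compatibility
# interval for a mean treatment effect, normal approximation.
# The estimate and SE are hypothetical.
import math

def compat_interval(estimate, se, z=1.96):
    """95% compatibility interval: estimate +/- z * SE."""
    return (estimate - z * se, estimate + z * se)

lo, hi = compat_interval(estimate=-14.0, se=8.16)  # mmHg, hypothetical
print(round(lo, 1), round(hi, 1))
```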
Many thanks.

Good question, and I hope that others weigh in. There is a danger in converting compatibility-interval thinking into p-value thinking, i.e., in looking hard at whether an interval includes zero. But to your question I’d say that the data are incompatible with a large harm but do not say much about either small or large benefit.

Yes, that makes sense - interpret the obvious without over-interpreting it. Thank you.

Catching up on these lectures. Thanks again @f2harrell they are such a delight to watch.
Could I ask for follow-up information on your multiplicity discussion? You describe it in terms of “backward time ordering of information”; do you have any pointers to where I can read more on this view of multiplicity?

The only thing I can think of right now is this.