BBR Session 4: Hypothesis Testing, Branches of Statistics, P-values

This is a topic for questions, answers, and discussions about session 4 of the Biostatistics for Biomedical Research web course airing on 2019-11-01. Session topics are listed here.


On page 5-5 (Section 5.1.1) of the BBR notes, one bullet point says:

if the data distribution is asymmetric, the SD is not independent of the mean (so the t distribution does not hold) and the SD is not a good dispersion measure

As far as I’m aware, the normal distribution is the only distribution for which the sample mean is independent of the sample variance (see Lukacs (1942)); this independence in fact characterizes the normal distribution.

In the light of this I have a question:

  • What’s the relationship between symmetry, independence of the mean and the SD/variance, and the adequacy of the SD as a dispersion measure?

Thank you.

Note: you’ll need to change your profile to provide your real first and last names in order to stay on the site.

Good question, and thanks for the point about the characterization of the normal distribution. I’ll answer the easy part. I don’t think the SD is a good dispersion measure for asymmetric distributions in general. I’d use Gini’s mean difference and quartiles.
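To make the comparison concrete, here is a minimal sketch (not from the BBR notes; the lognormal data and all constants are only illustrative) of computing the SD, Gini’s mean difference, and the quartiles on a skewed sample in Python:

```python
# Minimal sketch: SD vs. Gini's mean difference vs. quartiles on skewed data.
# The lognormal sample and all constants below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # asymmetric data

# Gini's mean difference: average absolute difference over all distinct pairs
diffs = np.abs(x[:, None] - x[None, :])
gmd = diffs.sum() / (len(x) * (len(x) - 1))        # excludes i == j pairs

sd = x.std(ddof=1)
q1, q3 = np.percentile(x, [25, 75])

print(f"SD = {sd:.2f}, Gini mean difference = {gmd:.2f}, IQR = {q3 - q1:.2f}")
```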

Thanks for the answer. I agree that Gini’s mean difference is very appealing because it has a very intuitive interpretation.

Through the usual links, I can see a “Session 4”, but this only gives a file-download option. Is there no longer a replay through the private YouTube link?

Sorry about that. A couple of days ago your name didn’t pop up on my tablet; now I see it. I’ll edit these comments out shortly.


I really wish I could have had this brief video on basic statistical inference years ago. I’m quite thankful for you sharing your expertise so liberally.

Edit: I will try to rephrase my question. If you still deem it off topic, I will delete and start another thread on it if you wish me to do so.

From 19:00 to 20:40 in the video, you discuss the problems with two-stage testing (i.e., a normality test prior to a t-test) as an ineffective way to deal with departures from the assumption of normality. You point out that non-parametric tests are, in most common scenarios, more powerful than the naive, textbook versions of the t-test, and that when they are not, the loss of efficiency is minor. This point is often missed in the literature I’ve come across.
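To illustrate that power comparison, here is a rough simulation sketch (my own construction, not from the video; the lognormal model, sample sizes, shift, and replication count are arbitrary assumptions):

```python
# Rough simulation: power of the two-sample t-test vs. the Wilcoxon
# (Mann-Whitney) test when data come from a skewed (lognormal) distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, shift, reps, alpha = 50, 0.5, 2000, 0.05
t_rej = w_rej = 0

for _ in range(reps):
    x = rng.lognormal(size=n)
    y = rng.lognormal(size=n) + shift      # location-shifted alternative
    t_rej += stats.ttest_ind(x, y).pvalue < alpha
    w_rej += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha

print(f"t-test power ~ {t_rej / reps:.2f}, Wilcoxon power ~ {w_rej / reps:.2f}")
```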

You then state “If you really thought the data might come from a non-normal distribution, you should use the non-parametric test straight away.”

While I understand the logic behind the recommendation (I came to a similar conclusion after much study), other statisticians I’ve communicated with would disagree.

They would point out that while the naive, textbook versions of the t-test might not be best, robust variants (trimmed means, Winsorized means, weighted means, etc.) retain the advantages of the classical, parametric approach.
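For concreteness, a small sketch (my own, not those statisticians’ code; the data and trimming fraction are made up) of two such robust estimators using scipy:

```python
# Illustrative sketch: 20% trimmed mean and the mean of Winsorized data,
# compared with the ordinary mean on a sample contaminated by outliers.
import numpy as np
from scipy import stats
from scipy.stats import mstats

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(size=95), rng.normal(loc=20, size=5)])  # outliers

print("ordinary mean :", x.mean())
print("20% trimmed   :", stats.trim_mean(x, 0.2))
print("Winsorized    :", mstats.winsorize(x, limits=(0.2, 0.2)).mean())
```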

You later point out that the Bayesian version of the t-test is inherently robust, if one insists on a parametric point of view.

Ignoring likelihood methods, this leaves three possible alternatives for the data analyst:

  1. Frequentist nonparametric methods
  2. Frequentist robust parametric methods
  3. Bayesian parametric methods

Are there techniques from Frequentist robust methods that you find worthwhile? Or do you essentially perform all of your parametric analyses within the Bayesian framework, and then use nonparametrics when you need to use the classical frequentist approach?

Your insight is always appreciated.

That seems to be a different topic. See if you want to open a new one.

Same issue here - I cannot open the .mkv file
thanks

VLC Media player is a free program that played it for me.


Figured that out, but now I need to find someone with admin rights to my machine to download it
thanks!

After viewing the lecture video, I get the impression that frequentist statistics does not answer the question we want to ask, or answers it only in a roundabout way (by disproving the null rather than estimating the probability that the alternative is true). I also find the correct definitions of p-values and confidence intervals rather unintuitive (“in the long run”), since the data we have are usually the only data we will ever have.

I imagine there must be other reasons that sustain the frequentist vs. Bayesian debate and keep people in the frequentist camp. Could someone enlighten me on this?

You’ll see very long discussions about that on stats.stackexchange.com. My personal opinion is that the reasons people stay frequentist are:

  1. Stat courses teach mainly what the instructor thinks the students will need in the future, and most instructors were taught by the previous generation of instructors who felt the same way.
  2. Fisher had a huge PR machine, traveled all over the world, and had lots of students to spread the word.
  3. Bayes died before any of his work was known, and he had no students.
  4. Frequentists feel that their methods are more objective. Bayesians are certain that frequentist methods are more subjective.
  5. When frequentism was on the rise, the computational tools needed to do Bayesian statistics were not available. Some frequentist methods can be done without a computer.
  6. Many statisticians are afraid of change just like other people. I feel there is a correlation between clinging to outdated statistical software such as SAS and clinging to frequentism.
  7. Some people don’t want to change because they feel they are in effect criticizing their former selves. I just think “Onward and upward!”.

So, would science become better if more people converted to Bayesian statistics, much as some frequentists have proposed improving it by lowering the p-value threshold?

And for those who remain reluctant to convert to the Bayesian camp, how can they use p-values and confidence intervals correctly? It is hard for me to think of a way, since they have such counter-intuitive meanings.

It is extremely difficult to use confidence intervals (better called compatibility intervals) and p-values correctly, in my opinion. Some attempts are here. I firmly believe that science would be far better off attempting to answer the right question instead of more easily answering the wrong question. On that point see my take on Tukey’s classic statement here.

Then, would it be fair to say that the ease of getting p-values and CIs enables their mindless use in scientific research?

That’s a contributing factor.

Hi there!
Late arrival here, trying to catch up with the course - which is great, by the way, thank you!
I found the 4th session on statistical inference fascinating but also challenging, despite the fact that this is not the first time I am trying to wrap my head around frequentist vs Bayesian. I have a couple questions regarding this lecture:

  1. Concerning your often-repeated bon mot about the conclusions one can draw from a p-value above the significance threshold (“The money was spent”): I agree that the p-value alone says very little in that case. However, if a study had a correct sample-size estimate, made reasonable assumptions about distributions, chose appropriate tests, and then arrived at a non-significant p-value, surely there must be more one can conclude. Those previous steps could also be faulty, but that possibility would be just as real had the p-value been significant. What would you suggest authors of such well-conducted studies do when faced with a non-significant p-value?
  2. I have listened several times but still can’t really wrap my head around why a type I error or α is not really an error - the chance of erroneously rejecting a true null hypothesis seems to me like the probability of making a mistake. Could you maybe provide a reference where this is elaborated on in more detail?
  3. Despite the fact that I have spent much time already reading up on these issues, I still find it difficult to convey the gist of it to my wife who is a full-time, busy researcher with the classical graduate-school education in basic frequentist statistics. Could you maybe point me to a reference which one could use as a “Come to the dark side!” introduction? :slight_smile:
    Thank you very much!
    Best regards,
    Aron Kerenyi, MD, PhD

The p-value is a post-study quantity but the other things you listed are pre-study concepts that do not directly apply once the study is over. So it is not possible to conclude anything other than “with the current sample size the data were unable to overcome the supposition of no effect”.

Think of what you do when a test statistic exceeds an arbitrary critical value or the p-value is less than an arbitrary cutoff. Your interpretation is fully conditional on there truly being no effect. So type I error probability, while possible to think of as an “alarm probability”, cannot tell you anything about the error you are interested in: a treatment not working when you interpret the evidence as saying it is working. You can’t assume X is true and then do a calculation about the probability that X is true. See the smoke alarm analogy in Statistical Thinking - My Journey from Frequentist to Bayesian Statistics.
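A toy calculation (with assumed numbers, not from the post) may help show why α is not the probability of the error one actually cares about: even with α = 0.05, the probability that there is truly no effect given a “significant” result can be much larger, depending on the assumed pre-study probability of an effect and the power.

```python
# Toy Bayes-rule calculation with assumed numbers (illustration only):
# alpha is P(significant | no effect), not P(no effect | significant).
prior_effect = 0.10   # assumed pre-study probability the treatment works
power        = 0.80   # P(significant | effect)
alpha        = 0.05   # P(significant | no effect)

p_signif = power * prior_effect + alpha * (1 - prior_effect)
p_no_effect_given_signif = alpha * (1 - prior_effect) / p_signif

print(f"P(significant)             = {p_signif:.3f}")
print(f"P(no effect | significant) = {p_no_effect_given_signif:.2f}")  # ~0.36, not 0.05
```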

Look at Introduction to Bayes for Evaluating Treatments and Richard McElreath’s Statistical Rethinking book.

Thanks for the terrific questions, and please follow up with more questions so I can try to make this clearer.

Thank you for your detailed answer, Frank! It has been enlightening to read your journey towards Bayes, and with many great references for further reading!

Regarding the non-significant p problem, I think I understand what you mean: collecting the data only affects p; all the other aspects I mentioned were decided pre-study. I still think that when faced with a large p-value, it might be worth reiterating these pre-study decisions, i.e.:
If the true effect size were at least as large as the threshold pre-defined in the power calculation, then out of 20 repetitions of the experiment about 4 would yield similarly non-significant p-values (with 80% power, i.e. β = 0.2). However, if the true effect were zero, about 19 out of 20 experiments would yield similarly non-significant results (with α = 0.05).
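As a quick sanity check on that arithmetic (assumed numbers only, not from the thread):

```python
# Expected number of non-significant replications out of 20, under the
# designed-for effect size (power 0.80) and under a zero effect (alpha 0.05).
power, alpha, reps = 0.80, 0.05, 20

print("expected non-significant if effect >= design value:", reps * (1 - power))  # 4.0
print("expected non-significant if effect is zero        :", reps * (1 - alpha))  # 19.0
```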

Another idea would be to have a rescue Bayesian backup plan for non-significant p-values. I guess the problem is that you can’t really define reasonable priors having already looked at your data. But if one made such a “contingency plan” beforehand, could this be a way to re-interpret the data with more meaning? Or maybe just use flat priors? (Of course, if one buys into Bayesian thinking, then one would probably not do a frequentist analysis to begin with. But let’s assume a very frequentist journal editor for the sake of the argument.)