BBR Session 4: Hypothesis Testing, Branches of Statistics, P-values

This is a topic for questions, answers, and discussions about session 4 of the Biostatistics for Biomedical Research web course airing on 2019-11-01. Session topics are listed here.


On page 5-5 (section 5.1.1) of the BBR notes one bullet point says:

if the data distribution is asymmetric, the SD is not independent of the mean (so the t distribution does not hold) and the SD is not a good dispersion measure

As far as I’m aware, the normal distribution is the only distribution for which the sample mean is independent of the sample variance (see Lukacs (1942)). This fact even characterizes the normal distribution.
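A quick simulation makes this concrete. The following minimal Python sketch (my own illustration, not from the BBR notes) estimates the correlation between the sample mean and the sample SD under a symmetric and a skewed distribution:

```python
# Minimal sketch: correlation between sample mean and sample SD
# under a symmetric (normal) and a skewed (exponential) distribution.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 20_000

for name in ("normal", "exponential"):
    if name == "normal":
        x = rng.normal(size=(reps, n))
    else:
        x = rng.exponential(size=(reps, n))
    means = x.mean(axis=1)
    sds = x.std(axis=1, ddof=1)
    r = np.corrcoef(means, sds)[0, 1]
    print(f"{name:12s} corr(sample mean, sample SD) = {r:+.3f}")
# Expected pattern: near 0 for the normal, strongly positive for the exponential.
```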

In the light of this I have a question:

  • What is the relationship between symmetry, independence of the mean and SD/variance, and the adequacy of the SD as a dispersion measure?

Thank you.

Note: you’ll need to change your profile to provide your real first and last names in order to stay on the site.

Good question, and thanks for the point about the characterization of the normal distribution. I’ll answer the easy part. I don’t think the SD is a good dispersion measure for asymmetric distributions in general. I’d use Gini’s mean difference and quartiles.
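For readers unfamiliar with it, Gini’s mean difference is simply the mean absolute difference over all pairs of observations. A minimal Python sketch (illustrative code, not from the course materials) comparing it with the SD and the interquartile range:

```python
# Gini's mean difference is the mean of |x_i - x_j| over all pairs.
# The sorted-data form below is algebraically equivalent and O(n log n).
import numpy as np

def gini_mean_difference(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(n)  # zero-based ranks of the sorted values
    return 2.0 * np.sum((2 * i - n + 1) * x) / (n * (n - 1))

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)  # a skewed sample
q1, q3 = np.percentile(x, [25, 75])
print("SD:                  ", x.std(ddof=1))
print("Gini mean difference:", gini_mean_difference(x))
print("IQR:                 ", q3 - q1)
```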

Thanks for the answer. I agree that Gini’s mean difference is very appealing because it has a very intuitive interpretation.

Through the usual links, I can see a “Session 4”, but this only gives a file download option; is there no longer a replay through the private YouTube link?

Sorry about that. A couple of days ago your name didn’t pop up on my tablet; now I see it. I’ll edit these comments out shortly.


I really wish I could have had this brief video on basic statistical inference years ago. I’m quite thankful that you share your expertise so liberally.

Edit: I will try to rephrase my question. If you still deem it off topic, I will delete and start another thread on it if you wish me to do so.

From 19:00 to 20:40 in the video, you discuss the problems with 2-stage testing (i.e., a normality test prior to a t-test) as an ineffective way to deal with departures from the normality assumption. You point out that non-parametric tests are, in the most common scenarios, more powerful than the naive, textbook versions of the t-test, and that when they are not, the loss of efficiency is minor. This point is often missed in the literature I’ve come across.
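To make the power comparison concrete, here is a minimal Monte Carlo sketch in Python (my own illustration; the lognormal alternative, sample sizes, and shift are arbitrary choices):

```python
# Compare rejection rates of the two-sample t-test and the
# Wilcoxon-Mann-Whitney test under a skewed (lognormal) location shift.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, alpha = 25, 5000, 0.05
shift = 0.5  # illustrative location shift between the two groups

rej_t = rej_w = 0
for _ in range(reps):
    a = rng.lognormal(size=n)
    b = rng.lognormal(size=n) + shift
    rej_t += stats.ttest_ind(a, b).pvalue < alpha
    rej_w += stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha

print(f"t-test power:   {rej_t / reps:.2f}")
print(f"Wilcoxon power: {rej_w / reps:.2f}")
# Under heavy skewness the Wilcoxon test typically rejects more often here.
```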

You then state “If you really thought the data might come from a non-normal distribution, you should use the non-parametric test straight away.”

While I understand the logic behind the recommendation (I came to a similar conclusion after much study), other statisticians I’ve communicated with would disagree.

They would point out that while the naive, textbook versions of the t-test might not be best, robust variants (trimmed, Winsorized, weighted means, etc.) retain the advantages of the classical, parametric approach.
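As one concrete example, Yuen’s trimmed-means t-test is available in SciPy (1.7 or later) through the `trim` argument of `ttest_ind`. A minimal sketch, with heavy-tailed illustrative data:

```python
# Yuen's trimmed-means t-test via SciPy's trim argument.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.standard_t(df=3, size=40)        # heavy-tailed data
b = rng.standard_t(df=3, size=40) + 0.8

classic = stats.ttest_ind(a, b)
yuen = stats.ttest_ind(a, b, trim=0.2)   # 20% trimming from each tail

print("classic t-test p:", classic.pvalue)
print("Yuen trimmed   p:", yuen.pvalue)
```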

You later point out that the Bayesian version of the t-test is inherently robust, if one insists on a parametric point of view.
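For concreteness, here is a minimal sketch of such a robust Bayesian model in the spirit of Kruschke’s BEST approach, with Student-t likelihoods so that outliers are downweighted. This is my own illustration, assuming PyMC is available; the priors and data below are arbitrary choices, not something taken from the lecture:

```python
# Robust Bayesian two-group comparison with t-distributed likelihoods.
import numpy as np
import pymc as pm

rng = np.random.default_rng(4)
a = rng.standard_t(df=3, size=40)
b = rng.standard_t(df=3, size=40) + 0.8

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=10, shape=2)          # group means
    sigma = pm.HalfNormal("sigma", sigma=10, shape=2)      # group scales
    nu = pm.Exponential("nu_minus_1", 1 / 29) + 1          # tail heaviness
    pm.StudentT("a_obs", nu=nu, mu=mu[0], sigma=sigma[0], observed=a)
    pm.StudentT("b_obs", nu=nu, mu=mu[1], sigma=sigma[1], observed=b)
    diff = pm.Deterministic("diff", mu[1] - mu[0])
    idata = pm.sample(1000, tune=1000, progressbar=False)

post = idata.posterior["diff"].values.ravel()
print("P(mu_b > mu_a | data) =", (post > 0).mean())
```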

Setting aside likelihood methods, this leads to three possible alternatives for the data analyst:

  1. Frequentist nonparametric methods
  2. Frequentist robust parametric methods
  3. Bayesian parametric methods

Are there techniques from Frequentist robust methods that you find worthwhile? Or do you essentially perform all of your parametric analyses within the Bayesian framework, and then use nonparametrics when you need to use the classical frequentist approach?

Your insight is always appreciated.

That seems to be a different topic. Consider opening a new one.

Same issue here - I cannot open the .mkv file
thanks

VLC Media player is a free program that played it for me.


Figured that out, but now to find someone with admin rights to my machine to download it
thanks!

After viewing the lecture video, I get the impression that frequentist statistics do not answer the question we actually want to ask, or at best answer it in a roundabout way (by trying to disprove the null rather than estimating the probability that the alternative is true). I also find the correct definitions of p-values and confidence intervals rather unintuitive (“in the long run”), since the data we have are usually the only data we have.
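For what it’s worth, the “in the long run” part of the definition can at least be demonstrated by simulation. A minimal Python sketch (my own illustration): across repeated samples about 95% of the intervals cover the true mean, yet any single interval either does or does not.

```python
# Long-run coverage of a 95% t-based confidence interval for a mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
true_mu, n, reps = 10.0, 30, 10_000

covered = 0
for _ in range(reps):
    x = rng.normal(true_mu, 2.0, size=n)
    half = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half) <= true_mu <= (x.mean() + half)

print("coverage:", covered / reps)   # close to 0.95
```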

I can imagine that there must be other reasons that sustain the frequentist vs. Bayesian debate and keep people in the frequentist camp. Could someone enlighten me on this?

You’ll see very long discussions about that on stats.stackexchange.com. My personal opinions about the reasons people stay frequentist are:

  1. Stat courses teach mainly what the instructor thinks the students will need in the future, and most instructors were taught by the previous generation of instructors who felt the same way.
  2. Fisher had a huge PR machine, traveled all over the world, and had lots of students to spread the word.
  3. Bayes died before any of his work was known, and he had no students.
  4. Frequentists feel that their methods are more objective. Bayesians are certain that frequentist methods are more subjective.
  5. When frequentism was on the rise, the computational tools needed to do Bayesian statistics were not available. Some frequentist methods can be done without a computer.
  6. Many statisticians are afraid of change just like other people. I feel there is a correlation between clinging to outdated statistical software such as SAS and clinging to frequentism.
  7. Some people don’t want to change because they feel they are in effect criticizing their former selves. I just think “Onward and upward!”.

So, would science become better if more people converted to Bayesian statistics, much as some frequentists claim it would if the P-value threshold were lowered?

And for those who remain reluctant to convert to the Bayesian camp, how can they use P-values and confidence intervals correctly? It is hard to think of a way, since these quantities have such a counter-intuitive meaning.

It is extremely difficult to use confidence intervals (better called compatibility intervals) and p-values correctly, in my opinion. Some attempts are here. I firmly believe that science would be far better off attempting to answer the right question than more easily answering the wrong question. On that point, see my take on Tukey’s classic statement here.

Then, would it be fair to say that the ease of getting p-values and CIs enables their mindless use in scientific research?

That’s a contributing factor.