I’m struggling with a theoretical question about null hypothesis significance testing (NHST) in my undergraduate class, and I hope this forum can give me some advice and recommendations on it.

Assuming that:

- Inferential statistics aim to estimate a parameter from a sample. If the sample has adequate properties, I can use all the inferential machinery of statistics.

- When we test hypotheses about differences between means, using t tests for example, the p value is a widely used criterion for deciding whether to reject or fail to reject the null hypothesis.

- Traditional frequentist testing assumes that the null hypothesis is true.

- The formal definition of a p value is the probability of observing a test statistic (in this case, the t value) as extreme as, or more extreme than, the one actually observed, conditional on the null hypothesis being true.
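To make sure I have the definition above right, here is a minimal sketch of how I understand it. It approximates a two-sided p value by simulation instead of using the t distribution. The function names, the observed t value of 2.4, and the group sizes of 30 are my own hypothetical choices, and the sketch assumes normally distributed sleep times with equal variance in both groups.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Welch's t statistic for the difference between two sample means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulated_p_value(t_obs, n_a, n_b, n_sims=5000, seed=0):
    """Two-sided p value: share of null-generated |t| at least as large as |t_obs|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Under the null, both groups come from the same population.
        # Mean 0 and sd 1 are arbitrary: t is unchanged by a common shift or rescaling.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_a)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_b)]
        if abs(t_statistic(a, b)) >= abs(t_obs):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical observed t value and group sizes (assumptions, not real data).
    print(simulated_p_value(t_obs=2.4, n_a=30, n_b=30))
```

The returned proportion is a statement about the test statistic given the null hypothesis, not about the probability that the null hypothesis itself is true.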

Now I would like to know whether you partially or fully agree with the passage below. If not, how would you change it to make it consistent?

"Let’s say you want to know how many hours male and female students sleep at night. There are two ways to conduct this research. The first is a census, in which every student is measured. The second is to study a random sample of students and then generalize the result to the entire population. For several reasons, a census is not an option, so we take the second route. After carrying out the study, let’s say we find that, on average, males sleep 8 hours and females 7 hours. The t test was significant, with a p value of 0.02 (alpha was set at 0.05). What can we conclude?

(1) Under the null hypothesis, we were expecting similar results;

(2) With this p value, there is strong evidence against the null hypothesis!

(3) Male students sleep more than female students in the population!

(4) I’m aware that the parameter is fixed and my interval varies. Therefore, I’d expect to find this value in 95% of studies over the long run.

(5) Statistics provided a shortcut. It was not necessary to study the entire population to discover some of its properties.

(6) In this scenario, we could do some “reverse engineering” and measure all students to check this parameter. There are many scenarios with infinite populations, in which these procedures are even more valuable!"
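To make the idea in statement (4) concrete (fixed parameter, varying interval), here is a minimal sketch of what I understand by long-run coverage. It repeats the same hypothetical study many times and counts how often a 95% interval for the difference in means contains the true difference. The true difference of 1 hour, the standard deviation of 1.5 hours, and the 100 students per group are assumptions for illustration only, and the interval uses a normal approximation rather than the exact t quantile.

```python
import math
import random
import statistics


def coverage(true_diff=1.0, sd=1.5, n=100, n_studies=2000, level=0.95, seed=0):
    """Share of simulated studies whose interval contains the true difference."""
    rng = random.Random(seed)
    # Normal-approximation critical value (about 1.96 for a 95% interval).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    covered = 0
    for _ in range(n_studies):
        # Hypothetical populations: females average 7 hours, males 7 + true_diff.
        males = [rng.gauss(7.0 + true_diff, sd) for _ in range(n)]
        females = [rng.gauss(7.0, sd) for _ in range(n)]
        diff = statistics.mean(males) - statistics.mean(females)
        se = math.sqrt(statistics.variance(males) / n + statistics.variance(females) / n)
        # The parameter (true_diff) stays fixed; the interval changes per study.
        if diff - z * se <= true_diff <= diff + z * se:
            covered += 1
    return covered / n_studies


if __name__ == "__main__":
    print(coverage())
```

Under these assumptions the proportion should come out close to the nominal level, which describes the procedure across repeated studies rather than any single interval.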

Null hypothesis significance testing is widely used in applied statistics, and the correct interpretation of the results is the outcome we all want as professors. References are very welcome!

This is a theoretical question. If it is not suitable for this forum, please let me know.

Thank you.