At Sander’s suggestion, I’m posting my list of guidelines for young applied statisticians.
Cheers,
Philip
- Consider the underlying science. The interesting scientific questions are not always questions statistics can answer.
- Think about where the data come from and how they happened to become your sample.
- Think before you calculate. Will the answer mean anything? What?
- The data, the formula, and the algorithm all can be right, and the answer still can be wrong: Assumptions matter.
- Enumerate the assumptions. Check those you can; flag those you can’t. Which are plausible? Which are plainly false? How much might it matter?
- Why is what you did the right thing to have done?
- A statistician’s most powerful tool is randomness—real, not supposed.
- Beware hypothesis tests and confidence intervals in situations with no real randomness.
- Errors never have a normal distribution. The consequence of pretending that they do depends on the situation, the science, and the goal.
- Worry about systematic error. Systematically.
- There’s always a bug, even after you find the last bug.
- Association is not necessarily causation, even if it’s Really Strong association (see the first sketch after this list).
- Significance is not importance. Insignificance is not unimportance. (Here’s a lay explanation of p-values.) The second sketch after this list shows how the two can come apart.
- Life is full of Type III errors: right answers to the wrong questions.
- Order of operations: Get it right. Then get it published.
- The most important work is often neither the hardest nor the most technically interesting, but it requires the most patience: a technical tour de force is typically less useful than curiosity, skeptical persistence, and shoe leather.
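
First, the sketch on association and causation: a minimal Python/NumPy simulation, with made-up variables and effect sizes chosen purely for illustration, in which a hypothetical confounder z drives both x and y, so the two are strongly correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: z drives both x and y.
z = rng.normal(size=n)
x = z + 0.3 * rng.normal(size=n)   # x has no effect on y
y = z + 0.3 * rng.normal(size=n)   # y has no effect on x

# Really Strong association (correlation roughly 0.9), no causation either way.
print(np.corrcoef(x, y)[0, 1])
```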
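Second, the sketch on significance and importance: another minimal simulation with arbitrary effect sizes and sample sizes (it assumes NumPy and SciPy are available). A practically negligible effect in a huge sample produces a tiny p-value, while a practically large effect in a tiny sample often fails to reach the conventional 0.05 threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Negligible true effect (mean 0.01), huge sample: "significant" but unimportant.
negligible = rng.normal(loc=0.01, scale=1.0, size=200_000)

# Large true effect (mean 1.0), tiny sample: important but often "insignificant".
large = rng.normal(loc=1.0, scale=1.0, size=5)

print(stats.ttest_1samp(negligible, popmean=0.0).pvalue)  # typically well below 0.05
print(stats.ttest_1samp(large, popmean=0.0).pvalue)       # may well exceed 0.05
```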