A glossary of statistical terms, intended primarily for nonstatisticians, may be found here. The purpose of this topic is to provide a place for suggested improvements to any of the existing definitions, or to add new definitions for terms not already covered in the glossary. Feel free to propose your own definitions. You will be credited for material that is added to the glossary.
Thank you Frank. Very helpful for review and revision purposes.
Thanks Frank, this is great and helps me (and possibly my students) a lot.
Pls allow a comment for the item “rate” which you write is unconstrained. As far as I see it, a rate is always nonnegative.
As an aside, I appreciate that you see a rate as something “unit per time” and that it shouldn’t be confused with probabilities. In the definition of type I and type II errors, you use “false positive rate” and “false negative rate” as synonyms. I guess in this sense, you interpret type I and type II errors “rates” as probs (sry for nitpicking).
Thanks and best wishes,
Ralph
Great input. I’ll correct the phrase about the constraint and change rates to probabilities in type I and type II errors. I’m glad you caught that because I’m always fussing about type I error rates being the wrong term when I want to be nitpicky.
@RBrinks agree to all your points, except rates and “unit per time”. In my opinion the denominator does not necessarily have to be time. I consider e.g. “falls per distance walked” or “bacteria per surface area” as rates too. I guess the point is that the numerator is not a subset of the denominator.
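To make the distinction concrete, here is a minimal sketch of the two quantities being discussed. The function names and all numbers are made up for illustration and are not taken from the glossary. A rate divides a count by an amount of exposure (time, distance, area), so it is nonnegative and can exceed 1; a proportion divides a count by the total it is part of, so it stays between 0 and 1.

```python
def rate(events, exposure):
    """Events per unit of exposure (time, distance, area, ...).

    The numerator is not a subset of the denominator, so the result
    is nonnegative but has no upper bound.
    """
    if events < 0 or exposure <= 0:
        raise ValueError("need events >= 0 and exposure > 0")
    return events / exposure


def proportion(cases, total):
    """Cases divided by the total they belong to; always within [0, 1]."""
    if total <= 0 or not 0 <= cases <= total:
        raise ValueError("need total > 0 and 0 <= cases <= total")
    return cases / total


# Hypothetical numbers, for illustration only.
falls_per_km = rate(events=3, exposure=12.5)            # falls per km walked
bacteria_per_cm2 = rate(events=450, exposure=30.0)      # bacteria per cm^2
false_positive_fraction = proportion(cases=5, total=100)

print(falls_per_km, bacteria_per_cm2, false_positive_fraction)
```

Under this reading, a “false positive rate” is really a proportion (or an estimate of a probability), which is the point raised above about type I and type II errors.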
I think you’re right. Does anyone else think that a rate needs to be a derivative? Or do others agree that falls per distance walked is truly a rate? Here are the Oxford Dictionary definitions for rate as a noun. Seems pretty inclusive.
I updated the glossary with @Peter_R_S’s suggestions.
I’d like a definition for randomized controlled trials (RCT).
I was considering some definition like https://emj.bmj.com/content/20/2/164:
A randomized controlled trial is a type of experiment where we control for the effect of a confounding variable by randomly assigning subjects into groups which will or will not receive a treatment. It is the gold standard for establishing causality.
Not bad but I hesitate to keep using the word control to mean standard treatment or no treatment, or in the BMJ definition the use of “will not receive a treatment”. An active-control trial or just a head-to-head comparison of two modern treatments is a perfectly good RCT. Also there are randomized crossover studies. I’ve never been clear on whether we should imply only a parallel-group design when using the term RCT.
I feel there are good points in both sources. It should be clear that an RCT avoids bias, due to known and unknown confounders at the time of randomization.
The former is mentioned on Wikipedia, the latter in the BMJ paper.
RCTs are not restricted to two groups either.
I added a somewhat comprehensive definition under clinical trials. Thanks both of you for input.
I’m adding Julia Rohrer’s definitions of reproducibility etc.
I guess the important question is the intended purpose of the glossary. The definitions are often “what is” definitions rather than “what’s the purpose” definitions.
For example, interquartile range is simply defined as the range between the outer quartiles. Fair enough, but we’re not told it’s a measure of data spread.
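As a rough illustration of the “measure of spread” reading, here is a small sketch using only Python's standard library. The data are invented, and the helper names `interquartile_range` and `full_range` are hypothetical. Note that quartile conventions differ between packages; `statistics.quantiles` uses its default "exclusive" method here, so other software may give slightly different values.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return Q3 - Q1, the width of the middle half of the data."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    return q3 - q1


def full_range(values):
    """Return max - min, for comparison with the IQR."""
    return max(values) - min(values)


# Made-up data, for illustration only: identical except for one extreme value.
typical = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]

print(interquartile_range(typical), full_range(typical))
print(interquartile_range(with_outlier), full_range(with_outlier))
```

In this toy example the full range jumps when a single extreme value is added, while the interquartile range does not change, which is one reason it is often reported as a description of spread.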
And observational studies are defined as “Study in which no experimental condition (e.g., treatment) is manipulated by the investigator, i.e., randomization is not used”, which defines them by the absence of a feature. It would help to mention that they are frequently used to measure the characteristics of a study population and to look for associations between these characteristics. And maybe that they are frequently used in health research.
The level of detail, both in terms of formulas and examples, varies widely – paired data, for example, is quite comprehensive, while parametric model simply says “A model based on a mathematical function having a few unknown parameters”.
My advice would be to define a readership and to do a think-aloud protocol. I am constantly surprised by the explanations that practising doctors do and do not understand. Markov models, for example, are easy, while degrees of freedom (found in every paper in a journal club) is one that I still haven’t found an easy explanation for.
I must welcome the project, though, and wish it well. When two peoples speak different languages, the person who makes a good dictionary deserves to be honoured.
As a footnote: I have a very ancient copy of Johnson’s dictionary, and it does, indeed, define a lexicographer as “A writer of dictionaries; a harmless drudge”.
I am still chuckling over the definition of “data science”. These definitions should be distributed Week 1 to any biostatistics course.
Ronan I think these are great observations. I would like to not have to think so much, by avoiding the issue of the particular audience (other than emphasizing clinicians). So my inclination is to just try to make each definition better. I would like to incorporate the fantastic suggestions you listed here. Further contributions welcomed!
Hi Frank! It would be very helpful to define the term “overfitting” in your glossary, in the context of regression modeling.
Excellent suggestion. Just added it. See if the definition does the trick.
This is really excellent. I think an entry for causal inference could be useful.
I’ll need a volunteer for that one. I’m unqualified and would be too tempted to write “something that comes strictly from the study design and not the data, and for which further discussion is not very fruitful”.
Very useful!

Suggest adding a “number needed to treat” definition.
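For reference, a sketch of the usual textbook calculation, which is an assumption on my part and not the glossary's wording: the number needed to treat is the reciprocal of the absolute risk reduction, i.e. the risk of the outcome without treatment minus the risk with treatment. The function name and the risks below are hypothetical.

```python
def number_needed_to_treat(risk_control, risk_treated):
    """Reciprocal of the absolute risk reduction (control risk - treated risk).

    Both arguments are probabilities of the unwanted outcome, in [0, 1].
    """
    for risk in (risk_control, risk_treated):
        if not 0 <= risk <= 1:
            raise ValueError("risks must be probabilities in [0, 1]")
    absolute_risk_reduction = risk_control - risk_treated
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not lower the risk; NNT is undefined here")
    return 1 / absolute_risk_reduction


# Hypothetical risks, for illustration only: 20% without vs. 15% with treatment.
nnt = number_needed_to_treat(risk_control=0.20, risk_treated=0.15)
print(round(nnt, 2))
```

With these made-up risks the absolute risk reduction is 0.05, so about 20 patients would need to be treated to prevent one event. In practice the value is usually rounded up to a whole number, and a definition would probably need to say that it depends on the baseline risk and the follow-up time.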

There is a typo in the AI definition “AI is n procedure”.

Might be worth adding general and generalized linear model terms separately with a “see linear regression” pointer.