Can a dataset be "overdispersed" if I'm not using a model with assumptions on normality?

JMW · July 27, 2019, 7:36pm

Hi!
I have a question regarding choice of measures of central tendency and variability.
I want to report the results from some parasite counts in a population of cats. Most of my cats have a low number of parasites (1-10), with a few extreme outliers. As I understand it, this is pretty common in parasitology. I have chosen to use the median and range as my measures of central tendency and variability, since my data are pretty skewed and the mean and standard deviation wouldn’t show that. I just got the paper back from the reviewers and they agree with my choice, but would rather that I justify it with “overdispersion” than with skewness. Is this the same thing? I would understand that justification if I had used a model that relied on a Poisson distribution, but as I understand it, the use of mean and standard deviation does not. Is that right? Can a dataset be “overdispersed” if I’m not using a model with assumptions on normality?

R_cubed · July 28, 2019, 1:47am

Is it possible your reviewer was assuming a Poisson model?
Aggregated parasite distributions on hosts in a homogeneous environment: examining the Poisson null model. (link)

Depending on what you want to do with the data, the sample range might not be your best measure of dispersion, as it is defined by the extreme values of the sample. Some other options:

Interquartile Range (IQR) – The 25th, 50th, and 75th percentiles are reported; the IQR itself is the distance between the 25th and 75th.
Mean Absolute Difference (Gini Mean Difference) – You take the mean of the absolute values of all pairwise differences. This has attractive properties from a robustness standpoint.
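
To make the two options above concrete, here is a minimal sketch using only the Python standard library. The counts are made up for illustration (they are not the poster's data), and the function names `quartiles` and `gini_mean_difference` are my own. The quartiles use the "inclusive" interpolation rule, which is only one of several common conventions.

```python
from itertools import combinations
from statistics import quantiles


def quartiles(values):
    """25th, 50th and 75th percentiles, using the 'inclusive' interpolation rule."""
    return tuple(quantiles(values, n=4, method="inclusive"))


def gini_mean_difference(values):
    """Mean of |x_i - x_j| over all pairs of distinct observations (i < j)."""
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    return sum(diffs) / len(diffs)


# Hypothetical parasite counts: mostly low, with two extreme values.
counts = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 45, 120]

q1, q2, q3 = quartiles(counts)
# The range depends only on the smallest and largest observations.
print("range:", max(counts) - min(counts))
print("quartiles:", q1, q2, q3)
print("IQR:", q3 - q1)
print("Gini mean difference:", gini_mean_difference(counts))
```

Other quantile conventions will give slightly different quartiles on small samples, so it is worth stating which one was used when reporting them.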

A related measure of central tendency is the Hodges-Lehmann estimator. Much like the Mean Absolute Difference, we form every pair of observations, but for this metric we average each pair instead of taking the difference. The measure of central tendency is the median of all pairwise averages.
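
A minimal sketch of that estimator, again in plain Python. One assumption to flag: this version pairs each observation with itself as well as with every other observation (the so-called Walsh averages); some definitions use only pairs of distinct observations, which can change the result slightly. The function name `hodges_lehmann` and the counts are illustrative only.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(values):
    """Median of all pairwise averages (x_i + x_j) / 2 for i <= j.

    Assumption: self-pairs (i == j) are included, i.e. Walsh averages.
    """
    averages = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(averages)


# Hypothetical parasite counts: mostly low, with two extreme values.
counts = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 45, 120]

print("median:", median(counts))
print("Hodges-Lehmann:", hodges_lehmann(counts))
```

The number of pairs grows quadratically with the sample size, so this brute-force version is meant for samples of modest size.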

Dr. Harrell (the host of this site) describes these nonparametric techniques in this video:
https://vkc.mc.vanderbilt.edu/news/3710

COOLSerdash · July 28, 2019, 8:16pm

Data by themselves (i.e. in the absence of an assumed model) cannot be overdispersed as far as I’m aware. The Cambridge Dictionary of Statistics (3rd ed) defines overdispersion as (emphasis mine):

The phenomenon that arises when empirical variance in the data exceeds the nominal variance under some assumed model. […]

So overdispersion only makes sense under an assumed model, such as a Poisson model for counts. I have never heard overdispersion mentioned in discussions about which descriptive measures of central tendency and dispersion to present.
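
To illustrate what "exceeds the nominal variance under some assumed model" would look like for a Poisson model, here is a minimal sketch. A Poisson distribution has variance equal to its mean, so the sample variance-to-mean ratio should be near 1 if the model fits. The counts are made up, the function name `dispersion_index` is my own, and this is a descriptive check rather than a formal test.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Under a Poisson model the variance equals the mean, so values well
    above 1 indicate overdispersion relative to that assumed model.
    """
    return variance(counts) / mean(counts)


# Hypothetical parasite counts: mostly low, with two extreme values.
counts = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 45, 120]

print("mean:", mean(counts))
print("variance:", variance(counts))
print("variance / mean:", dispersion_index(counts))
```

Note that the ratio is only meaningful once the Poisson model has been named as the reference; the same numbers carry no "overdispersion" label without it.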