Question about determining statistical significance

marymlucas · June 12, 2020, 5:25pm

Hi all, I have what might seem a somewhat rudimentary/naive question. I don’t have a background in statistics so I want to make sure I report my analysis correctly.

I have a dataset with the number of calls to 911 over 5 consecutive years for different complaints. As part of my analysis I am comparing month to month, so I have aggregated the calls by month and in some cases by complaint category too. I have plotted the results to visualize trends. When I compare these numbers, how do I determine whether a particular difference in the number of calls is statistically significant? Say, the number of calls in August 2019 vs. August 2020? I know p-values are often used as a measure of statistical significance, but my thought process has jammed on using them here.

Thank you
Mary

f2harrell · June 12, 2020, 6:19pm

The American Statistical Association has formally pushed to ban “statistical significance”, an effort I applaud. It has come to mean very little, and represents an arbitrary designation. Please re-state your ultimate goals in a way that does not use arbitrary dichotomization of evidence but rather asks questions.

marymlucas · June 12, 2020, 6:37pm

Thank you for this reply. I was initially leaning towards just presenting the numbers and percent differences and explaining what I think the significance of the difference was, but a colleague suggested that this would be misleading if I didn’t mention statistical significance of the differences.

My thinking was it would suffice to say “there was an x percent increase in calls for cardiac arrest in March of 2019 compared to March of the previous year, and an x percent change compared to the average number of calls in the previous x years. This change may be attributed to …”
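The two comparisons described above could be computed along these lines. This is only a sketch: the counts in `march_calls` are made-up numbers for illustration, and `percent_change` is a hypothetical helper name, not something from the thread.

```python
from statistics import mean

# Hypothetical March call counts by year (made-up numbers, not real data).
march_calls = {2015: 120, 2016: 110, 2017: 130, 2018: 125, 2019: 150}


def percent_change(new, baseline):
    """Percent change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


current_year = 2019
current = march_calls[current_year]
previous = march_calls[current_year - 1]

# Average of the same month over all earlier years in the data.
prior_avg = mean(v for y, v in march_calls.items() if y < current_year)

change_vs_previous = percent_change(current, previous)
change_vs_prior_avg = percent_change(current, prior_avg)

print(f"vs. previous year: {change_vs_previous:.1f}%")
print(f"vs. average of prior years: {change_vs_prior_avg:.1f}%")
```

Reporting the raw counts alongside the percentages helps, since a large percent change on a small count can look more dramatic than it is.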

R_cubed · June 12, 2020, 6:47pm

Your problem would be addressed by well-known methods for time series. Time series data are a bit more complex to deal with because data points that are close in time tend to be correlated with each other.

You have a rather long journey ahead if you want to understand how fundamental statistical concepts apply to time series data.

The approaches in the textbooks would attempt to model the trend component and cyclical component by comparing year over year or quarter over quarter changes. Since you have 5 years of data, you only have 5 data points for any particular month, which isn’t a whole lot of information to estimate any cyclical effects.

You could attempt to estimate trend using exponential moving average techniques. ARIMA (Autoregressive Integrated Moving Average) methods are the basic tools for this now. Models that avoid the assumption of a normal distribution of residuals are known as nonparametric, and would certainly be worth considering.
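As a rough illustration of the exponential moving average idea (not ARIMA, which needs a dedicated library), here is a minimal sketch. The function name `exponential_moving_average`, the smoothing weight `alpha`, and the monthly counts are assumptions made for the example.

```python
def exponential_moving_average(values, alpha):
    """Smooth a sequence: each output is alpha * current value
    plus (1 - alpha) * previous smoothed value.
    The first output is simply the first observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly call counts (made-up numbers).
monthly_calls = [120, 135, 128, 150, 142, 160]
trend = exponential_moving_average(monthly_calls, alpha=0.3)
print([round(t, 1) for t in trend])
```

A smaller `alpha` gives a smoother line that reacts more slowly to recent months; the choice here is arbitrary.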

More complicated attempts to estimate trend will involve fitting of functions to the data.

I think you are on the right track of simply sticking to descriptions of the data, and comparing them to historical norms (e.g., percentiles). Maybe someone has even better ideas here.
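One simple way to compare a new count to historical norms is an empirical percentile rank. The sketch below is an assumption about what that could look like; `percentile_rank` is a hypothetical helper and the counts are made up.

```python
def percentile_rank(history, value):
    """Percent of historical values that are less than or equal to `value`."""
    if not history:
        raise ValueError("history must not be empty")
    at_or_below = sum(1 for h in history if h <= value)
    return 100.0 * at_or_below / len(history)


# Hypothetical historical monthly counts for one complaint category.
history = [100, 110, 120, 130]
new_count = 125
print(f"{percentile_rank(history, new_count):.0f}th percentile of past months")
```

With only a handful of historical points, as noted above, the rank is coarse and should be read as a description rather than a test.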

Addendum: Here is a link to an open-source, college-level text on forecasting methods. This should be helpful.

marymlucas · June 12, 2020, 7:02pm

Thank you. I think you’re right that I should stick for now with simply describing the data and showing that there’s a difference. At this stage I’m not really looking to do any modeling, just to show that we saw drops and rises in certain complaints and what that could mean. I will most definitely look at the time series link and learn more about the approach you mentioned.

Any other suggestions are appreciated. Glad to be here and looking forward to learning from and with everyone.

f2harrell · June 12, 2020, 8:21pm

Add compatibility intervals, AKA confidence intervals. They are useful if you don’t try to interpret them.
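For a comparison of two monthly counts, one common approximation is a 95% interval for the ratio of the counts on the log scale. The sketch below assumes the two counts behave like independent Poisson counts over equal time periods, which may not hold for 911 calls (the time correlation mentioned earlier in the thread is one reason). The function name `rate_ratio_interval` and the counts are hypothetical.

```python
import math


def rate_ratio_interval(count_new, count_old, z=1.96):
    """Approximate interval for count_new / count_old.

    ASSUMPTION: both counts are independent Poisson counts over
    equal-length periods. The standard error of the log ratio is
    then sqrt(1/count_new + 1/count_old).
    """
    if count_new <= 0 or count_old <= 0:
        raise ValueError("both counts must be positive")
    ratio = count_new / count_old
    se_log = math.sqrt(1.0 / count_new + 1.0 / count_old)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 150 calls this March vs. 125 last March.
ratio, lower, upper = rate_ratio_interval(150, 125)
print(f"ratio {ratio:.2f}, approx. 95% interval {lower:.2f} to {upper:.2f}")
```

If the real counts are more variable than Poisson, an interval computed this way will be too narrow.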

datamongerbonny · June 19, 2020, 12:47pm

That would be a great name for a rock band: Arbitrary Dichotomization of Evidence