Reference Collection to push back against "Common Statistical Myths"

This has been added to the sticky at the top now, thanks.

4 Likes

@ADAlthousePhD

I added another good paper on the p-value misinterpretations by Leonhard Held:

Held L. Reverse-Bayes analysis of two common misinterpretations of significance tests. Clinical Trials 2013; 10(2).

Later on I’ll add some of the position papers issued by the American Statistical Association on this issue.

1 Like

I would add this blog post to the list:

2 Likes

Found a great resource on the fallacies of using parametric methods on ordinal data. Which section of this wiki should it be added to?

4 Likes

Thanks, this will be very helpful. I think Grieve’s paper should be on the list of references on NNT.

1 Like

How about something on the importance of prespecifying primary vs secondary endpoints and handling multiplicity, including hierarchical procedures? However, this may not fit that well under the heading of statistical misconceptions. Rather, it seems to me that people often don’t appreciate how easily outcome switching/P-hacking/HARKing can occur.

2 Likes

This one is more controversial than it appears. I follow the Cook & Farewell approach of requiring a pre-specified priority ordering of endpoints, but not doing formal multiplicity correction after that.

3 Likes

The following threads had a discussion of this issue. It seems like it could fall under the p value misconception heading, but I think it deserves a section on its own.

I think the Bayesian POV provides a framework for guidance on this. @Sander provided references to Bayesian justifications for adjustment.

Some of his own writing on the issue:

I posted a few references in the context of clinical trials in this thread:

The following is as close to a complete theory of MCPs as we are going to get: the perceived need to adjust is closely related to the ratio \frac{1-\beta}{\alpha} an experimenter wishes to achieve, or to the plausibility of the hypothesis under consideration.
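As a rough worked illustration of that ratio (my own numbers, not taken from the reference): at the conventional single-test values \alpha = 0.05 and 1-\beta = 0.8,

\frac{1-\beta}{\alpha} = \frac{0.8}{0.05} = 16

while demanding a ratio of 160 at the same power means working at \alpha = 0.005, which is the kind of tightening a multiplicity adjustment imposes when several hypotheses are in play.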

2 Likes

I found some additional resources to push back against a decision to use change from baseline. I thought some of these would be useful and could be added to the list (a small simulation sketch illustrating the efficiency point follows the references below):

Manuscripts:

Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ 2001; 323: 1123.

Vickers AJ. The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Med Res Methodol 2001;1:6. doi: 10.1186/1471-2288-1-6. PMID: 11459516; PMCID: PMC34605.

Bland JM, Altman DG. Comparisons against baseline within randomised groups are often used and can be highly misleading. Trials 2011;12:264. doi: 10.1186/1745-6215-12-264. PMID: 22192231; PMCID: PMC3286439.

Bland JM, Altman DG. Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. Am J Clin Nutr 2015;102(5):991-4. doi: 10.3945/ajcn.115.119768. PMID: 26354536.

An interesting response to the paper above:

Stanhope KL, Havel PJ. Response to “Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach”. Am J Clin Nutr 2016;103(2):589. doi: 10.3945/ajcn.115.125989. PMID: 26834111.

Törnqvist L, Vartia P, Vartia YO. How Should Relative Changes be Measured? The American Statistician 1985;39(1):43-46. doi: 10.1080/00031305.1985.10479385. (https://www.tandfonline.com/doi/abs/10.1080/00031305.1985.10479385)

Kaiser L. Adjusting for baseline: change or percentage change? Stat Med 1989;8(10):1183-90. doi: 10.1002/sim.4780081002. PMID: 2682909.

Blogs/ Online Resources:

Magnusson K. Change over time is not “treatment response”. Rpsychologist blog 2018. https://rpsychologist.com/treatment-response-subgroup

Harrell F. Statistical Errors in the Medical Literature: Change from Baseline. Statistical Thinking blog, last updated 2021.

Interactive Simulation by Frank Harrell:

Harrell F. Transformations, Measuring Change, and Regression to the Mean. BBR Ch 14.

Great Discussion on QOL Analysis and Change from Baseline:

davidcnorrismd. Exemplary QOL analyses that avoid change-from-baseline blunders? Datamethods forum 2019.

Twitter Post by Stephen Senn on change from baseline:

(https://twitter.com/stephensenn/status/1224362916423573504)

Text Books:

Harrell F, Slaughter J. Change from Baseline in Randomized Studies. Biostatistics for Biomedical Research. Chapter 14.4.1. (https://hbiostat.org/doc/bbr.pdf)

Senn S. Baselines and Covariate Information. Statistical Issues in Drug Development. Chapter 7.
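
For anyone who wants to see the efficiency argument concretely, below is a minimal simulation sketch (my own toy example in Python; it is not code from any of the papers above, and the sample size, correlation, and effect size are arbitrary). It compares three analyses of a two-arm randomized trial with a baseline and a follow-up measurement: follow-up only, the simple change score, and ANCOVA (follow-up adjusted for baseline), in the spirit of Vickers & Altman (2001).

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_arm, n_sim = 100, 2000
rho, true_effect = 0.5, 0.3          # baseline/follow-up correlation; effect in SD units

def simulate_trial():
    """One two-arm trial with a baseline and a follow-up measurement."""
    treat = np.repeat([0.0, 1.0], n_per_arm)
    base = rng.normal(size=treat.size)
    follow = rho * base + np.sqrt(1 - rho**2) * rng.normal(size=treat.size) + true_effect * treat
    return base, follow, treat

def effect_estimates(base, follow, treat):
    is_t = treat == 1
    follow_only = follow[is_t].mean() - follow[~is_t].mean()
    change = (follow - base)[is_t].mean() - (follow - base)[~is_t].mean()
    # ANCOVA: regress follow-up on treatment and baseline, keep the treatment coefficient
    X = np.column_stack([np.ones_like(treat), treat, base])
    ancova = np.linalg.lstsq(X, follow, rcond=None)[0][1]
    return follow_only, change, ancova

estimates = np.array([effect_estimates(*simulate_trial()) for _ in range(n_sim)])
for name, col in zip(["follow-up only", "change score", "ANCOVA"], estimates.T):
    print(f"{name:15s}: mean estimate {col.mean():.3f}, empirical SE {col.std():.3f}")
# With rho = 0.5 the change score is no more precise than ignoring baseline altogether,
# while ANCOVA has the smallest empirical SE; with rho < 0.5 the change score is worse.
```

All three estimators are unbiased here; the difference is purely one of precision, which is the point the Vickers and Bland/Altman papers make.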

5 Likes

Might I gingerly add to textbooks: Statistical Aspects of the Design and Analysis of Clinical Trials (revised edn). Brian S. Everitt and Andrew Pickles, Imperial College Press, London, 2004. Section 5.1 argues against the use of change scores in RCTs and gives a nice, didactic comparison of methods for incorporating baseline data, referring to some of the above-cited material:

  • Senn, S. (1994a). Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design, Statist. Med. 13:

  • Senn, S. (1994b). Testing for baseline balance in clinical trials, Statist. Med. 13: 1715-1726.

2 Likes

I just came across a recently published commentary by Horwitz et al. discussing “Table 1”. [I haven’t added it to the wiki post because it doesn’t quite fit in with that topic title.]

In the commentary, the authors argue that more “biographical variables” (in addition to “biological variables”) should be included in Table 1. For example, they write that race and/or ethnicity are not “strictly biological”. They cite another commentary which suggests other “social and behavioral determinants of health” (e.g., social isolation) that should be recorded/measured.

While the commentary doesn’t appear to give specific recommendations – it seems more like a “call to action” – its title caught my eye and I thought it could be of interest to epidemiologists and others on this forum.

P.S. If the moderators think this would be more appropriate as a separate topic, that is OK with me. I don’t know if it “pushes back against common myths”, but it does argue for a change in practice.

2 Likes

Here is another paper I found as a reference to say that “post-hoc power” is a nonsensical concept:

Fraser, R.A. Inappropriate use of statistical power. Bone Marrow Transplant (2023).

Unfortunately, the article also discusses the difference between frequentist confidence intervals and Bayesian credible intervals, and to my understanding both the authors and the editors get the interpretations wrong: the frequentist interpretation is presented as the limiting relative frequency of an event (an account refuted by Hájek’s “Arguments against frequentism”), and the Bayesian interpretation is presented as subjective probability assigned to events rather than to parameters.
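
To make the post-hoc power point concrete, here is a minimal sketch (my own illustration in Python, not taken from the Fraser paper): for a two-sided z-test, the “power” obtained by plugging the observed effect back in as if it were the true effect is a deterministic, monotone function of the p-value, so it merely restates the p-value on another scale.

```python
from scipy.stats import norm

ALPHA = 0.05
Z_CRIT = norm.ppf(1 - ALPHA / 2)                  # two-sided critical value, ~1.96

def post_hoc_power(p_value):
    """'Power' computed by treating the observed effect (via its z-statistic)
    as if it were the true effect size."""
    z_obs = norm.ppf(1 - p_value / 2)             # |z| implied by the two-sided p-value
    return norm.cdf(z_obs - Z_CRIT) + norm.cdf(-z_obs - Z_CRIT)

for p in (0.001, 0.01, 0.05, 0.2, 0.5, 0.9):
    print(f"p = {p:5.3f}  ->  'post-hoc power' = {post_hoc_power(p):.2f}")
# p = 0.05 always maps to about 0.50, and larger p-values always map to lower values,
# regardless of the design or of any clinically relevant effect size.
```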

2 Likes

@ADAlthousePhD @f2harrell

Using within-group tests in parallel-group randomized trials is not recommended. Does this also apply to cross-over randomized trials, e.g., analyzing pre-post change in Condition 1 and pre-post change in Condition 2?

The following paper discusses Table 1 in observational studies, including the use of p-values. I hope it helps others interested in this.

Who is in this study, anyway? Guidelines for a useful Table 1

The appropriateness of including a column containing inferential statistics (e.g. p-values) is a topic of some controversy. Statistical testing of distributions of variables (e.g. between exposed and unexposed) is common and even occasionally required by journals;1,6,9,10 although this is a tempting way to assess confounding, it is not best practice. Statistical significance is often misunderstood: non-significance of a p-value does not indicate that no difference in the distribution of a variable exists, and significance does not mean that the difference is meaningful or that the difference indicates presence of confounding.10–13 As a result, confounder assessment should not be based on p-values (Figure 2, Point 3; Figure 3, Point 4).1,2 Rather, authors should consider whether the relationship between the exposure and hypothesized confounders is as expected according to the causal theory, and consider whether the magnitude of an observed difference for a potential confounder represents a meaningful difference.1,9,10 Similarly, when considering external validity, statistical tests are not a helpful way to assess meaningful differences between source and target populations.
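
To illustrate the quoted point about p-values in Table 1, here is a small sketch (my own illustration, not from the paper): the p-value for a baseline comparison mostly reflects sample size, not whether an imbalance is large enough to act as a confounder.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

def baseline_p(n_per_group, mean_difference_sd):
    """p-value for an exposed vs unexposed comparison of one baseline variable."""
    exposed = rng.normal(loc=mean_difference_sd, size=n_per_group)
    unexposed = rng.normal(loc=0.0, size=n_per_group)
    return ttest_ind(exposed, unexposed).pvalue

# A trivial imbalance (0.05 SD) in a very large study vs a sizeable imbalance
# (0.5 SD) in a small one: the trivial difference comes out highly "significant",
# while the potentially meaningful one often does not.
print(f"n = 50000 per group, diff = 0.05 SD: p = {baseline_p(50_000, 0.05):.4f}")
print(f"n = 30 per group,    diff = 0.50 SD: p = {baseline_p(30, 0.50):.4f}")
```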

1 Like