I feel that this issue of not calculating and presenting p-values in Table 1 extends to observational studies, for multiple reasons. That said, I am not aware of any published papers that have made this argument.
Thanks for initiating this list. I discovered three articles I didn’t know about before. Please find below some new references, by topic.
TOPIC: Analyzing “Change” Measures in RCTs
 Archie J.P. Mathematic coupling of data – A common source of error. Annals of Surgery, 1980, 193: 296-303.
 Yanez N.D. et al. The effects of measurement error in response variables and tests of association of explanatory variables in change models. SiM, 1998, 17: 2597-2606.
 Senn S. Change from baseline and analysis of covariance revisited. SiM, 2006, 25: 4334-4344.
 Tu YK., et al. Revisiting the relation between change and initial value: A review and evaluation. SiM, 2007. https://doi.org/10.1002/sim.2538
 Braun J., et al. Accounting for baseline differences and measurement error in the analysis of change over time. SiM, 2013. https://doi.org/10.1002/sim.5910
 Tu YK. Testing the relation between percentage change and baseline value. Scientific Reports, 2016. https://doi.org/10.1038/srep23247
 Clifton et al. Comparing different ways of calculating sample size for two independent means: A worked example. CCT, 2019. https://doi.org/10.1016/j.conctc.2018.100309
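As a toy illustration of the mathematical-coupling and regression-to-the-mean issue several of these papers treat, here is a short simulation sketch (my own, with arbitrary parameter values, not taken from any of the papers): even when baseline and follow-up are statistically independent, the change score is automatically negatively correlated with baseline.

```python
# Toy sketch (illustrative only): "mathematical coupling" of change scores.
# Baseline and follow-up are simulated as fully independent, yet
# change = follow-up - baseline correlates strongly with baseline.
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
baseline = rng.normal(100, 15, n)
followup = rng.normal(100, 15, n)  # independent of baseline: no true relation
change = followup - baseline

r = np.corrcoef(baseline, change)[0, 1]
print(round(r, 3))  # near -1/sqrt(2) ~ -0.707, purely by construction
```

So a strong “relation between change and initial value” can appear with no biology at all, which is exactly the trap the Archie and Tu papers dissect.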
TOPIC: Stepwise Variable Selection (Don’t Do It!)
 Heinze G. et al. Variable selection – A review and recommendations for the practicing statistician. Biom J, 2017. https://doi.org/10.1002/bimj.201700067
 Ahamadi M. et al. Operating characteristics of stepwise covariate selection in pharmacometric modeling. J Pharmacokinet Pharmacodyn, 2019. https://doi.org/10.1007/s10928-019-09635-6
TOPIC: Inappropriately Splitting Continuous Variables Into Categorical Ones
 Weinberg C.R. How bad is categorization? Epidemiology, 1995, 6: 345-346.
 Senn S. Disappointing dichotomies. Pharm Stat, 2003. https://doi.org/10.1002/pst.090
 Chen H. et al. Biased odds ratios from dichotomization of age. SiM, 2007. https://doi.org/10.1002/sim.2737
 van Walraven C. et al. Leave ‘em alone – why continuous variables should be analyzed as such. Neuroepidemiology, 2008. https://doi.org/10.1159/000126908
Hope it will be useful to you all.
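To make the categorization point concrete, here is a minimal simulation sketch (my own; all numbers are arbitrary) of how much association is thrown away by a median split of a continuous predictor:

```python
# Sketch: dichotomizing a normally distributed predictor at its median
# discards roughly a third of the linear association (the classic
# 2/pi ~ 0.64 efficiency factor for R^2).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)            # true linear relationship
x_split = (x > np.median(x)).astype(float)  # "high" vs "low" groups

r_cont = np.corrcoef(x, y)[0, 1]
r_dich = np.corrcoef(x_split, y)[0, 1]
print(round(r_cont, 3), round(r_dich, 3), round((r_dich / r_cont) ** 2, 2))
```

The squared correlation ratio comes out near 0.64, i.e. the median split behaves like throwing away about a third of the sample.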
Related to power and the measurement of change scores vs. group differences: does anyone know of references on the number of measurements (for instance, adding an ‘in-between’ measurement) needed to increase power? Does it matter whether you are only going to investigate group differences, or is just a pre-measurement enough?
Since this does not fit in with the ‘myths’ topic please start a new topic with appropriate primary and secondary topic and tag choices. Then I’ll remove this one.
TOPICS suggestion: It would be useful to have a reference collection on P value and confidence interval myths
@EpiLearneR You might find this open access paper valuable:
Search for @Sander (Sander Greenland) here, and you will find a lot of excellent papers on these misinterpretations as well as corrective measures. If you read them slowly and are prepared to look up the mathematics you do not know, they will teach you a lot.
Here is a good link to some of the things he has written on this:
This has been added to the sticky at the top now, thanks.
I added another good paper on p-value misinterpretations, by Leonhard Held:
Held L. Reverse-Bayes analysis of two common misinterpretations of significance tests. Clinical Trials, 2013, 10(2).
Later on I’ll add some of the position papers issued by the American Statistical Association on this issue.
I would add this blog post to the list:
Found a great resource on the fallacies of using parametric methods on ordinal data. To which section of this wiki should it be added?
Thanks, this will be very helpful. I think Grieve’s paper should be on the list of references on NNT.
How about something on the importance of pre-specifying primary vs. secondary endpoints and handling multiplicity, including hierarchical procedures? This may not fit that well under the heading of statistical misconceptions, though; rather, it seems to me that people often don’t appreciate how easily outcome switching, P-hacking, and HARKing can occur.
This one is more controversial than it appears. I follow the Cook & Farewell approach of requiring a prespecified priority ordering of endpoints, but not doing formal multiplicity correction after that.
The following threads had a discussion of this issue. It seems like it could fall under the p value misconception heading, but I think it deserves a section on its own.
I think the Bayesian POV provides a framework for guidance on this. @Sander provided references to Bayesian justifications for adjustment.
Some of his own writing on the issue:
I posted a few references in the context of clinical trials in this thread:
The following is as close to a complete theory of MCPs as we are going to get: the perceived need to adjust is closely related to the ratio $\frac{1-\beta}{\alpha}$ an experimenter wishes to achieve, and to the plausibility of the hypothesis under consideration.
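For a concrete sense of what is at stake, a back-of-the-envelope sketch (standard textbook arithmetic, not drawn from any reference above): the familywise error rate across m independent tests, with and without Bonferroni correction. Whether that correction should be paid at all is exactly the controversy in this thread.

```python
# Familywise error rate (FWER) for m independent true nulls tested at
# level alpha, unadjusted vs Bonferroni-adjusted. Illustrative numbers only.
alpha, m = 0.05, 10
fwer_unadjusted = 1 - (1 - alpha) ** m       # chance of >= 1 false positive
fwer_bonferroni = 1 - (1 - alpha / m) ** m   # Bonferroni caps this near alpha
print(round(fwer_unadjusted, 3), round(fwer_bonferroni, 3))
```

With ten unadjusted tests the chance of at least one false positive is about 40%; Bonferroni brings it back under 5%, at a cost in power for each individual test.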
I found some additional resources pushing back against a decision to use change from baseline. I thought some of these would be useful and could be added to the list:
Manuscripts:
Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ 2001; 323: 1123.
Vickers AJ. The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Med Res Methodol. 2001;1:6. doi: 10.1186/1471-2288-1-6. Epub 2001 Jun 28. PMID: 11459516; PMCID: PMC34605.
Bland JM, Altman DG. Comparisons against baseline within randomised groups are often used and can be highly misleading. Trials. 2011 Dec 22;12:264. doi: 10.1186/1745-6215-12-264. PMID: 22192231; PMCID: PMC3286439.
Bland JM, Altman DG. Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. Am J Clin Nutr. 2015 Nov;102(5): 991-4. doi: 10.3945/ajcn.115.119768. Epub 2015 Sep 9. PMID: 26354536.
An interesting response to the paper above:
Stanhope KL, Havel PJ. Response to “Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach”. Am J Clin Nutr. 2016 Feb;103(2):589. doi: 10.3945/ajcn.115.125989. PMID: 26834111.
Leo Törnqvist, Pentti Vartia & Yrjö O. Vartia (1985). How Should Relative Changes be Measured? The American Statistician, 39(1): 43-46. DOI: 10.1080/00031305.1985.10479385
Kaiser L. Adjusting for baseline: change or percentage change? Stat Med. 1989 Oct;8(10): 1183-90. doi: 10.1002/sim.4780081002. PMID: 2682909.
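The Vickers (2001) simulation result can be reproduced in miniature. Below is a hedged sketch (my own code, arbitrary parameter choices, large-sample normal-approximation tests) comparing the power of ANCOVA, the change score, and percentage change from baseline in a simulated two-arm trial:

```python
# Sketch: power of three analyses of the same simulated RCT. ANCOVA
# (follow-up adjusted for baseline) should come out on top, echoing
# Vickers (2001). All parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)

def significant_t(x, arm):
    """Two-sample test, large-sample normal approximation (|t| > 1.96)."""
    a, b = x[arm == 1], x[arm == 0]
    t = (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(t) > 1.96

def significant_ancova(follow, base, arm):
    """Regress follow-up on arm + baseline; test the arm coefficient."""
    X = np.column_stack([np.ones(len(arm)), arm, base])
    beta, *_ = np.linalg.lstsq(X, follow, rcond=None)
    resid = follow - X @ beta
    sigma2 = resid @ resid / (len(arm) - 3)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return abs(beta[1] / se) > 1.96

def trial(n=50, rho=0.5, effect=4.0):
    base = rng.normal(50, 10, 2 * n)
    arm = np.repeat([0, 1], n)
    follow = (50 + rho * (base - 50)
              + rng.normal(0, 10 * np.sqrt(1 - rho**2), 2 * n) + effect * arm)
    return (significant_ancova(follow, base, arm),
            significant_t(follow - base, arm),                  # change score
            significant_t(100 * (follow - base) / base, arm))   # % change

res = np.array([trial() for _ in range(3000)])
power_ancova, power_change, power_pct = res.mean(axis=0)
print(round(power_ancova, 2), round(power_change, 2), round(power_pct, 2))
```

With these settings ANCOVA clearly beats both change-based analyses; the exact gap depends on the baseline-follow-up correlation, as the papers above discuss.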
Blogs/ Online Resources:
Magnusson K. Change over time is not “treatment response”. Rpsychologist blog, 2018. https://rpsychologist.com/treatment-response-subgroup
Harrell F. Statistical Errors in the Medical Literature: Change from Baseline. Statistical Thinking blog, 2021 (last updated).
Interactive Simulation by Frank Harrell:
Harrell F. Transformations, Measuring Change, and Regression to the Mean. BBR Ch. 14.
Great Discussion on QOL Analysis and Change from Baseline:
davidcnorrismd. Exemplary QOL analyses that avoid change-from-baseline blunders? Datamethods, 2019.
Twitter Post by Stephen Senn on change from baseline:
(https://twitter.com/stephensenn/status/1224362916423573504)
Text Books:
Harrell F, Slaughter J. Change from Baseline in Randomized Studies. Biostatistics for Biomedical Research. Chapter 14.4.1. (https://hbiostat.org/doc/bbr.pdf)
Senn S. Baselines and Covariate Information. Statistical Issues in Drug Development. Chapter 7.
Might I gingerly add to the textbooks: Statistical Aspects of the Design and Analysis of Clinical Trials (revised edn). Brian S. Everitt and Andrew Pickles, Imperial College Press, London, 2004. [Section 5.1] argues against the use of change scores in RCTs, with a nice, didactic comparison of methods to incorporate baseline data, referring to some above-cited material:

Senn, S. (1994a). Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Statist. Med. 13:

Senn, S. (1994b). Testing for baseline balance in clinical trials. Statist. Med. 13: 1715-1726.
I just came across a recently published commentary by Horwitz et al. discussing “Table 1”. [I haven’t added it to the wiki post because it doesn’t quite fit in with that topic title.]
In the commentary, the authors argue that more “biographical variables” (in addition to “biological variables”) should be included in Table 1. For example, they write that race and/or ethnicity are not “strictly biological”. They cite another commentary which suggests other “social and behavioral determinants of health” (e.g., social isolation) that should be recorded/measured.
While the commentary doesn’t appear to give specific recommendations – it seems more like a “call to action” – its title caught my eye and I thought it could be of interest to epidemiologists and others on this forum.
 Horwitz, R.I., Lobitz, G., Mawn, M., Conroy, A.H., Cullen, M.R., Sim, I. and Singer, B.H., 2022. Rethinking Table 1. Journal of Clinical Epidemiology, 142, 242-245.
 Adler, N.E. and Stead, W.W., 2015. Patients in context – EHR capture of social and behavioral determinants of health. The New England Journal of Medicine, 372(8), 698-701.
P.S. If the moderators think this would be more appropriate as a separate topic, that is OK with me. I don’t know if it “pushes back against common myths”, but it does argue for a change in practice.
Here is another paper I found that can be cited to say that “post-hoc power” is a nonsensical concept:
Fraser, R.A. Inappropriate use of statistical power. Bone Marrow Transplant (2023).
Unfortunately, the article also discusses the difference between frequentist confidence intervals and Bayesian credible intervals, and to my understanding both the authors and the editors get the interpretations wrong: the frequentist interpretation is presented as a limiting relative frequency of an event (which is refuted by Hájek’s “Arguments against frequentism”), and the Bayesian interpretation is presented as subjective probabilities assigned to events rather than to parameters.
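On the post-hoc power point, the core argument can be shown in a few lines: “observed power” computed from the observed effect is a deterministic, monotone function of the p-value, so it adds no information beyond the p-value itself. A minimal sketch of my own, under a standard two-sided z-test approximation:

```python
# Sketch: "observed power" is just a transformation of the p-value
# (two-sided z-test approximation), so it carries no extra information
# about the study. Notably, p = 0.05 always maps to ~50% observed power.
from statistics import NormalDist

N = NormalDist()

def posthoc_power(p, alpha=0.05):
    z_obs = N.inv_cdf(1 - p / 2)        # |z| implied by the observed p-value
    z_crit = N.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    # plug the observed effect back in as if it were the true effect
    return (1 - N.cdf(z_crit - z_obs)) + N.cdf(-z_crit - z_obs)

for p in (0.2, 0.05, 0.01):
    print(p, round(posthoc_power(p), 3))
```

Whatever the study, p = 0.05 yields roughly 50% “observed power”, so reporting it tells a reader nothing they could not read off the p-value.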
Using within-group tests in parallel-group randomized trials is not recommended. Does this also apply to crossover randomized trials? E.g., analyzing pre-post change in Condition 1 and pre-post change in Condition 2.
The following paper discusses Table 1 in observational studies, including the use of p-values. Hope it helps others interested in this.
Who is in this study, anyway? Guidelines for a useful Table 1
The appropriateness of including a column containing inferential statistics (e.g. p-values) is a topic of some controversy. Statistical testing of distributions of variables (e.g. between exposed and unexposed) is common and even occasionally required by journals;1,6,9,10 although this is a tempting way to assess confounding, it is not best practice. Statistical significance is often misunderstood: non-significance of a p-value does not indicate that no difference in the distribution of a variable exists, and significance does not mean that the difference is meaningful or that the difference indicates presence of confounding.10–13 As a result, confounder assessment should not be based on p-values (Figure 2, Point 3; Figure 3, Point 4).1,2 Rather, authors should consider whether the relationship between the exposure and hypothesized confounders is as expected according to the causal theory, and consider whether the magnitude of an observed difference for a potential confounder represents a meaningful difference.1,9,10 Similarly, when considering external validity, statistical tests are not a helpful way to assess meaningful differences between source and target populations.
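In line with the quoted advice, a common alternative to p-values in Table 1 is the standardized mean difference, which describes the magnitude of imbalance and does not shrink toward “significance” as the sample grows. A small sketch of my own (the data and the |SMD| > 0.1 rule of thumb are illustrative, not from the paper):

```python
# Sketch: standardized mean difference (SMD) between exposure groups,
# a descriptive measure often reported in Table 1 instead of p-values.
import numpy as np

def smd(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

rng = np.random.default_rng(0)
exposed = rng.normal(52, 12, 400)    # e.g. age in the exposed group
unexposed = rng.normal(50, 12, 400)  # e.g. age in the unexposed group
print(round(smd(exposed, unexposed), 2))  # |SMD| > 0.1 is a common flag
```

Unlike a p-value, the SMD answers the question the quote poses: how big is the difference, and is it large enough to matter as a confounder?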