Reference Collection to push back against "Common Statistical Myths"

pakeezahs · July 17, 2019, 1:55pm

Hello,

Not sure if I can edit the entry directly but I found this to be helpful:

Gary King on Why Propensity Scores Should Not Be Used for Matching

ADAlthousePhD · July 17, 2019, 2:07pm

@f2harrell can comment but there may be a restriction that prevents one from editing unless you have contributed to the forum before, or posted a certain number of times (a bot/quality control issue, I think).

I’ll add this to the wiki, though. Thanks!

Matt_Williams · August 22, 2019, 11:42pm

Fantastic thread. I can’t edit at the moment because I’m a new user, but under TOPIC: Misunderstood “Normality” Assumptions this paper might be relevant:

Williams, M. N., Grajales, C. A. G., & Kurkiewicz, D. (2013). Assumptions of multiple regression: Correcting two misconceptions. Practical Assessment, Research & Evaluation, 18(11). http://www.pareonline.net/getvn.asp?v=18&n=11

(Excuse the self-promotion!)

RonanConroy · August 24, 2019, 5:52pm

What an incredibly useful post. There must be a ‘bravo’ emoji, but I am for sure far too old to know where to find it.

ADAlthousePhD · August 26, 2019, 12:42pm

Thanks Ronan - please feel free to add your own suggestions or references!

SameeraDaniels · August 27, 2019, 3:25pm

This thread should be highlighted on Twitter and other social media platforms, perhaps also on the Facebook Psychological Methods Discussion page.

natea · September 2, 2019, 9:52pm

Since I haven’t posted before, I don’t think I’m able to directly edit the wiki, but I wanted to provide a nice pair of references that might be valuable additions to the propensity score matching section resources!

Brooks JM, Ohsfeldt RL. Squeezing the balloon: propensity scores and unmeasured covariate balance. Health services research. 2013 Aug;48(4):1487-507.


To assess the covariate balancing properties of propensity score-based algorithms in which covariates affecting treatment choice are both measured and unmeasured. A simulation model of treatment choice and outcome. Simulation. Eight simulation scenarios varied with the values placed on measured and unmeasured covariates and the strength of the relationships between the measured and unmeasured covariates. The balance of both measured and unmeasured covariates was compared across patients either grouped or reweighted by propensity scores methods. Propensity score algorithms require unmeasured covariate variation that is unrelated to measured covariates, and they exacerbate the imbalance in this variation between treated and untreated patients relative to the full unweighted sample. The balance of measured covariates between treated and untreated patients has opposite implications for unmeasured covariates in randomized and observational studies. Measured covariate balance between treated and untreated patients in randomized studies reinforces the notion that all covariates are balanced. In contrast, forced balance of measured covariates using propensity score methods in observational studies exacerbates the imbalance in the independent portion of the variation in the unmeasured covariates, which can be likened to squeezing a balloon. If the unmeasured covariates affecting treatment choice are confounders, propensity score methods can exacerbate the bias in treatment effect estimates.

and:

Ali MS, Groenwold RH, Klungel OH. Propensity score methods and unobserved covariate imbalance: comments on “squeezing the balloon”. Health services research. 2014 Jun;49(3):1074-82.
https://onlinelibrary.wiley.com/doi/abs/10.1111/1475-6773.12152
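
For anyone who wants to see the "squeezing the balloon" mechanism from the Brooks and Ohsfeldt abstract in numbers, here is a minimal simulation sketch. It is not the authors' simulation model. The data-generating process (one measured covariate `x`, one independent unmeasured covariate `u`, treatment chosen when `x + u + noise > 0`), the sample size, and the number of strata are all assumptions made up for illustration. Because there is only one measured covariate, the propensity score is a monotone function of `x`, so stratifying on `x` stands in for stratifying on the propensity score.

```python
import random
import statistics

random.seed(1)
n = 20000

# Hypothetical data-generating model (an assumption, not from the paper):
# x is measured, u is unmeasured and independent of x,
# and treatment is chosen when x + u + noise > 0.
rows = []
for _ in range(n):
    x = random.gauss(0, 1)
    u = random.gauss(0, 1)
    t = 1 if x + u + random.gauss(0, 1) > 0 else 0
    rows.append((x, u, t))


def mean_diff_u(sample):
    """Mean of u among treated minus mean of u among untreated."""
    treated = [u for _, u, t in sample if t == 1]
    untreated = [u for _, u, t in sample if t == 0]
    return statistics.fmean(treated) - statistics.fmean(untreated)


# Imbalance of the unmeasured covariate in the full, unweighted sample.
crude_imbalance = mean_diff_u(rows)

# Force balance on x by comparing treated and untreated within narrow,
# equal-sized strata of x, then average the within-stratum imbalance in u.
rows.sort(key=lambda r: r[0])
n_strata = 20
size = n // n_strata
stratum_diffs = [
    mean_diff_u(rows[k * size:(k + 1) * size]) for k in range(n_strata)
]
stratified_imbalance = statistics.fmean(stratum_diffs)

print(f"imbalance in u, full sample:     {crude_imbalance:.2f}")
print(f"imbalance in u, within x strata: {stratified_imbalance:.2f}")
```

Under these assumptions the treated-versus-untreated difference in `u` should come out larger within strata of `x` than in the full sample, which is the pattern the abstract describes. Whether that translates into more bias depends on whether `u` also affects the outcome, which this sketch does not model.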

baxpr · September 12, 2019, 7:21pm

I stumbled on this article about issues with categorization (responder analysis): https://trialsjournal.biomedcentral.com/articles/10.1186/1745-6215-8-31

Ideally, a clinical trial should be able to demonstrate not only a statistically significant improvement in the primary efficacy endpoint, but also that the magnitude of the effect is clinically relevant. One proposed approach to address this question is a responder analysis, in which a continuous primary efficacy measure is dichotomized into “responders” and “non-responders.” In this paper we discuss various weaknesses with this approach, including a potentially large cost in statistical efficiency, as well as its failure to achieve its main goal. We propose an approach in which the assessments of statistical significance and clinical relevance are separated.
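
The "potentially large cost in statistical efficiency" mentioned in the abstract can be illustrated with a small simulation. This is only a sketch under assumed values: a normally distributed outcome, a true effect of 0.5 SD, 50 patients per arm, a responder cutoff of 0.25, and normal-approximation z-tests for both analyses. None of these numbers come from the paper.

```python
import random
import statistics

random.seed(2)


def one_trial(n_per_arm=50, effect=0.5, cutoff=0.25):
    """Simulate one two-arm trial; return (continuous test rejects, responder test rejects)."""
    control = [random.gauss(0, 1) for _ in range(n_per_arm)]
    treated = [random.gauss(effect, 1) for _ in range(n_per_arm)]

    # Continuous analysis: z-test on the difference in means.
    se_mean = (
        statistics.variance(control) / n_per_arm
        + statistics.variance(treated) / n_per_arm
    ) ** 0.5
    z_cont = (statistics.fmean(treated) - statistics.fmean(control)) / se_mean

    # Responder analysis: dichotomize at the cutoff, z-test on proportions.
    p_c = sum(y > cutoff for y in control) / n_per_arm
    p_t = sum(y > cutoff for y in treated) / n_per_arm
    pooled = (p_c + p_t) / 2
    se_prop = (pooled * (1 - pooled) * 2 / n_per_arm) ** 0.5
    z_resp = (p_t - p_c) / se_prop if se_prop > 0 else 0.0

    return abs(z_cont) > 1.96, abs(z_resp) > 1.96


n_sims = 2000
results = [one_trial() for _ in range(n_sims)]
power_continuous = sum(r[0] for r in results) / n_sims
power_responder = sum(r[1] for r in results) / n_sims

print(f"power, continuous outcome: {power_continuous:.2f}")
print(f"power, responder analysis: {power_responder:.2f}")
```

With the same simulated patients, the dichotomized analysis should reject the null noticeably less often than the analysis of the continuous outcome. The size of the loss depends on the assumed cutoff and effect size.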

lbautista · September 13, 2019, 3:29pm

I agree with Dr. Harrell. But I think an argument for adjustment could be made, in addition to the known gain in precision in the estimate of effect. Randomizing treatments is not a foolproof method. Even if you randomize a million patients, there is no guarantee the potential outcomes will be the same in treated and “untreated”. It would be very unlikely for the potential outcomes not to be very similar, but unlikely/rare things do happen. They are bound to happen due to the very nature of randomization. If we put aside issues of variable selection and how variables will be modelled, adjusting will provide evidence about the exchangeability of the treatment groups beyond the evidence provided in the traditional Table 1 comparing prognostic factors in treated and untreated. Even if each prognostic factor in Table 1 is balanced, this does not imply that combinations of multiple prognostic factors are also balanced. In other words, prognostic factors and treatment may not be associated in a crude analysis (presented in Table 1), but may be associated in a multivariate analysis (never presented). To avoid conscious or unconscious manipulation of the analysis, we could decide in advance which variables we would adjust for, as part of the study protocol. Actually, what we report in Table 1 is a list of the variables we believe we should adjust for. These variables could be selected using the same substantive-based approaches we use in observational studies. There doesn’t seem to be a methodological reason for adjusted effect estimates from RCTs to be more biased than crude estimates (again, assuming modeling assumptions are correct). In most cases, particularly in mid-size and small trials, the validity of the estimate of the effect of the treatment will be enhanced, and credibility of the RCT findings would increase, if crude and adjusted estimates are consistent.

f2harrell · September 14, 2019, 11:40am

This is described in the “Table one” topic where it is shown that even if you don’t bother to measure any covariates the inference is sound (though not efficient). So I can’t say I agree with this angle on the problem.

f2harrell · September 14, 2019, 11:41am

I’ll add that to the separate responder analysis “loser x4” topic. Great paper.

lbautista · September 14, 2019, 1:19pm

I do not argue that the unadjusted estimates are biased. I argue that in “small” and “moderate” size trials the exchangeability of treatment arms may be compromised, and that small differences in several prognostic factors could lead to significant bias in the estimate of effect. This cannot be appreciated in univariate comparisons of the distribution of prognostic factors across treatment groups, which is what is presented in Table 1. Therefore, if I see small differences in several prognostic factors, or a large difference in a single prognostic factor, I would present crude and adjusted estimates and, if they differ, give more weight to the adjusted one for the purpose of inference. I also argue that even in the case of “large” trials, adjusting would not introduce bias. This is a direct consequence of the independence between treatment assigned and potential outcome that results from randomization. Therefore, if adjusted and crude estimates differ in a large trial, I’d be inclined to believe something was wrong with the model used for the adjustment. Briefly, there is nothing wrong with adjusting for prognostic factors in an RCT, either from the perspective of precision or bias, unless the model used for the adjustment is misspecified.
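
To make the crude-versus-adjusted comparison in this exchange concrete, here is a minimal simulation sketch of a small randomized trial. Everything in it is an assumption for illustration: 20 patients per arm, a single prognostic covariate `x` with a linear effect on the outcome, a true treatment effect of 1.0, and an adjustment model that is correctly specified by construction. The adjusted estimate uses the classical ANCOVA formula (difference in outcome means minus the pooled within-arm slope times the difference in covariate means), which equals the treatment coefficient from a least-squares fit of the outcome on treatment and `x`.

```python
import random
import statistics

random.seed(3)
TRUE_EFFECT = 1.0  # assumed treatment effect, for illustration only


def simulate_trial(n_per_arm=20, beta=2.0):
    """Return (crude, adjusted) treatment effect estimates for one simulated RCT."""
    arms = [0] * n_per_arm + [1] * n_per_arm
    random.shuffle(arms)  # randomize treatment assignment
    xs = [random.gauss(0, 1) for _ in arms]  # prognostic covariate
    ys = [TRUE_EFFECT * t + beta * x + random.gauss(0, 1) for t, x in zip(arms, xs)]

    def arm(values, t):
        return [v for v, a in zip(values, arms) if a == t]

    x0, x1, y0, y1 = arm(xs, 0), arm(xs, 1), arm(ys, 0), arm(ys, 1)
    mx0, mx1, my0, my1 = map(statistics.fmean, (x0, x1, y0, y1))

    # Crude estimate: simple difference in outcome means.
    crude = my1 - my0

    # Adjusted estimate: ANCOVA with a pooled within-arm slope for x.
    sxy = sum((x - mx0) * (y - my0) for x, y in zip(x0, y0))
    sxy += sum((x - mx1) * (y - my1) for x, y in zip(x1, y1))
    sxx = sum((x - mx0) ** 2 for x in x0) + sum((x - mx1) ** 2 for x in x1)
    adjusted = crude - (sxy / sxx) * (mx1 - mx0)
    return crude, adjusted


estimates = [simulate_trial() for _ in range(2000)]
crude_values = [c for c, _ in estimates]
adjusted_values = [a for _, a in estimates]

mean_crude = statistics.fmean(crude_values)
mean_adjusted = statistics.fmean(adjusted_values)
sd_crude = statistics.stdev(crude_values)
sd_adjusted = statistics.stdev(adjusted_values)

print(f"crude:    mean {mean_crude:.2f}, SD {sd_crude:.2f}")
print(f"adjusted: mean {mean_adjusted:.2f}, SD {sd_adjusted:.2f}")
```

Across repeated randomizations both estimators should average close to the assumed true effect, while the adjusted estimates vary less from trial to trial. That is consistent with both points made in this exchange: the unadjusted analysis is sound but less efficient, and a correctly specified adjustment does not introduce bias. The sketch says nothing about what happens when the adjustment model is misspecified.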

PerPersvensson · November 25, 2019, 4:24pm

Great post
I wonder if the first topic could be broadened to also apply to observational “Table 1s”, such as descriptives of baseline data in different exposure groups in a cohort study? The STROBE criteria argue against significance testing.

It would be interesting to hear your thoughts on observational studies as well.
Thanks

tho_ols · November 28, 2019, 11:18am

Added topic on significance testing in pilot studies with some useful references, feel free to expand.

SteveSchwartz · March 26, 2020, 8:41pm

I feel that this issue of not calculating and presenting p-values in Table 1 extends to observational studies, for multiple reasons. That said, I am not aware of any published papers that have made this argument.

mgrafit · May 28, 2020, 9:59am

Thanks for initiating this list. I discovered 3 articles I didn’t know about before.
Please see below some new references, by topic.

TOPIC: Analyzing “Change” Measures in RCT’s

Archie J.P. Mathematic coupling of data – A common source of error. Annals of Surgery, 1981, 193: 296-303
Yanez N.D. et al. The effects of measurement error in response variables and tests of association of explanatory variables in change models. SiM, 1998, 17: 2597-2606.
Senn S. Change from baseline and analysis of covariance revisited. SiM, 2006, 25: 4334-4344.
Tu Y-K., et al. Revisiting the relation between change and initial value: A review and evaluation. SiM, 2007. https://doi.org/10.1002/sim.2538
Braun J., et al. Accounting for baseline differences and measurement error in the analysis of change over time. SiM, 2013. https://doi.org/10.1002/sim.5910
Tu Y-K. Testing the relation between percentage change and baseline value. ScientificReports, 2016. https://doi.org/10.1038/srep23247
Clifton et al. Comparing different ways of calculating sample size for two independent means: A worked example. CCT, 2019. https://doi.org/10.1016/j.conctc.2018.100309
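
As a rough illustration of the coupling between change and baseline that several of the references above discuss, here is a minimal simulation sketch. The setup is an assumption chosen for simplicity: each subject has a stable true value with unit variance, nobody truly changes, and baseline and follow-up are each measured with independent error of unit variance. Under exactly these assumptions the theoretical correlation between the change score and the observed baseline is -0.5, purely because the baseline measurement error appears in both quantities.

```python
import random
import statistics

random.seed(4)
n = 5000

baseline = []
change = []
for _ in range(n):
    true_value = random.gauss(0, 1)              # stable true value, no real change
    first = true_value + random.gauss(0, 1)      # baseline = truth + measurement error
    second = true_value + random.gauss(0, 1)     # follow-up = truth + new, independent error
    baseline.append(first)
    change.append(second - first)


def pearson(a, b):
    """Pearson correlation coefficient, computed directly from its definition."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


r_change_baseline = pearson(change, baseline)
print(f"correlation of change with baseline: {r_change_baseline:.2f}")
```

A clearly negative correlation appears even though no subject changed, so an observed association between change and baseline cannot by itself be read as evidence that the treatment works better in patients with higher or lower initial values.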

TOPIC: Stepwise Variable Selection (Don’t Do It!)

Heinze G. et al. Variable selection – A review and recommendations for the practicing statistician. BiomJ, 2017. https://doi.org/10.1002/bimj.201700067
Ahamadi M. et al. Operating characteristics of stepwise covariate selection in pharmacometric modeling. JPKPD, 2019. https://doi.org/10.1007/s10928-019-09635-6

TOPIC: Inappropriately Splitting Continuous Variables Into Categorical Ones

Weinberg C.R. How bad is categorization? Epidemiology, 1995, 6:345-346.
Senn S. Disappointing dichotomies. PharmStat, 2003. https://doi.org/10.1002/pst.090
Chen H. et al. Biased odds ratios from dichotomization of age. SiM, 2007. https://doi.org/10.1002/sim.2737
VanWalraven C. et al. Leave ‘em alone – why continuous variables should be analyzed as such. Neuroepidemiology 2008, https://doi.org/10.1159/000126908

Hope it will be useful to you all.

verbeekmc · December 3, 2020, 9:43am

Related to power and the measurement of change scores vs. group differences, does anyone know references about the number of measurements (for instance, adding an ‘in-between measurement’) to increase power? Does it matter if you are only going to investigate group differences or is just a pre-measurement enough?

f2harrell · December 3, 2020, 1:17pm

Since this does not fit in with the ‘myths’ topic please start a new topic with appropriate primary and secondary topic and tag choices. Then I’ll remove this one.

EpiLearneR · December 14, 2020, 2:45am

TOPICS suggestion: It would be useful to have a reference collection on P value and confidence interval myths

R_cubed · December 14, 2020, 2:47pm

@EpiLearneR You might find this open access paper valuable:

Search for @Sander (Sander Greenland) here, and you will find a lot of excellent papers on these misinterpretations as well as corrective measures. If you read them slowly and are prepared to look up the mathematics you do not know, they will teach you a lot.

Here is a good link to some of the things he has written on this: