Reference Collection to push back against "Common Statistical Myths"

Hello,

Not sure if I can edit the entry directly, but I found this to be helpful:

Gary King on Why Propensity Scores Should Not Be Used for Matching

@f2harrell can comment, but there may be a restriction that prevents editing until you have contributed to the forum or posted a certain number of times (a spam/quality-control measure, I think).

I’ll add this to the wiki, though. Thanks!


Fantastic thread. I can’t edit at the moment because I’m a new user, but under TOPIC: Misunderstood “Normality” Assumptions this paper might be relevant:

Williams, M. N., Grajales, C. A. G., & Kurkiewicz, D. (2013). Assumptions of multiple regression: Correcting two misconceptions. Practical Assessment, Research & Evaluation, 18(11). http://www.pareonline.net/getvn.asp?v=18&n=11

(Excuse the self-promotion!)


What an incredibly useful post. There must be a ‘bravo’ emoji, but I am for sure far too old to know where to find it.


Thanks Ronan - please feel free to add your own suggestions or references!

This thread should be highlighted on Twitter and other social media platforms. Perhaps on the Facebook Psychological Methods Discussion page.

Since I haven’t posted before, I don’t think I’m able to edit the wiki directly, but I wanted to offer a nice pair of references that might be valuable additions to the resources in the propensity score matching section!

Brooks JM, Ohsfeldt RL. Squeezing the balloon: propensity scores and unmeasured covariate balance. Health Services Research. 2013 Aug;48(4):1487-507.

and:

Ali MS, Groenwold RH, Klungel OH. Propensity score methods and unobserved covariate imbalance: comments on “squeezing the balloon”. Health Services Research. 2014 Jun;49(3):1074-82.
https://onlinelibrary.wiley.com/doi/abs/10.1111/1475-6773.12152


I stumbled on this article about issues with categorization (responder analysis): https://trialsjournal.biomedcentral.com/articles/10.1186/1745-6215-8-31

Ideally, a clinical trial should be able to demonstrate not only a statistically significant improvement in the primary efficacy endpoint, but also that the magnitude of the effect is clinically relevant. One proposed approach to address this question is a responder analysis, in which a continuous primary efficacy measure is dichotomized into “responders” and “non-responders.” In this paper we discuss various weaknesses with this approach, including a potentially large cost in statistical efficiency, as well as its failure to achieve its main goal. We propose an approach in which the assessments of statistical significance and clinical relevance are separated.
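The efficiency cost the abstract mentions can be quantified with a classic textbook result (not a figure from the paper itself): for a normally distributed outcome, splitting at the median and comparing proportions has asymptotic relative efficiency 2/π versus analyzing the continuous outcome directly.

```python
import math

# Dichotomizing a normal outcome at its median and comparing proportions
# has asymptotic relative efficiency 2/pi relative to the t-test on the
# continuous outcome (a standard textbook result).
are = 2 / math.pi
print(f"ARE of median split: {are:.3f}")           # 0.637

# Equivalent sample-size inflation needed to recover the lost power
inflation = 1 / are
print(f"Sample-size inflation: {inflation:.2f}x")  # 1.57x
```

In other words, a responder analysis based on a median split needs roughly 57% more patients to match the power of the continuous analysis, before even considering the interpretability problems the paper raises.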


I agree with Dr. Harrell, but I think a further argument for adjustment can be made, beyond the known gain in precision of the effect estimate. Randomization is not foolproof. Even if you randomize a million patients, there is no guarantee the potential outcomes will be the same in the treated and “untreated” groups. It would be very unlikely for the potential outcomes not to be very similar, but unlikely things do happen; they are bound to, given the very nature of randomization.

Setting aside issues of variable selection and how variables are modelled, adjusting provides evidence about the exchangeability of the treatment groups beyond what the traditional Table 1 comparison of prognostic factors offers. Even if each prognostic factor in Table 1 is balanced, this does not imply that combinations of multiple prognostic factors are also balanced. In other words, prognostic factors and treatment may be unassociated in a crude analysis (presented in Table 1) yet associated in a multivariable analysis (never presented).

To avoid conscious or unconscious manipulation of the analysis, we could pre-specify the adjustment variables in the study protocol. In fact, what we report in Table 1 is essentially a list of the variables we believe we should adjust for, and these could be selected using the same substantive, knowledge-based approaches we use in observational studies. There is no methodological reason for adjusted effect estimates from an RCT to be more biased than crude estimates (again, assuming the modeling assumptions are correct). In most cases, particularly in small and mid-size trials, the validity of the treatment-effect estimate is enhanced, and the credibility of the RCT findings increases, if crude and adjusted estimates are consistent.
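The point that marginal balance does not guarantee joint balance can be made concrete with a deliberately extreme toy allocation (all numbers invented for illustration, not from any trial): each arm is perfectly balanced on two binary prognostic factors taken one at a time, yet the arms differ completely in how the factors combine.

```python
# Hypothetical toy allocation: 100 patients per arm, two binary
# prognostic factors X1 and X2 recorded per patient as (x1, x2).
arm_a = [(1, 1)] * 50 + [(0, 0)] * 50   # factors always co-occur
arm_b = [(1, 0)] * 50 + [(0, 1)] * 50   # factors never co-occur

def balance(arm):
    """Marginal prevalence of each factor, and of their combination."""
    n = len(arm)
    p_x1 = sum(x1 for x1, _ in arm) / n
    p_x2 = sum(x2 for _, x2 in arm) / n
    p_both = sum(x1 * x2 for x1, x2 in arm) / n
    return p_x1, p_x2, p_both

print("Arm A:", balance(arm_a))   # (0.5, 0.5, 0.5)
print("Arm B:", balance(arm_b))   # (0.5, 0.5, 0.0)
# A univariate Table 1 would show both factors perfectly balanced,
# yet the joint distribution differs completely; if the combination
# X1*X2 is prognostic, the crude comparison would be biased.
```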


This is described in the “Table one” topic, where it is shown that, even if you don’t bother to measure any covariates, the inference is sound (though not efficient). So I can’t say I agree with this angle on the problem.

I’ll add that to the separate responder analysis “loser x4” topic. Great paper.

I do not argue that unadjusted estimates are biased. I argue that in small and moderate-size trials the exchangeability of the treatment arms may be compromised, and that small differences in several prognostic factors can add up to meaningful bias in the effect estimate. This cannot be appreciated in the univariate comparisons of prognostic-factor distributions across treatment groups that Table 1 presents. Therefore, if I see small differences in several prognostic factors, or a large difference in a single one, I would present both crude and adjusted estimates and, if they differ, give more weight to the adjusted one for the purpose of inference.

I also argue that even in large trials adjustment would not introduce bias. This is a direct consequence of the independence between assigned treatment and potential outcomes that randomization produces. So if adjusted and crude estimates differ in a large trial, I would be inclined to believe something was wrong with the model used for the adjustment. Briefly, there is nothing wrong with adjusting for prognostic factors in an RCT, from the perspective of either precision or bias, unless the adjustment model is misspecified.
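The precision side of this discussion can be illustrated with a minimal stdlib-only simulation sketch. All parameters here (n = 100, true effect 1.0, covariate slope 2.0, 300 replicate trials) are invented for illustration: treatment is randomized, one strongly prognostic covariate is present, and an ANCOVA-style adjusted estimator is compared with the crude difference in means.

```python
import random
import statistics

random.seed(42)

def one_trial(n=100, effect=1.0, beta=2.0):
    """One simulated RCT: randomized treatment, a single prognostic
    covariate X with slope beta, true treatment effect = `effect`."""
    treat = [i % 2 for i in range(n)]     # 1:1 allocation
    random.shuffle(treat)                  # randomization
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [effect * t + beta * xi + random.gauss(0, 1)
         for t, xi in zip(treat, x)]

    yt = [yi for yi, t in zip(y, treat) if t]
    yc = [yi for yi, t in zip(y, treat) if not t]
    xt = [xi for xi, t in zip(x, treat) if t]
    xc = [xi for xi, t in zip(x, treat) if not t]

    crude = statistics.mean(yt) - statistics.mean(yc)

    # ANCOVA-style adjustment: pooled within-arm slope of Y on X, then
    # correct the crude difference for the chance imbalance in X.
    def cross_products(ys, xs):
        mx, my = statistics.mean(xs), statistics.mean(ys)
        num = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        den = sum((xi - mx) ** 2 for xi in xs)
        return num, den

    n1, d1 = cross_products(yt, xt)
    n0, d0 = cross_products(yc, xc)
    b = (n1 + n0) / (d1 + d0)
    adjusted = crude - b * (statistics.mean(xt) - statistics.mean(xc))
    return crude, adjusted

results = [one_trial() for _ in range(300)]
sd_crude = statistics.stdev(c for c, _ in results)
sd_adjusted = statistics.stdev(a for _, a in results)
print(f"SD of crude estimate:    {sd_crude:.3f}")
print(f"SD of adjusted estimate: {sd_adjusted:.3f}")
# Both estimators center on the true effect of 1.0; the adjusted one
# has a markedly smaller sampling SD (roughly sqrt(1 + beta^2) times
# smaller here), consistent with the precision argument above.
```

Unbiasedness of both estimators follows from randomization; what adjustment buys, when the model is reasonable, is the smaller sampling variance.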