Medscape Psychiatry has a good write-up of a very interesting paper by Silberzahn and colleagues, in which 29 teams, comprising 61 analysts, used the same data set to address the same research question: are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players? Analytic approaches varied widely across the teams, and the estimated effect sizes ranged from 0.89 to 2.93 (Mdn = 1.31) in odds-ratio units.
They report that the 29 different analyses used 21 unique combinations of covariates. Neither analysts’ prior beliefs about the effect of interest nor their level of expertise readily explained the variation in the outcomes of the analyses. Peer ratings of the quality of the analyses also did not account for the variability. These findings suggest that significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions.
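To see how the same data can yield different odds ratios, here is a minimal sketch (not the authors' code, and not the actual red-card data) of the basic mechanism: a logistic regression of a binary outcome on skin tone, fit with and without an adjustment covariate. The data, variable names, and the confounding structure are all hypothetical; the point is only that the same question, asked of the same rows, produces different effect sizes under different model specifications.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Hypothetical player-level data: skin tone, a position indicator that is
# correlated with skin tone, and a red-card outcome that also depends on position.
skin_tone = rng.binomial(1, 0.3, n)                    # 1 = darker skin tone
defender = rng.binomial(1, 0.4 + 0.1 * skin_tone, n)   # position correlates with tone
logit = -3.0 + 0.3 * skin_tone + 0.8 * defender
red_card = rng.binomial(1, 1 / (1 + np.exp(-logit)), n)

df = pd.DataFrame({"red_card": red_card, "skin_tone": skin_tone, "defender": defender})

def odds_ratio(covariates):
    """Fit a logistic regression of red cards on skin tone plus the given
    covariates and return the estimated odds ratio for skin tone."""
    X = sm.add_constant(df[["skin_tone"] + covariates])
    fit = sm.Logit(df["red_card"], X).fit(disp=0)
    return np.exp(fit.params["skin_tone"])

print("OR, no covariates:        ", round(odds_ratio([]), 2))
print("OR, adjusted for position:", round(odds_ratio(["defender"]), 2))
```

With 21 distinct covariate combinations in play across 29 teams, this kind of specification-dependent drift alone could plausibly produce much of the spread the paper reports.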
Their methodology is particularly interesting and sophisticated, and the result leaves one in considerable doubt about how much weight any single published analysis should carry. As the Medscape reviewer says: "For years, when I read a scientific paper, I've thought that the data yield the published result. What Nosek and his colleagues have found is that results can be highly contingent on the way the researchers analyze the data. And, get this: There is little agreement on the best way to analyze data."
Recommended weekend reading.