Hello all,
Motivation: Big, hypothesis-free data analysis studies using e.g. the UK Biobank, or other equivalent resources (often large epidemiological studies), make the news because of their unlikely, sometimes implausible, and mostly unreproducible findings. See here for just one recent example. This has become such a famous bête noire that the Daily Mail Oncological Ontology Project (now retired, I think) made a wonderful satire of the practice. My suspicion is that a lot of these kinds of findings are the result of residual confounding by socioeconomic group (SEG).
I’m wondering if anyone can point me to useful materials on the actual statistical mechanics of this kind of daftness. “Findings” like these have a significant impact on my interactions with patients (I’m a dementia specialist and researcher, now also suffering a severe case of late-onset mathematics leading to mid-life undergraduate study; blame FH!). To my mind this kind of thing undermines the legitimacy of science in the public discourse/imagination.
There are obvious potential baddies:
- Failure to correct for multiple testing
- Researcher degrees of freedom unacknowledged
- The publish-or-perish culture, combined with university comms departments and the media, creating a hot mess of poor science and poor science communication.
But, I’m a bit more interested in whether, for example:
- Can a non-linear relationship between e.g. SEG and variable X cause residual confounding when both SEG and X are entered into a simple multivariable model under linear assumptions?
- Would a hierarchical/mixed effects model reduce the risk of such confounding?
- Because wealth is Pareto-distributed, does categorisation into SEG classes inevitably lead to residual confounding? That is, can it be expected to, and could data access be made contingent on a commitment to avoiding this kind of error?
- What are other statistical sources of this kind of error?
- Can we effectively protect against it by using the methods of causal inference, e.g. clear DAG development, or simply pre-registration?
- More broadly, do people think the drive to bad science/comms can in fact be overcome by “mere” methodological guardrails?
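To make the residual-confounding questions above concrete, here is a minimal simulation sketch (entirely illustrative; the data-generating process, parameter values, and variable names are my own assumptions, not from any real study). Log-wealth drives both an exposure X and an outcome Y, while X has no causal effect on Y at all. Adjusting for SEG as five wealth quintiles leaves within-class variation in the heavy-tailed confounder unadjusted, so X picks up a spurious “effect”; adjusting for the confounder on the correct (log) scale removes it:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical data-generating process: wealth is Pareto-distributed, and
# log-wealth L drives both exposure X and outcome Y. X has NO causal effect
# on Y, so any non-zero coefficient on X is pure confounding bias.
wealth = rng.pareto(3.0, n) + 1.0        # Pareto(alpha=3, x_min=1)
L = np.log(wealth)                       # log-wealth (exponentially distributed)
x = L + rng.normal(0, 0.5, n)            # exposure driven by log-wealth
y = 2.0 * L + rng.normal(0, 1.0, n)      # outcome driven by log-wealth only

def ols_x_coef(y, covariates):
    """OLS coefficient on the first covariate (x), with an intercept."""
    X = np.column_stack([np.ones(len(y))] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Adjustment 1: SEG as 5 wealth-quintile classes (dummy variables).
# Coarse categorisation of a heavy-tailed confounder leaves substantial
# within-class variation, especially in the top quintile.
edges = np.quantile(wealth, [0.2, 0.4, 0.6, 0.8])
seg = np.searchsorted(edges, wealth)               # SEG classes 0..4
seg_dummies = [(seg == k).astype(float) for k in range(1, 5)]
b_seg = ols_x_coef(y, [x] + seg_dummies)

# Adjustment 2: the confounder on the correct (log) scale.
b_exact = ols_x_coef(y, [x, L])

print(f"coef on X, SEG-quintile adjustment:     {b_seg:.3f}  (residual confounding)")
print(f"coef on X, exact log-wealth adjustment: {b_exact:.3f}")
```

Under this setup the quintile-adjusted coefficient on X comes out clearly positive despite X having no effect, while the correctly specified adjustment drives it to roughly zero; the same logic applies to a linear term for a confounder whose true relationship is non-linear.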
I’m especially interested in whether this has already been studied, how, and by whom. I can find very little on it outside the grey/popular literature. With sincerest thanks.