- Reproducible research
- Importance of sound methodology and adequate sample size on reproducibility
- System forces working against reproducibility
- Literate programming and reproducible statistical reports
Here is a quote from the cited chapter:
"21.1 Non-reproducible Research
- Misunderstanding statistics
- Investigator moving the target
- Lack of a blinded analytic plan
- Tweaking instrumentation / removing “outliers”
- Floating definitions of response variables
- Pre-statistician “normalization” of data and background subtraction
- Poorly studied high-dimensional feature selection
- Programming errors
- Lack of documentation
- Failing to script multiple-step procedures
- Using spreadsheets and other interactive approaches for data manipulation
- Copying and pasting results into manuscripts
- Insufficient detail in scientific articles
- No audit trail"
Note that the methodology expert gives the classic listing but fails to cite critical issues: specifically, the use of an invalid standard defining the condition under test, or of an invalid outcome measure. Perhaps she perceives these as outside the realm of statistics. But note that she almost gets there in citing “Poorly studied high-dimensional feature selection”. This is the edge: the relationship between the clinical condition under test and the math of the testing itself.
This is the classic mistake I have tried to teach here many times. It is (or should become) the statistician’s responsibility to determine the validity (and mathematical integrity) of the clinical criteria (for example, a sum) defining the standard and the outcome measure. Otherwise she risks wasting her effort and, indeed, embellishing research that will ultimately prove non-reproducible.
The integrity of the math has to hold from the beginning (the standard defining the condition under test) to the end (the outcome measure). That is the statistician’s job: assure the integrity of the math throughout, not just the math between the standard and the outcome.
I’ve had pushback on this, mostly centered around the argument that the standards are the standards and are not subject to statistician review. Granting that this is likely a valid argument, at least review the source, analyze the reproducibility of the standard and the outcome, and educate the trialists so they can discuss that in the limitations of the trial.
If anyone knows of a stat book or post (other than by me) that lists those as causes of non-reproducibility, I would welcome the citation.
Measurement is all-important, and I harp on that in many places in the BBR notes. I didn’t adequately cover it here, so thanks for adding to the discussion.
The point I am making relates to critical care research, wherein outcome measurements are often surrogates such as sums of scores responsive to a plurality of threshold breaches (e.g., SOFA). This also relates to the standard criteria for the condition, which may likewise be derived from a surrogate measure, as it is, for example, with Sepsis-3 (an acute SOFA increase of 2 or more points plus suspicion of infection).
I think statisticians and trialists forget that standard scoring systems (SOFA, SIRS) are actually measurements. Both are from the 90s, so they have developed “gold standard” status. They appear to be accepted as above analysis by the statistician and trialist, like mortality itself.
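To see why a threshold-sum score is itself a measurement worth scrutinizing, here is a minimal sketch with hypothetical cutoffs and point values (deliberately NOT the real SOFA tables): the sum collapses very different physiological states onto the same number.

```python
# Toy threshold-sum score -- illustrative cutoffs only, NOT the actual SOFA tables.
# Each component contributes a point once its threshold is breached; the points
# are summed, discarding both how far past each threshold the value lies and
# which organ systems contributed.

def toy_score(platelets, bilirubin, creatinine):
    points = 0
    points += 1 if platelets < 150 else 0   # coagulation threshold breached?
    points += 1 if bilirubin > 1.2 else 0   # liver threshold breached?
    points += 1 if creatinine > 1.2 else 0  # renal threshold breached?
    return points

# Two very different patients...
patient_a = toy_score(platelets=149, bilirubin=1.3, creatinine=1.0)  # barely past two cutoffs
patient_b = toy_score(platelets=20,  bilirubin=1.0, creatinine=8.0)  # severe on two axes

print(patient_a, patient_b)  # prints "2 2": the sum hides the difference
```

Any statistical function applied downstream of `toy_score` inherits this step-function information loss, which is exactly why the score deserves the same analytic scrutiny as the statistics that follow it.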
Yet non-reproducibility is the hallmark of critical care research. There has even been a recent call at the NIH suggesting that perhaps funding for sepsis RCTs should be paused, because virtually all have been non-reproducible over the past three decades.
There is a test for non-reproducibility which I call detection of mathematical discontinuity. Here we view the measurement defining the condition under test, the statistical math, and the measurement defining the outcome as a single mathematical process. Any one step of that process can comprise the discontinuity, i.e., the source of non-reproducibility.
For 30 years statisticians have applied complex statistical functions to measurements derived from simple threshold breaches. The discontinuity would have been clear if the entire formula set had been written out on a single line.
This is a type of hidden structural bias. It affects all the trials, yet goes entirely undetected by even the most brilliant and diligent.
Imagine a scale which everyone assumes to be the standard and above analysis. However, the measurements of the scale change with the temperature. It is obvious that the entire math incorporating outputs of this scale (including the statistical portion) will render non-reproducible results, because temperature is not included in the formula. Yet an entire group of diligent scientists could use this tool for 30 years, rendering non-reproducible results, if the scale itself were considered THE standard.
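The scale analogy can be simulated directly. In this sketch (all numbers hypothetical) the instrument drifts with ambient temperature, temperature appears in no one's model, and two sites weighing identical true weights therefore report a spurious 10-unit "effect":

```python
# Sketch of the temperature-dependent-scale analogy (hypothetical drift model).
# The scale's reading depends on ambient temperature, but temperature is not
# part of anyone's formula, so two sites measuring the SAME true weights
# reach different conclusions.

import random

def scale_reading(true_weight, temp_c):
    # Hypothetical instrument: drifts 0.5 units per degree away from 20 C.
    return true_weight + 0.5 * (temp_c - 20)

random.seed(0)
true_weights = [70 + random.gauss(0, 1) for _ in range(100)]

site_cold = [scale_reading(w, temp_c=10) for w in true_weights]  # winter lab
site_warm = [scale_reading(w, temp_c=30) for w in true_weights]  # summer lab

mean_cold = sum(site_cold) / len(site_cold)
mean_warm = sum(site_warm) / len(site_warm)
print(round(mean_warm - mean_cold, 1))  # prints 10.0: pure instrument drift, no real effect
```

The downstream statistics can be flawless; the discontinuity sits in the measurement step, which is only visible once the whole process, instrument formula included, is examined as one.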
In the critical search for the cause of non-reproducibility, there are no standards that can be assumed to be ground truth.
“Do not try and bend the spoon, that’s impossible. Instead, only try to realize the truth…there is no spoon.”