We would like to get your feedback on a preprint we’ve written on how the statistical analysis approach in a randomised trial should be pre-specified: https://arxiv.org/ftp/arxiv/papers/1907/1907.04078.pdf
We wrote this paper in response to what we perceived as a major problem in the way statistical analysis sections in trial protocols are written. For example, reviews of protocols have found that around 11-20% do not state which statistical model will be used to analyse the primary outcome, 42% name the model but do not give enough detail to implement it, and 19% let the investigators choose their analysis model after seeing the trial data.
Our concern is that these protocols are written in a way that essentially allows the investigators to wait until they get their data, run a number of different analyses, and then choose the one that gives the answer they want.
Our preprint gives a set of rules for how we think analysis strategies should be pre-specified. These rules are designed to ensure that statistical methods cannot be chosen after seeing trial data in order to get a more favourable result (i.e. to limit the possibility of p-hacking, or at least to help people identify when p-hacking has occurred). The rules are mainly based on what is in the SPIRIT and ICH-E9 guidelines. Our hope is that this checklist could be used to help people design an analysis strategy for their own RCT, or to check whether published protocols/RCTs may be at risk of bias from a lack of appropriate pre-specification.
The five rules are:
Pre-specify the analysis methods before recruitment to the trial begins;
Specify a single primary analysis strategy (if multiple analyses are planned, one should be identified as the primary);
Plan all aspects of the analysis, including the analysis population, statistical model, use of covariates, handling of missing data, and any other relevant aspects (e.g. priors);
Enough detail should be provided so that a third party could independently perform the analysis (ideally this could be achieved by providing the planned statistical code); and
Adaptive analysis strategies which use the trial data to inform some aspect of the analysis should use deterministic decision rules.
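To make the last rule more concrete, here is a minimal sketch of what a deterministic decision rule might look like if it were written down as code in advance. It is an illustration only, not an example taken from the preprint: the covariate names, the 10-events-per-covariate threshold, and the function choose_primary_analysis are all hypothetical. The point is that the rule takes one summary of the trial data (the total number of outcome events, pooled across arms) and always returns the same analysis for the same input, so nobody gets to pick between analyses after seeing the results.

```python
from dataclasses import dataclass

# Hypothetical values that a protocol would fix before recruitment begins.
COVARIATES_IN_PRIORITY_ORDER = ("baseline_score", "site", "age", "sex")
MIN_EVENTS_PER_COVARIATE = 10


@dataclass(frozen=True)
class AnalysisPlan:
    model: str
    covariates: tuple


def choose_primary_analysis(n_events: int) -> AnalysisPlan:
    """Return the primary analysis implied by the pre-specified rule.

    The only data-dependent input is the total number of outcome events,
    pooled across arms, so the choice does not use the estimated
    treatment effect or its p-value.
    """
    if n_events < 0:
        raise ValueError("n_events must be non-negative")

    # Keep as many covariates as the event count supports, always in the
    # pre-specified priority order (never chosen by significance).
    max_covariates = n_events // MIN_EVENTS_PER_COVARIATE
    kept = COVARIATES_IN_PRIORITY_ORDER[:max_covariates]

    model = "adjusted logistic regression" if kept else "unadjusted logistic regression"
    return AnalysisPlan(model=model, covariates=kept)
```

With these assumed settings, choose_primary_analysis(25) would keep only baseline_score and site, while choose_primary_analysis(5) would fall back to the unadjusted model. A third party given the same event count would arrive at exactly the same analysis.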
The rationale and further description of these rules are available in the preprint (https://arxiv.org/ftp/arxiv/papers/1907/1907.04078.pdf); there’s also an example of a trial with a problematic protocol, and how these rules could be used to fix the issues. This is still a work in progress, so we would really like your feedback on it. Does it make sense? What needs further clarification or explanation? What needs fixing? Thanks!