I wonder if it would be a good idea to develop a new resource of “medical statistics notebooks”: reproducible R Markdown HTML documents that analyze real data using best statistical practice, showing the code and carefully accurate interpretations of the results. I think this would also assist authors of medical and epidemiology journal articles. These could be managed on GitHub, where issue reporting and suggestions could also be organized. The notebooks could be organized into categories, e.g.
- single variable analysis (including pre-post studies)
- two-group comparisons without covariate adjustment
- two-group comparisons with covariate adjustment
- multi-group comparisons
- multiple variable descriptive statistics including correlation matrices, variable clustering, principal components analysis
- multiple regression for continuous Y
- multiple regression for binary Y
- multiple regression for ordinal or continuous Y
- propensity score development
- observational treatment comparison using covariate adjustment for propensity score
- observational treatment comparison using matching or weighting on the propensity score
If this is worth pursuing, I’d also like to think about whether real data should be used, as opposed to having a unified way to simulate the data. The latter approach has the advantages of being self-contained and of allowing one to compare results with a known truth.
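To make the simulation option concrete, here is a minimal sketch in base R of what a shared simulator for the "two-group comparisons with covariate adjustment" category might look like. The function name `simulate_two_group`, the covariate `age`, and all parameter values are hypothetical choices made for illustration, not part of any existing package or agreed design.

```r
# Hypothetical helper: simulate a two-arm study with a known treatment effect.
# All names and default values here are illustrative assumptions.
simulate_two_group <- function(n = 200, effect = 0.5, sd = 1, seed = 1) {
  set.seed(seed)                           # fixed seed keeps the notebook reproducible
  treat <- rep(0:1, each = n / 2)          # balanced allocation; n is assumed even
  age   <- rnorm(n, mean = 50, sd = 10)    # one baseline covariate
  # Outcome model: the true treatment effect is `effect` by construction
  y <- 10 + effect * treat + 0.05 * (age - 50) + rnorm(n, sd = sd)
  data.frame(treat = treat, age = age, y = y)
}

d   <- simulate_two_group()
fit <- lm(y ~ treat + age, data = d)       # covariate-adjusted comparison
est <- coef(fit)["treat"]                  # estimated treatment effect
ci  <- confint(fit)["treat", ]             # 95% confidence interval
true_effect <- 0.5                         # known truth used in the simulation
```

A notebook built this way could report `est` and `ci` next to `true_effect`, so readers can see how the interpretation relates to the value that generated the data. With real data that comparison is not available, which is the trade-off raised above.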