I wonder if it would be a good idea to develop a new resource of “medical statistics notebooks”: reproducible R Markdown HTML documents that analyze real data using best statistical practice, showing the code along with careful, accurate interpretations of the results. I think this would also assist authors of medical and epidemiology journal articles. The notebooks could be managed on GitHub, where issue reporting and suggestions could also be organized.
The notebooks could be organized into categories, e.g.
- single variable analysis (including pre-post studies)
- two-group comparisons without covariate adjustment
- two-group comparisons with covariate adjustment
- multi-group comparisons
- multiple variable descriptive statistics including correlation matrices, variable clustering, principal components analysis
- multiple regression for continuous Y
- multiple regression for binary Y
- multiple regression for ordinal or continuous Y
- propensity score development
- observational treatment comparison using covariate adjustment for propensity score
- observational treatment comparison using matching or weighting on PS
- etc.
If this is worth pursuing, I’d also like to think about whether real data should be used, as opposed to having a unified way to simulate the data. The latter approach has the advantages of being self-contained and of allowing one to compare results with a known truth.
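To make the simulation option a bit more concrete, here is a minimal sketch of what a shared data generator for the "two-group comparisons without covariate adjustment" category might look like. It is written in Python with only the standard library, purely for illustration; the notebooks themselves would presumably do the equivalent in R. The function names, the normal outcome model, and the values chosen for the true effect, standard deviation, sample size, and seed are all assumptions of mine, not part of any existing resource.

```python
import random
import statistics


def simulate_two_group_trial(n_per_group, true_effect, sd=1.0, seed=0):
    """Simulate a continuous outcome Y for a control and a treated group.

    Hypothetical generator: Y is normal with mean 0 in the control group
    and mean `true_effect` in the treated group, with a common SD.
    """
    rng = random.Random(seed)  # fixed seed so every run gives the same data
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
    return control, treated


def difference_in_means(control, treated):
    """Return the estimated treatment effect and its standard error."""
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return estimate, se


# The truth is known because we chose it (assumed value for illustration).
TRUE_EFFECT = 0.5
control, treated = simulate_two_group_trial(
    n_per_group=200, true_effect=TRUE_EFFECT, seed=1
)
estimate, se = difference_in_means(control, treated)
print(f"true effect = {TRUE_EFFECT}, estimate = {estimate:.3f}, SE = {se:.3f}")
```

Because the true effect is set by the notebook author, readers can see directly how far the estimate falls from it, which is not possible with real data. A fixed seed keeps the notebook reproducible from one rendering to the next.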
An early attempt at this is here, where several assignments from The Analysis of Biological Data are worked out. The BBR course notes also have many worked-out examples with code.