I wonder if it would be a good idea to develop a new resource of “medical statistics notebooks”: reproducible R Markdown HTML documents that analyze real data using best statistical practice, showing the code and careful, accurate interpretations of the results. I think this would also assist authors of medical and epi journal articles. These could be managed on GitHub, where issue reporting and suggestions could also be organized. The notebooks could be organized into categories, e.g.
single variable analysis (including pre-post studies)
two-group comparisons without covariate adjustment
two-group comparisons with covariate adjustment
multi-group comparisons
multiple variable descriptive statistics including correlation matrices, variable clustering, principal components analysis
multiple regression for continuous Y
multiple regression for binary Y
multiple regression for ordinal or continuous Y
propensity score development
observational treatment comparison using covariate adjustment for propensity score
observational treatment comparison using matching or weighting on PS
etc.
If this is worth pursuing I’d also like to think about whether to use real data or to have a unified way of simulating the data. The latter approach has the advantages of being self-contained and of allowing one to compare results with a known truth.
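To make the "known truth" point concrete, here is a minimal sketch in Python using only the standard library. The function names (`simulate_two_groups`, `diff_with_ci`, `coverage`) and all parameter values are hypothetical choices for illustration, not a proposed template. It simulates a two-group comparison with a known mean difference and checks how often a large-sample 95% interval covers that truth.

```python
import random
import statistics

def simulate_two_groups(n_per_group, true_diff, sd, rng):
    """Draw a control and a treated sample whose true mean difference is known."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
    return control, treated

def diff_with_ci(control, treated, z=1.96):
    """Difference in means with a large-sample 95% interval (normal quantile)."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, diff - z * se, diff + z * se

def coverage(n_sims, n_per_group, true_diff, sd, seed=1):
    """Fraction of simulated datasets whose interval contains the known truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control, treated = simulate_two_groups(n_per_group, true_diff, sd, rng)
        _, lo, hi = diff_with_ci(control, treated)
        hits += lo <= true_diff <= hi
    return hits / n_sims

if __name__ == "__main__":
    # Assumed settings for illustration only.
    print(coverage(n_sims=2000, n_per_group=50, true_diff=0.5, sd=1.0))
```

With real data there is no `true_diff` to check against, which is the trade-off between the two approaches.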
I think this could fill a niche that is not often addressed (properly) in existing resources.
This is a lot of work, but personally, I like examples where simulated and real data are presented side by side. I have no example at hand, but I always found it informative to see what a method does on simulated data. It primed me to at least be much more careful in interpreting the results from the real data that followed.
That would be immensely useful. I think it would help if we created these with simulated data as well as real data, covering as many scenarios as possible. Notebooks in both R and Python could be considered. It would help the research community with standardised analytical workflows and interpretation. I am looking forward to such notebooks.
It would be feasible, if we agreed on a template and got enough contributors. Perhaps some Python experts could create parallel efforts, converting R Markdown notebooks to Python notebooks.
Most of those are, in my opinion, specialty analyses. I was thinking more about the most common basic designs (2-sample, k-sample, paired data, regression with binary outcome, regression with continuous or ordinal outcome, etc.). I’m not sure you will get many volunteers to do comprehensive specialty analyses, but of course such contributions would be very much welcomed.
Prof, this will be very useful for everyone. I would ask you to also consider adding some information on Bayesian analysis and study design.
Good ideas. I think that for most analysis examples it is best to show Bayesian and frequentist results side-by-side. We have to be sure to educate the reader to expect Bayesian credible intervals to be wider than frequentist confidence intervals that imprudently make assumptions such as normality and equal variance, when the Bayesian model has parameters for non-normality and unequal variance.
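A minimal sketch of such a side-by-side comparison, in Python with the standard library only. Everything here is an illustrative assumption: the function names are hypothetical, the frequentist interval uses a normal quantile rather than a t quantile, and the Bayesian model is a plain normal model with a separate variance per group under the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2. It handles unequal variance but not non-normality.

```python
import random
import statistics

def pooled_interval(a, b, z=1.96):
    """Equal-variance interval for mean(a) - mean(b), large-sample normal quantile."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff - z * se, diff + z * se

def posterior_mean_draws(y, n_draws, rng):
    """Posterior draws of a group mean; prior p(mu, sigma^2) proportional to 1/sigma^2."""
    n, ybar, s2 = len(y), statistics.fmean(y), statistics.variance(y)
    draws = []
    for _ in range(n_draws):
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square draw with n-1 df
        sigma2 = (n - 1) * s2 / chi2               # scaled inverse chi-square draw
        draws.append(rng.gauss(ybar, (sigma2 / n) ** 0.5))
    return draws

def credible_interval(a, b, n_draws=20000, seed=2):
    """Equal-tailed 95% interval for mean(a) - mean(b), separate variance per group."""
    rng = random.Random(seed)
    da = posterior_mean_draws(a, n_draws, rng)
    db = posterior_mean_draws(b, n_draws, rng)
    diffs = sorted(x - y for x, y in zip(da, db))
    return diffs[int(0.025 * n_draws)], diffs[int(0.975 * n_draws) - 1]

if __name__ == "__main__":
    rng = random.Random(1)
    small_noisy = [rng.gauss(1.0, 3.0) for _ in range(20)]  # small group, large SD
    large_quiet = [rng.gauss(0.0, 1.0) for _ in range(80)]  # large group, small SD
    print("pooled-variance interval:", pooled_interval(small_noisy, large_quiet))
    print("unequal-variance posterior:", credible_interval(small_noisy, large_quiet))
```

In this setup the smaller group has the larger variance, so the pooled interval understates the uncertainty and the posterior interval comes out wider, which is the pattern readers should be taught to expect.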