R Notebooks for Teaching Biomedical Researchers

I wonder if it would be a good idea to develop a new resource of "medical statistics notebooks": reproducible R Markdown HTML documents that analyze real data using best statistical practice, showing the code and carefully worded, accurate interpretations of the results. I think this would also assist authors of medical and epidemiology journal articles. The notebooks could be managed on GitHub, where issue reports and suggestions could also be organized. They could be grouped into categories, e.g.

  • single variable analysis (including pre-post studies)
  • two-group comparisons without covariate adjustment
  • two-group comparisons with covariate adjustment
  • multi-group comparisons
  • multiple variable descriptive statistics including correlation matrices, variable clustering, principal components analysis
  • multiple regression for continuous Y
  • multiple regression for binary Y
  • multiple regression for ordinal or continuous Y
  • propensity score development
  • observational treatment comparison using covariate adjustment for the propensity score
  • observational treatment comparison using matching or weighting on the propensity score
  • etc.

If this is worth pursuing I’d also like to think about whether real data should be used as opposed to having a unified way to simulate data to be used. The latter approach has advantages of being self-contained and allowing one to compare results with a known truth.
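To illustrate the "compare results with a known truth" point, here is a minimal sketch in plain Python (standard library only). Everything in it is hypothetical: the function names, the two-group design with normally distributed outcomes, the true difference of 0.5, and the use of a large-sample normal-approximation interval instead of a t interval. It simulates many studies whose true mean difference is known and checks how often the 95% interval contains that truth.

```python
import random
import statistics
from statistics import NormalDist


def simulate_two_groups(n, true_diff, sd=1.0, rng=None):
    """Hypothetical two-group study whose true mean difference is known by construction."""
    rng = rng or random.Random()
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    treated = [rng.gauss(true_diff, sd) for _ in range(n)]
    return control, treated


def diff_and_ci(control, treated, level=0.95):
    """Difference in means with a large-sample (normal-approximation) confidence interval."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Unequal-variance standard error of the difference in means
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, (diff - z * se, diff + z * se)


def coverage(n_sims=2000, n=200, true_diff=0.5, seed=1):
    """Fraction of simulated intervals that contain the known true difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control, treated = simulate_two_groups(n, true_diff, rng=rng)
        _, (lo, hi) = diff_and_ci(control, treated)
        hits += lo <= true_diff <= hi
    return hits / n_sims


if __name__ == "__main__":
    print(f"Estimated coverage: {coverage():.3f}")
```

With a known truth, a notebook can report not just one estimate but whether the method behaves as advertised (here, coverage close to 0.95); the same check is not possible with real data alone.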

An early attempt at this is here, where several assignments from The Analysis of Biological Data are worked out. The BBR course notes also have many worked-out examples with code.


I think this could fill a niche that is not often addressed (properly) in existing resources.

This is a lot of work, but personally, I like examples where simulated and real data are presented side by side. I have no example at hand, but I have always found it informative to see what a method does on theoretical data. It primed me to be much more careful when interpreting the results from the real data that followed.


That would be immensely useful. I think it would help to create these with simulated data as well as real data, covering as many scenarios as possible. Notebooks in both R and Python could be considered. This would give the research community standardised analytical workflows and interpretations. I am looking forward to such notebooks.


It would be feasible, if we agreed on a template and got enough contributors. Perhaps some Python experts could create parallel efforts, converting R Markdown notebooks to Python notebooks.


Python notebooks sound like a great idea. I'd volunteer some of my time to recreate R analyses in Python if needed.

This is a great idea. However, most of us are here to learn, so the only thing we can do is provide real data if needed.

Yes, I think that valuable contributions from non-statisticians will include:

  • which types of study designs and measurements need to be exemplified
  • questions about interpretations of results that will lead to augmentation of the interpretations


  • cross sectional study

  • case-control study, both matched and unmatched

  • case-cohort study

  • nested case-control study

  • cohort study

  • RCT

  • cost-effectiveness, cost-utility, cost-benefit analysis

  • diagnostic test evaluation with a gold standard and with an imperfect gold standard

  • multivariable prediction modelling: development, updating, validation

  • scale development

  • causal modelling

One request I have is to provide Bayesian as well as frequentist analyses, both presented in a standardised way.

Most of those are in my opinion specialty analyses. I was thinking more about the most common basic designs (2-sample, k-sample, paired data, regression with binary outcome, regression with continuous or ordinal outcome), etc. I’m not sure you will get many volunteers to do comprehensive specialty analyses, but of course such contributions would be very much welcomed.


Logistic regression is in my opinion very important…

Prof, this will be very useful for everyone. I would request that you also consider adding some information on Bayesian analysis and study design.

Good ideas. I think that for most analysis examples it is best to show Bayesian and frequentist results side-by-side. We have to be sure to educate the reader to expect Bayesian credible intervals to be wider than frequentist confidence intervals that imprudently make assumptions such as normality and equal variance, when the Bayesian model has parameters for non-normality and unequal variance.


I was wondering whether a comparison of traditional survival analysis against ordinal multiple regression would be useful to have as well.

I’d like to do a comparison of survival analysis with ordinal longitudinal modeling as a separate exercise, e.g. a blog article.


A couple more topic ideas:

  1. Analysis of longitudinal PROM data
  2. Imputation techniques for missing data