R Notebooks for Teaching Biomedical Researchers

f2harrell · January 7, 2021, 1:10pm

I wonder if it would be a good idea to develop a new resource of “medical statistics notebooks” where reproducible R Markdown HTML documents include analysis of real data using best statistical practice, showing code and carefully accurate interpretations of results. I think this would also assist authors of medical and epidemiology journal articles. These could be managed on GitHub, where issue reporting and suggestions could also be organized. The notebooks could be organized into categories, e.g.

single variable analysis (including pre-post studies)
two-group comparisons without covariate adjustment
two-group comparisons with covariate adjustment
multi-group comparisons
multiple variable descriptive statistics including correlation matrices, variable clustering, principal components analysis
multiple regression for continuous Y
multiple regression for binary Y
multiple regression for ordinal or continuous Y
propensity score development
observational treatment comparison using covariate adjustment for propensity score
observational treatment comparison using matching or weighting on PS
etc.

If this is worth pursuing, I’d also like to think about whether real data should be used as opposed to having a unified way to simulate the data. The latter approach has the advantages of being self-contained and of allowing one to compare results with a known truth.
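As a rough illustration of the "known truth" advantage, here is a minimal sketch in Python using only the standard library. It is not a proposed template: the function names, the normal data-generating model, and the large-sample interval are all assumptions made for the example. It simulates a two-group study with a chosen true mean difference, then checks how often the interval covers that truth.

```python
import random
import statistics


def simulate_two_groups(n_per_group, true_diff, sd=1.0, seed=0):
    """Simulate a two-group study whose true mean difference is known.

    Assumption for illustration: both groups are normal with the same SD.
    """
    rng = random.Random(seed)
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treated = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
    return control, treated


def diff_in_means_ci(control, treated, level=0.95):
    """Difference in means with a large-sample (normal approximation) interval.

    The standard error uses separate group variances, so equal variance
    is not assumed.
    """
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


def coverage(n_sims, n_per_group, true_diff):
    """Fraction of simulated studies whose interval contains the truth."""
    hits = 0
    for s in range(n_sims):
        control, treated = simulate_two_groups(n_per_group, true_diff, seed=s)
        _, lower, upper = diff_in_means_ci(control, treated)
        hits += lower <= true_diff <= upper
    return hits / n_sims


if __name__ == "__main__":
    estimate, lower, upper = diff_in_means_ci(*simulate_two_groups(100, 0.5))
    print(f"estimate {estimate:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
    print(f"coverage over 500 simulations: {coverage(500, 100, 0.5):.3f}")
```

With real data only the first printed line would be available. The coverage check is possible only because the simulation fixes the true difference in advance.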

An early attempt at this is here, where several assignments from The Analysis of Biological Data are worked out. The BBR course notes also have many worked-out examples with code.

scboone · January 8, 2021, 10:05am

I think this could fill a niche that is not often addressed (properly) in existing resources.

This is a lot of work, but personally, I like examples where simulated and real data are presented alongside each other. I have no example at hand, but I always thought it was informative to see what a method does on theoretical data. It primed me to at least be much more careful in the interpretation of the results in the real data that followed.

EpiLearneR · January 11, 2021, 6:47pm

That would be immensely useful. I think it would help if we could create those with simulated data as well as real data, covering as many scenarios as possible. Notebooks in R and Python could be considered. It would help the research community with standardised analytical workflows and interpretation. I am looking forward to such notebooks.

f2harrell · January 11, 2021, 7:42pm

It would be feasible, if we agreed on a template and got enough contributors. Perhaps some Python experts could create parallel efforts, converting R Markdown notebooks to Python notebooks.

dpananos · January 12, 2021, 9:04pm

Python notebooks sound like a great idea. I’d volunteer some of my time to recreate R analyses in Python if needed.

Sapiens · January 13, 2021, 10:40am

This is a great idea. However, most of us are here to learn, so the only thing we can do is provide real data if needed.

f2harrell · January 13, 2021, 12:52pm

Yes, I think that valuable contributions from non-statisticians will include

which types of study designs and measurements need to be exemplified
questions about interpretations of results that will lead to augmentation of the interpretations

EpiLearneR · January 13, 2021, 2:10pm

Designs

cross sectional study
case control study, both matched and unmatched
case cohort
nested case control study
cohort study
RCT
Cost effectiveness, cost utility, cost benefit

Diagnostic test evaluation with gold standard and with imperfect gold standard

Multivariable prediction modelling: development, updating, validation

Scale development

Causal modelling

One request I have is to provide Bayesian analysis as well as frequentist analysis, both given in a standardised way.

f2harrell · January 13, 2021, 4:02pm

Most of those are, in my opinion, specialty analyses. I was thinking more about the most common basic designs (2-sample, k-sample, paired data, regression with binary outcome, regression with continuous or ordinal outcome, etc.). I’m not sure you will get many volunteers to do comprehensive specialty analyses, but of course such contributions would be very much welcomed.

Sapiens · January 13, 2021, 7:36pm

Logistic regression is in my opinion very important…

S_Chakraborty · January 17, 2021, 3:23pm

Prof, this will be very useful for everyone. I would request that you also consider adding some information on Bayesian analysis and study design.

f2harrell · January 17, 2021, 4:29pm

Good ideas. I think that for most analysis examples it is best to show Bayesian and frequentist results side-by-side. We have to be sure to educate the reader to expect Bayesian credible intervals to be wider than frequentist confidence intervals that imprudently make assumptions such as normality and equal variance, when the Bayesian model has parameters for non-normality and unequal variance.

S_Chakraborty · January 17, 2021, 5:00pm

I was wondering whether a comparison of traditional survival analysis against ordinal multiple regression would also be useful to have.

f2harrell · January 17, 2021, 9:09pm

I’d like to do a comparison of survival analysis with ordinal longitudinal modeling as a separate exercise, e.g. a blog article.

S_Chakraborty · January 18, 2021, 2:33pm

A couple more topic ideas

Analysis of longitudinal PROM data
Imputation techniques for missing data