Simulating data - reading suggestions

Dear DataMethods community,

I have become a great believer in simulating data in order to pre-specify models and to check that a model can actually capture what you’re setting out to study. McElreath takes this to its natural conclusion in his book (Statistical Rethinking), using Stan and R. Harrell talks about its importance in his course. However, apart from McElreath’s Bayesian approach, I can’t find a good guide on how best to actually do it. My question is: is there a good book, site, or set of references on the methodology, including its pitfalls and pearls? Perhaps McElreath is it, but what if Stan is not an option? Open to any suggestions at all. Kind regards.
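For anyone new to the idea, here is a minimal fake-data check of the kind described above: pick parameter values, simulate data from the assumed model, fit that model, and see whether the estimates land near the values you chose. It is written in Python with NumPy purely as an illustration (an assumption; the thread itself is about R and Stan), and the linear model, sample size, and the function name `simulate_and_fit` are hypothetical choices.

```python
import numpy as np

def simulate_and_fit(n=500, intercept=1.0, slope=2.0, sigma=1.5, seed=2024):
    """Hypothetical helper: simulate y = intercept + slope * x + noise, then refit by least squares."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n)                      # simulated predictor
    y = intercept + slope * x + rng.normal(0.0, sigma, size=n)  # outcome from the known model
    X = np.column_stack([np.ones(n), x])                  # design matrix with an intercept column
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # ordinary least squares fit
    return beta_hat

if __name__ == "__main__":
    truth = np.array([1.0, 2.0])
    est = simulate_and_fit()
    print("true:", truth, "estimated:", est.round(3))
```

If the estimates stay far from the chosen values even with a large sample, the model (or the code) is not doing what you think. Repeating the check over many seeds turns it into a small simulation study.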

This book by Gelman et al. has a section on this, with some examples.

For the purely programming side, see R Workflow, chapter 18 (Simulation).

“Using simulation studies to evaluate statistical methods” by Morris et al. might be the kind of thing you’re looking for (https://onlinelibrary.wiley.com/doi/10.1002/sim.8086).
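As a rough illustration of the workflow that paper describes (a data-generating mechanism, an estimand, a method, and performance measures), here is a toy Python/NumPy sketch. The scenario (estimating a normal mean with a normal-approximation 95% interval) and the function name `run_simulation_study` are my own hypothetical choices; the Monte Carlo standard error formulas follow my reading of Morris et al., so check them against the paper before relying on them.

```python
import numpy as np

def run_simulation_study(n_sim=2000, n=50, mu=3.0, sigma=2.0, seed=1):
    """Hypothetical helper: repeatedly simulate data with a known mean, estimate it, summarise performance."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sim)
    covered = np.empty(n_sim, dtype=bool)
    for i in range(n_sim):
        y = rng.normal(mu, sigma, size=n)        # data-generating mechanism (true mean = mu)
        est = y.mean()                           # method: sample mean
        se = y.std(ddof=1) / np.sqrt(n)          # model-based standard error
        lo, hi = est - 1.96 * se, est + 1.96 * se  # normal-approximation 95% CI
        estimates[i] = est
        covered[i] = lo <= mu <= hi              # did the interval contain the truth?
    emp_se = estimates.std(ddof=1)               # empirical SE across repetitions
    coverage = covered.mean()
    return {
        "bias": estimates.mean() - mu,
        "bias_mcse": emp_se / np.sqrt(n_sim),
        "emp_se": emp_se,
        "emp_se_mcse": emp_se / np.sqrt(2 * (n_sim - 1)),
        "coverage": coverage,
        "coverage_mcse": np.sqrt(coverage * (1 - coverage) / n_sim),
    }

if __name__ == "__main__":
    for name, value in run_simulation_study().items():
        print(f"{name}: {value:.4f}")
```

Reporting each performance measure alongside its Monte Carlo standard error shows whether n_sim was large enough for the differences you care about.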

The DeclareDesign book has some worked examples with code in the latter half.

If you are working in R and want to create simulated data sets based on actual data, then I’ve found the package synthpop quite useful.

Replying to my own message may be a faux pas, but I’ve discovered a couple of resources that may be of use to others on this journey:

The Simulation Summer School, with videos here: https://www.youtube.com/playlist?list=PLvv6KTS5tb3pHALXIq0mSwHnZY1SPIBLR

Andrew Heiss’s web course Program Evaluation, in particular the page “The ultimate guide to generating synthetic data for causal inference”, which has a causal focus.
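To give a flavour of the causal angle, here is a toy sketch (Python/NumPy rather than the R used in Heiss’s guide, and not taken from it): a confounder Z drives both a binary treatment and the outcome, the true treatment effect is fixed at 2, and comparing an unadjusted regression with one that adjusts for Z shows whether the analysis recovers the effect built into the simulation. The names `simulate_confounded` and `ols_coef` and all coefficients are hypothetical.

```python
import numpy as np

def simulate_confounded(n=10_000, effect=2.0, seed=7):
    """Hypothetical DAG: Z -> T, Z -> Y, T -> Y, with a known treatment effect."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                          # confounder
    p_treat = 1 / (1 + np.exp(-1.5 * z))            # higher Z raises the chance of treatment
    t = rng.binomial(1, p_treat)                    # binary treatment
    y = effect * t + 3.0 * z + rng.normal(size=n)   # outcome depends on both T and Z
    return z, t, y

def ols_coef(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

z, t, y = simulate_confounded()
naive = ols_coef([t], y)[1]        # ignores the confounder
adjusted = ols_coef([t, z], y)[1]  # adjusts for the confounder
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f} (truth: 2.00)")
```

With these settings the unadjusted coefficient should be noticeably inflated, because Z pushes up both treatment and outcome, while the adjusted one should sit close to 2.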

Hope these are helpful to others.

Ross