The role of simulation in developing analysis plans



I have become increasingly interested in the role of simulation in developing plans for analysis. I first came across the idea when reading about estimating frequentist properties of Bayesian procedures, and more recently I have seen it advocated as a routine part of model building in the Bayesian context (e.g. over at the Stan forums, specifically in the context of making sure your code works as you intend it to). This procedure makes a lot of sense to me:

  1. Specify your data generating model (common to frequentist and Bayesian)
  2. Simulate parameters and data from that model that reflect your planned data collection strategy
  3. Assess whether your proposed model can recover the simulated parameters with some predetermined measure of accuracy
  4. (optional) evaluate how potential problems might undermine your goals and evaluate the robustness of alternative models
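A minimal sketch of steps 1 to 3, assuming a simple linear data-generating model and ordinary least squares as the proposed analysis. The function names, sample size, true slope, and the choice of bias and 95% interval coverage as the accuracy measures are all illustrative assumptions, not anything prescribed in this thread:

```python
import math
import random
import statistics

def simulate_trial(n, true_slope, sigma, rng):
    """Step 2: simulate one data set from the assumed data-generating model."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + true_slope * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y

def fit_slope(x, y):
    """Proposed analysis: ordinary least squares slope and its standard error."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

def check_recovery(n=200, true_slope=0.5, sigma=1.0, n_sims=1000, seed=1):
    """Step 3: does the analysis recover the known slope across many simulated trials?"""
    rng = random.Random(seed)
    estimates, covered = [], 0
    for _ in range(n_sims):
        x, y = simulate_trial(n, true_slope, sigma, rng)
        slope, se = fit_slope(x, y)
        estimates.append(slope)
        # Normal-approximation 95% interval (an assumption; reasonable at n = 200)
        if slope - 1.96 * se <= true_slope <= slope + 1.96 * se:
            covered += 1
    bias = statistics.fmean(estimates) - true_slope
    return bias, covered / n_sims

bias, coverage = check_recovery()
print(f"bias = {bias:.4f}, coverage = {coverage:.3f}")
```

If the analysis code were wrong, or the model could not recover the parameter at the planned sample size, the bias would drift away from zero or the coverage away from the nominal 95%. Step 4 would then amount to changing `simulate_trial` so that it violates an assumption while leaving `fit_slope` untouched.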

It seems to me that if you did this at the outset of your trial then you would:

  1. Know exactly how you need to set up your data collection forms
  2. Be confident that your code correctly returns the true value given your assumptions
  3. Be able to assess how plausible it is that violations of those assumptions would change your findings
  4. Directly investigate what-if questions (e.g. if a client is adamant about using propensity score matching, or stepwise regression, or AIC for model selection, etc.)

I suppose the easy answer is either to trust published simulations or to say that this is simply too much work. On its face, though, it appeals to me as a way of addressing some of the clinical arguments that have come up here in the past (e.g. dichotomania, importance of regularizing coefficients in high-dimensional settings, hazards of stepwise procedures). Maybe all we need for some of these is a collection of citations to simulations of the major issues?
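As an illustration of how small such an issue-specific simulation can be, here is a sketch for the dichotomization case. It assumes a continuous predictor with a linear effect on a continuous outcome and compares the power of analysing the predictor as measured against analysing a median split. The effect size, sample size, and function names are hypothetical choices for the example:

```python
import math
import random
import statistics

def slope_z(x, y):
    """z-statistic for the least squares slope of y on x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    rss = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

def compare_power(n=100, beta=0.3, n_sims=1000, seed=1):
    """Share of simulated studies with |z| > 1.96 for each analysis."""
    rng = random.Random(seed)
    hits_continuous = hits_dichotomized = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        cut = statistics.median(x)
        # Median split: regressing y on a 0/1 indicator is a two-group comparison
        x_split = [1.0 if xi > cut else 0.0 for xi in x]
        hits_continuous += abs(slope_z(x, y)) > 1.96
        hits_dichotomized += abs(slope_z(x_split, y)) > 1.96
    return hits_continuous / n_sims, hits_dichotomized / n_sims

power_continuous, power_dichotomized = compare_power()
print(f"continuous: {power_continuous:.2f}, median split: {power_dichotomized:.2f}")
```

Both analyses see exactly the same simulated data, so any difference in power comes from the analysis choice alone.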

I’m almost certain the statisticians and trialists here will think this just seems like an obvious thing, and yet I can’t recall working with anyone at our (admittedly small) centre that takes this approach.

An example I am working on now is a protocol for a review where we will be specifying a stochastic loss function across outcomes for use in decision making. Due to time constraints this isn’t something we can do while meeting deadlines, but it seems like it would be entirely worthwhile to look ahead at how a p-value based decision rule would conflict with what we are proposing.


I wouldn’t have phrased the procedure the way you did. Or at least that is not the procedure often discussed on the Stan discourse, which is (omitting some details):

  1. Draw parameter realizations from the (proper) prior
  2. Draw data realizations from the conditional distribution of the data given the parameter realizations in the previous step
  3. Condition on the data realizations in the previous step to obtain a posterior distribution (or approximation thereof)
  4. Evaluate the order statistic of where the realizations from step 1 fall in step 3

If you repeat that process many times, then the order statistics should be uniformly distributed if the software is working properly and the joint distribution of the parameters and the data is amenable to the software.
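The four steps can be sketched on a toy problem. This example assumes a conjugate normal model (prior mu ~ N(0, 1), data y_i ~ N(mu, 1)) so that the exact posterior can be sampled directly instead of running Stan or any other sampler. The function names and the `post_sd_scale` knob, which deliberately breaks the posterior, are hypothetical and only for illustration:

```python
import random

def sbc_ranks(n_obs=10, n_draws=99, n_reps=2000, seed=1, post_sd_scale=1.0):
    """Rank of each prior draw among draws from the corresponding posterior."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_reps):
        mu = rng.gauss(0.0, 1.0)                        # step 1: draw from the prior
        y = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # step 2: draw data given mu
        # Step 3: exact posterior is N(sum(y) / (n + 1), 1 / (n + 1));
        # post_sd_scale != 1 mimics software that reports the wrong spread.
        post_mean = sum(y) / (n_obs + 1)
        post_sd = post_sd_scale * (1.0 / (n_obs + 1)) ** 0.5
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]
        ranks.append(sum(d < mu for d in draws))        # step 4: order statistic
    return ranks

def chi_square_uniform(ranks, n_draws=99, n_bins=10):
    """Chi-square discrepancy between binned ranks and a uniform distribution."""
    width = (n_draws + 1) // n_bins
    counts = [0] * n_bins
    for r in ranks:
        counts[r // width] += 1
    expected = len(ranks) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

good = chi_square_uniform(sbc_ranks())
bad = chi_square_uniform(sbc_ranks(post_sd_scale=0.5))
print(f"correct posterior: {good:.1f}, overconfident posterior: {bad:.1f}")
```

With the correct posterior the ranks should look uniform on 0 to 99, so the statistic should be on the order of its 9 degrees of freedom. With the posterior standard deviation halved, the ranks pile up at both ends and the statistic becomes much larger.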

I am not surprised that this procedure has not been used often. You could say that the paper is very recent, but the similar approach that arose out of Samantha Cook’s dissertation is more than 12 years old now and it was not used that much either. The approach is very Bayesian, so it would have no merit from a frequentist perspective. And many of the approximate Bayesian methods fare poorly under this procedure, which does not help its popularity.


Thank you @bgoodri for providing a more specific summary of the simulation-based calibration (SBC) approach. I would agree that this is a very Bayesian application, but I was hoping here to also discuss similarities with a more general simulation-first approach. I remember Bob and Andrew specifically suggesting that starting from simulations helps with understanding analysis in more general terms. For example, even a frequentist analysis that starts with simulation could reveal issues with specific applications (or even help QA code).

Looking forward to applying both the SBC and Bayesian workflow papers to analyses in my dissertation pipeline (although I hold a secret fear that the BUGS code we’ve all been using isn’t actually working as advertised in all situations).


I tend to think of the ‘optional’ step 4 as the most interesting and important aspect of the simulation-based planning you describe, Tim. It seems closely related to @Sander’s multiple-bias modeling [1]. Am I wrong in drawing that connection?

Also, I recall a relevant session from an ASA meeting in recent years, including a discussion of the demands that such simulation-based analyses place on the FDA, and the FDA’s efforts to gear up. I think it might have been this one from JSM 2017.

  1. Greenland S. Multiple-bias modelling for analysis of observational data (with discussion). J R Stat Soc A. 2005;168(2):267-306. doi:10.1111/j.1467-985X.2004.00349.x


I would add to @bgoodri’s answer regarding the Bayesian case that there is IMHO a lot to be learned from simulations for basically any type of analysis. And while SBC is a great tool, I would encourage you to do simulations even if you cannot/do not want to go full Bayes. It is IMHO good practice and you should not avoid doing simulations (or other checks for that matter) just because you cannot do them perfectly.

My guess as to why simulations are not that popular is that it is often a considerable amount of work to write the simulations and verify that they work. An even larger burden is figuring out which of the myriad combinations of possible assumptions to actually run simulations for, and interpreting the large datasets of simulation results. Finally, you will (like almost 100%) identify problems with your code and/or general approach and will have to address them. Just running the analysis once on the final data and hoping for the best is waaaaay easier and you get to publish more often.

I would be slightly skeptical about the feasibility of a list of citations to simulations for major issues, as there are so many things that can go wrong and your analysis is quite likely to have a problem-specific element, so custom simulations would generally suit you better.

Besides testing the correctness of your code (which is a great goal), simulations also let you understand your power (in a broad sense of the word), regardless of the complexity of your model, preprocessing, etc. In particular, you get to learn how often you would expect your analysis with a given sample size to provide inconclusive results and how often you would expect misleading results (e.g. an observed effect in the opposite direction to the true effect). Running simulations in the design phase lets you avoid wasting money and resources on studies that are highly unlikely to be able to answer the question you are asking. This - for me - is a major selling point of simulations.
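A rough sketch of this kind of design-stage check, assuming a two-arm trial with a normally distributed outcome and a normal-approximation 95% interval for the difference in means. The true effect of 0.2 SD, the two sample sizes, and the function name are illustrative assumptions only:

```python
import math
import random
import statistics

def operating_characteristics(n_per_arm, true_effect, sd=1.0, n_sims=1000, seed=1):
    """Share of simulated trials that are inconclusive or point the wrong way."""
    rng = random.Random(seed)
    inconclusive = wrong_sign = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_effect, sd) for _ in range(n_per_arm)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        se = math.sqrt(
            (statistics.variance(control) + statistics.variance(treated)) / n_per_arm
        )
        # Inconclusive: the approximate 95% interval includes zero
        if diff - 1.96 * se <= 0.0 <= diff + 1.96 * se:
            inconclusive += 1
        # Misleading: the estimate has the opposite sign to the true effect
        if diff * true_effect < 0:
            wrong_sign += 1
    return {"inconclusive": inconclusive / n_sims, "wrong_sign": wrong_sign / n_sims}

small = operating_characteristics(n_per_arm=20, true_effect=0.2)
large = operating_characteristics(n_per_arm=400, true_effect=0.2)
print("n = 20 per arm: ", small)
print("n = 400 per arm:", large)
```

The same loop works unchanged if the two lines that generate the data or the lines that compute the interval are swapped for a more complicated model or preprocessing pipeline, which is where simulation earns its keep compared with a textbook power formula.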


Simulation is a lot of work. But I’ve found that by saving every simulation program I write, simulation gets easier over time. Now that I embed the code in R Markdown html reports, things are even better documented. I also try to bookmark published papers that seem to contain useful simulation setups here.


What a great couple of resources, thank you for sharing. The linked Greenland paper is absolutely what I was thinking of in step 4.


Not sure if you’re aware, but there is a growing literature on “clinical trial simulation” [clinical trial simulation: a review], even the FDA is promoting it [In silico clinical trials use computer models and simulations], software is focusing on it [MATLAB], and CROs are offering services [Using simulation to optimize adaptive trial designs]. … And here’s an advertisement: we recently published a paper promoting simulations at the design stage: paper