Scott, I stumbled upon a nice blog post and thread by Jonathan Bartlett on *combining bootstrapping with multiple imputation*: https://thestatsgeek.com/2016/03/12/combining-bootstrapping-with-multiple-imputation/.

The blog post mentions an article which I believe will be of direct interest to you: **BOOTSTRAP INFERENCE WHEN USING MULTIPLE IMPUTATION** by Michael Schomaker and Christian Heumann. A preprint of the article is available here: https://arxiv.org/pdf/1602.07933.pdf. The article itself was published in Statistics in Medicine in 2018: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.7654.

The article uses both theoretical considerations and Monte Carlo simulations to investigate and compare 4 methods of combining bootstrapping with multiple imputation. The goal is to see which of these methods yield valid inference when used to construct confidence intervals, i.e. whether the actual coverage level of the confidence interval equals the nominal coverage level.
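To make "actual versus nominal coverage" concrete, here is a minimal sketch of how a Monte Carlo coverage check works in general. It is not the authors' simulation design: the data-generating model (standard normal data with no missingness), the simple normal-approximation interval, and the function names `normal_mean_interval` and `estimate_coverage` are all my own assumptions for illustration.

```python
import random
import statistics
from statistics import NormalDist


def normal_mean_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean (illustrative only)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - z * se, mean + z * se


def estimate_coverage(true_mean, n, replications, rng, level=0.95):
    """Fraction of simulated data sets whose interval contains the true mean."""
    hits = 0
    for _ in range(replications):
        # Assumed data-generating model: normal data with unit variance.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        lo, hi = normal_mean_interval(sample, level)
        hits += lo <= true_mean <= hi
    return hits / replications


coverage = estimate_coverage(true_mean=0.0, n=100, replications=2000,
                             rng=random.Random(1))
print(f"nominal 0.95, estimated actual coverage {coverage:.3f}")
```

A method gives valid inference in this sense when the estimated coverage is close to the nominal level. In the article's setting, the interval function would be one of the four bootstrap-plus-imputation procedures, applied to data with missing values.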

The first two methods in the article are of the **Impute first, Bootstrap next** variety: perform multiple imputation first to resolve data missingness, then apply bootstrapping to each imputed data set. These are referred to in the article as **MI Boot (PS)** and **MI Boot**, where PS stands for Pooled Sample.

The last two methods are of the **Bootstrap first, Impute next** variety: apply bootstrap first (without resolving data missingness), then perform multiple imputation for each bootstrapped data set. These are referred to as **Boot MI (PS)** and **Boot MI** in the article.
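The difference in ordering can be sketched in a few lines of Python. Please treat this as a rough illustration under stated assumptions rather than the authors' implementation. The pooling steps reflect my reading of the article (for **Boot MI**, average the M estimates within each bootstrap sample and take percentiles; for **MI Boot**, use bootstrap variances inside Rubin's rules), so do check them against the paper. The imputation routine `impute_once` is a deliberately crude stand-in for a real imputation model, the normal quantile replaces the t reference distribution, and all function names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def impute_once(sample, rng):
    """Crude stand-in for one stochastic imputation: fill each missing value
    (None) with a draw from a normal fitted to the observed values."""
    observed = [v for v in sample if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu, sd) for v in sample]


def resample(sample, rng):
    """Nonparametric bootstrap: draw len(sample) rows with replacement."""
    return rng.choices(sample, k=len(sample))


def boot_mi(sample, estimator, B, M, rng):
    """Bootstrap first, impute next: 95% percentile interval."""
    boot_estimates = []
    for _ in range(B):
        boot = resample(sample, rng)  # missing values are still present here
        estimates = [estimator(impute_once(boot, rng)) for _ in range(M)]
        boot_estimates.append(statistics.mean(estimates))
    cuts = statistics.quantiles(boot_estimates, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # 2.5th and 97.5th percentiles


def mi_boot(sample, estimator, B, M, rng):
    """Impute first, bootstrap next: 95% interval via Rubin's rules."""
    point_estimates, variances = [], []
    for _ in range(M):
        completed = impute_once(sample, rng)
        point_estimates.append(estimator(completed))
        boots = [estimator(resample(completed, rng)) for _ in range(B)]
        variances.append(statistics.variance(boots))  # bootstrap variance
    pooled = statistics.mean(point_estimates)
    within = statistics.mean(variances)
    between = statistics.variance(point_estimates)
    total = within + (1 + 1 / M) * between
    # Simplification: normal quantile instead of a t quantile.
    half_width = NormalDist().inv_cdf(0.975) * total ** 0.5
    return pooled - half_width, pooled + half_width


# Toy data (assumed): 200 normal values, about 30% set to missing at random.
rng = random.Random(0)
data = [rng.gauss(10, 2) for _ in range(200)]
data = [None if rng.random() < 0.3 else v for v in data]

ci_boot_mi = boot_mi(data, statistics.mean, B=200, M=5, rng=rng)
ci_mi_boot = mi_boot(data, statistics.mean, B=200, M=5, rng=rng)
print("Boot MI:", ci_boot_mi)
print("MI Boot:", ci_mi_boot)
```

Note that `boot_mi` runs the imputation step B × M times while `mi_boot` runs it only M times. The MI Boot interval is symmetric around the pooled estimate by construction, whereas the Boot MI percentile interval need not be.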

For their simulations, the authors note that “*the computation time for* **Boot MI** *was always greater than for* **MI Boot**”.

Furthermore, the authors suggest that:

“*General comparisons between MI Boot and Boot MI are difficult because the within and between imputation uncertainty, as well as the within and between bootstrap sampling uncertainty, will determine the actual width of a confidence interval.*”

In the article conclusion, the authors state the following:

*"The current statistical literature is not clear on how to combine bootstrap with multiple imputation inference. We have proposed that a number of approaches are intuitively appealing and three of them are correct: Boot MI, MI Boot, MI Boot (PS). Using Boot MI (PS) can lead to too large and invalid confidence intervals and is therefore not recommended.*

*Both Boot MI and MI Boot are probably the best options to calculate randomization valid confidence intervals when combining bootstrapping with multiple imputation. As a rule of thumb, our analyses suggest that the former may be preferred for small M or large imputation uncertainty and the latter for normal M and little/normal imputation uncertainty.*

*There are however other considerations when deciding between MI Boot and Boot MI. The latter is computationally much more intensive. This matters particularly when estimating the analysis model is simple in relation to creating the imputations. In fact, in our first simulation this affected the computation time by a factor of 13. However, MI Boot naturally provides symmetrical confidence intervals. These intervals may not be wanted if an estimator’s distribution is suspected to be non-normal."*

The article also presents a real-world data analysis example where g-formula inference is involved (a setting similar to yours).

I hope this article will provide you with the guidance you need to proceed in your situation.