I am looking at computing the Hodges-Lehmann estimate (the median of pairwise differences) with 95% confidence intervals to compare condition-specific medical costs across a few populations. For some of the populations the sample size is quite large (thousands), and for others it is quite small (2-3). As with any inference tool, the result is only as good as the assumptions. In this case, my concern is the bootstrap assumption for the small populations with sample sizes of 2-3. For the most part I expect the distribution of the outcomes might be somewhat similar across groups (maybe). I've looked at this distribution for the larger samples, and a lognormal assumption isn't too bad in 50% of the cases. I'm considering a few options:

- Just do the bootstrap confidence intervals and only report those (maybe include some statement that results with sample sizes < 6 should be considered with skepticism)
- Only try comparing populations where both groups have at least ~6 observations (or some other number of observations)
- Take the logarithm, analyze that as if normally distributed, and run a simulation on the model (+ model uncertainty) to get the Hodges-Lehmann estimate and its uncertainty. Report both bootstrap and normal-based estimates and CIs for the small sample size cases and suggest readers focus on the larger of the two.
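
To make the options concrete, here is a minimal sketch of what I have in mind for the bootstrap interval and for the lognormal simulation. It assumes NumPy; the function names (`hodges_lehmann`, `bootstrap_ci`, `lognormal_sim_ci`) and all default settings are hypothetical choices for illustration. The "model uncertainty" step is one possible reading only: it draws the log-scale mean and variance from the usual noninformative normal-model posterior, which needs at least 2 observations per group.

```python
import numpy as np


def hodges_lehmann(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all differences x_i - y_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.median(np.subtract.outer(x, y)))


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each group independently."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    est = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        est[b] = hodges_lehmann(xb, yb)
    lo, hi = np.quantile(est, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def lognormal_sim_ci(x, y, n_sim=2000, m=500, alpha=0.05, seed=0):
    """Lognormal model with parameter uncertainty (an assumed formulation).

    For each group, draw (mu, sigma) of the log costs from the noninformative
    normal-model posterior, simulate m costs per draw, and record the median
    difference. Returns the median of those values and a percentile interval.
    """
    rng = np.random.default_rng(seed)

    def draw_params(v):
        logs = np.log(np.asarray(v, dtype=float))
        n = logs.size
        s2 = logs.var(ddof=1)  # requires n >= 2
        # sigma^2 | data ~ (n - 1) * s^2 / chi-square(n - 1)
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_sim)
        # mu | sigma^2, data ~ Normal(mean of logs, sigma^2 / n)
        mu = rng.normal(logs.mean(), np.sqrt(sigma2 / n))
        return mu, np.sqrt(sigma2)

    mu_x, sd_x = draw_params(x)
    mu_y, sd_y = draw_params(y)
    # One row of m simulated costs per parameter draw.
    sim_x = rng.lognormal(mu_x[:, None], sd_x[:, None], size=(n_sim, m))
    sim_y = rng.lognormal(mu_y[:, None], sd_y[:, None], size=(n_sim, m))
    # x and y are simulated independently, so the median of row-wise
    # differences approximates the median of X - Y for that parameter draw.
    shifts = np.median(sim_x - sim_y, axis=1)
    lo, hi = np.quantile(shifts, [alpha / 2, 1 - alpha / 2])
    return float(np.median(shifts)), (float(lo), float(hi))
```

With only 2 observations per group, a bootstrap resample of a group can take just three distinct forms, so the bootstrap distribution of the estimate is very coarse. In the lognormal version, the variance draw for a group of 2 has a single degree of freedom, so simulated intervals can be extremely wide.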

Any suggestions on which of these options seems preferable, or on other options I have not considered, would be appreciated.