I am looking at computing the Hodges-Lehmann estimate (median of pairwise differences) with 95% confidence intervals to compare condition-specific medical costs across a few populations. For some of the populations the sample size is quite large (thousands), and for some it is quite small (2-3). As with any inference tool, the result is only as good as the assumptions. In this case, my concern is the bootstrap assumption for the small populations with sample sizes of 2-3. For the most part I expect the distribution of the outcomes might be somewhat similar across groups (maybe). I've looked at this distribution for the larger samples, and a lognormal assumption isn't too bad in 50% of the cases. I'm considering a few options:

Just do the bootstrap confidence intervals and only report those (maybe include some statement that results with sample sizes < 6 should be considered with skepticism)

Only try comparing populations where both groups have at least ~6 observations (or some other number of observations)

Take the logarithm, analyze it as if normally distributed, and run a simulation on the model (plus model uncertainty) to get the Hodges-Lehmann estimate and its uncertainty. Report both the bootstrap and normal-based estimates and CIs for the small-sample cases and suggest readers focus on the wider of the two.

Any suggestions concerning which of these options seems preferable (or whether there are other options) would be appreciated.
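To make options 1 and 3 concrete, here is a minimal Python sketch (not R). Everything in it is my own illustration: the function names, the simulated lognormal cost data, and especially the crude normal-approximation treatment of parameter uncertainty on the log scale; it is not a definitive implementation of either option.

```python
import numpy as np

rng = np.random.default_rng(42)

def hodges_lehmann(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all pairwise differences."""
    return np.median(np.subtract.outer(x, y))

def bootstrap_ci(x, y, n_boot=5000, alpha=0.05):
    """Option 1: percentile bootstrap CI for the HL estimate."""
    est = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        yb = rng.choice(y, size=len(y), replace=True)
        est[b] = hodges_lehmann(xb, yb)
    return np.quantile(est, [alpha / 2, 1 - alpha / 2])

def lognormal_ci(x, y, n_sim=5000, alpha=0.05):
    """Option 3 (sketch): treat log(cost) as normal, propagate uncertainty
    in the group means by simulating from the fitted model, recompute HL.
    Uncertainty in the log-scale SDs is ignored here, which understates
    the true uncertainty for n = 2-3."""
    lx, ly = np.log(x), np.log(y)
    est = np.empty(n_sim)
    for s in range(n_sim):
        # normal approximation to the sampling distribution of each log-mean
        mx = rng.normal(lx.mean(), lx.std(ddof=1) / np.sqrt(len(lx)))
        my = rng.normal(ly.mean(), ly.std(ddof=1) / np.sqrt(len(ly)))
        xs = np.exp(rng.normal(mx, lx.std(ddof=1), size=len(x)))
        ys = np.exp(rng.normal(my, ly.std(ddof=1), size=len(y)))
        est[s] = hodges_lehmann(xs, ys)
    return np.quantile(est, [alpha / 2, 1 - alpha / 2])

# hypothetical data: one large group, one tiny group of costs
costs_a = rng.lognormal(mean=8.0, sigma=1.0, size=500)
costs_b = rng.lognormal(mean=8.3, sigma=1.0, size=3)
print(hodges_lehmann(costs_b, costs_a))
print(bootstrap_ci(costs_b, costs_a))
print(lognormal_ci(costs_b, costs_a))
```

With n = 2-3 the bootstrap can only resample a handful of distinct values, so the percentile interval is often degenerate in a way the parametric interval is not, which is exactly the concern raised above.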

I was hoping some other scholars would have gotten to this before me. I've taken an interest in discovering what, in principle, can be learned from small-sample research. I've had to resign myself to the following conclusions:

Consider a Bayesian approach with a prior distribution.

Collect more data for frequentist methods.

Given that you have substantial data in some cases, have you considered an Empirical Bayes approach, where you estimate a prior distribution and then apply it to a relevant subset? That would be a reasonable Bayes/frequentist compromise to extract some use from the very small samples.

The Wikipedia link has some good papers in the reference section.

I think this is a very good question. I've had similar issues in the past. I've tried CART or similar simulation methods instead of bootstrapping (I used the synthpop package in R: B. Nowok, G.M. Raab, C. Dibben, synthpop: Bespoke Creation of Synthetic Data in R, J. Stat. Soft. 74 (2016) 1–26. https://doi.org/10.18637/jss.v074.i11). However, with a sample size of only 2 or 3, your option 2 may still be better.

Correct me if I'm wrong, but I think empirical Bayes is for the case where you have lots of data that give you a strong ability to estimate a prior, and this requires clustering. For example, if you have 10 measurements on each of 1000 subjects, you can do an effective empirical Bayes analysis where you estimate the variance of the random effects (subject-specific intercepts).
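A minimal sketch of the normal-normal empirical Bayes shrinkage being discussed, in Python. The numbers are hypothetical, and I assume each group's mean (say, of log cost) comes with a known sampling variance; the method-of-moments prior estimate below is one standard choice, not the only one.

```python
import numpy as np

def eb_shrink(group_means, group_se2):
    """Normal-normal empirical Bayes via method of moments:
    estimate the prior mean and variance from the ensemble of
    group means, then shrink each noisy mean toward the grand mean."""
    mu_hat = np.mean(group_means)
    # between-group variance = total variance minus average sampling variance
    tau2_hat = max(np.var(group_means, ddof=1) - np.mean(group_se2), 0.0)
    weights = tau2_hat / (tau2_hat + group_se2)  # shrinkage factors in [0, 1]
    return mu_hat + weights * (group_means - mu_hat)

# hypothetical log-cost means: well-estimated large groups, noisy tiny ones
means = np.array([8.1, 8.0, 7.9, 9.5, 6.2])
se2   = np.array([0.01, 0.01, 0.01, 0.8, 0.8])  # the n = 2-3 groups have large se^2
print(eb_shrink(means, se2))  # tiny-n groups get pulled toward the grand mean
```

Note that this treats the estimated prior as if it were known, which is precisely the approximation the follow-up comment below warns about.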

I'd be one of the last people to say you were wrong about an applied stats question. The more I study the underlying math, the more sense your suggestions make.

I think your quote is an excellent description of Bradley Efron's original papers on Empirical Bayes and MLB batting averages. He has extended the Empirical Bayes method in some recent books under the title Large-Scale Inference that I'm wrestling with.

I like Empirical Bayes because it can help critics get over the (non-)issue of "subjective priors", while still being rigorous in the synthesis of prior information.

It seemed those techniques applied to this problem, as a number of samples from different populations were estimated "simultaneously" (i.e., within the same study). It appears that there was prior work done way back in 1991 by the Census Bureau using Empirical Bayes, when estimating median incomes in various regions.

Datta, G., Fay, R. and Ghosh, M. (1991) Hierarchical and Empirical Multivariate Bayes Analysis in Small Area Estimation (link)

Ghosh, M. and Pathak, P., eds. (1992) Hierarchical and Empirical Multivariate Bayes Analysis. Current Issues in Statistical Inference: Essays in Honor of D. Basu. Hayward, CA: Institute of Mathematical Statistics, 151-177 (link)

Do you see any flaws in the use of this method that I might have missed?

Just a general comment about empirical Bayes: it is approximate, and sometimes the approximation doesn't work well enough. It also may not fully account for the uncertainty in estimating the variance of the random effects. Spiegelhalter has a paper showing the better performance of full Bayes in that situation.