How to estimate the sample size when the variance of the outcome is unknown in the population

In a clinical trial comparing two treatments (or treatment vs. placebo) on a numerical outcome, how can the variability of the outcome be estimated when it is unknown (mainly because the subject has not been studied before)? What is done routinely?

There is no solution that doesn’t make use of a scaling parameter such as the standard deviation, other than powering the study for a hard-to-interpret, non-subject-matter-relevant Cohen’s d, or using the Wilcoxon test to detect a certain concordance probability.
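
To make the Wilcoxon route concrete, here is a minimal sketch using Noether's approximation for the Wilcoxon-Mann-Whitney test, which needs only a target concordance probability p = P(Y_treatment > Y_control) and no standard deviation. The function name `noether_total_n`, the target p = 0.65, and the defaults (two-sided alpha of 0.05, 80% power, equal allocation) are illustrative assumptions, not values from this thread.

```python
import math
from statistics import NormalDist


def noether_total_n(p, alpha=0.05, power=0.80, frac_group1=0.5):
    """Approximate TOTAL sample size for a two-sided Wilcoxon-Mann-Whitney test.

    p           : concordance probability to detect, P(Y_trt > Y_ctl), not 0.5
    frac_group1 : fraction of subjects allocated to the first group
    Uses Noether's large-sample approximation (assumes no ties).
    """
    if not 0.0 < p < 1.0 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    c = frac_group1
    return math.ceil(z_sum ** 2 / (12 * c * (1 - c) * (p - 0.5) ** 2))


# Hypothetical target: a 65% chance that a treated subject does better
# than a control subject.
print(noether_total_n(0.65))
```

The result is a total across both arms, and the closer the target p is to 0.5, the faster the required size grows.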


Presuming that this is to be a prospective, randomized study, a couple of thoughts:

  1. If you truly have no idea of the potential variance of the measure in the two groups, even from some kind of parallel treatment pathway, you could, depending upon your timeline and budget, conduct a small pilot study using the same planned study design to estimate that essential quantity. A pilot can also reveal other potential gotchas in the study design that might require some fine-tuning for the larger study. There are a number of publications that provide guidance on sample sizes for pilot studies, some based upon expected treatment effect sizes, others upon the desired precision of the estimates. Then use the information gained from the pilot study to inform your larger study design.

  2. Put forth some best presumptions for these parameters. Get some sense of how a range of assumptions influences the required sample size, essentially performing a sensitivity analysis, and, depending upon whether you would rather be aggressive or conservative initially, settle on a preliminary sample size and move forward with that. I might suggest erring somewhat on the high side, to set expectations for both timelines and budgets. However, also pre-plan an interim sample size re-estimation at some mid-point (e.g. 50% of data collected), where you would be comfortable with the stability of the estimates. If your initial sample size estimate is off, adjust up or down as needed. Depending upon your study design, you may need a protocol amendment at that point. You might review current literature on the implications of taking this approach, and related factors to consider, especially if blinding is involved.
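
The sensitivity analysis in point 2 could be sketched roughly as follows, using the usual normal-approximation formula for comparing two means, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / delta^2. The function name `n_per_group`, the difference of 5 units, and the candidate standard deviations are made-up placeholders; a t-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    sd    : assumed common standard deviation of the outcome
    delta : difference in means the trial should be able to detect
    Normal approximation, equal allocation, equal variances assumed.
    """
    if sd <= 0 or delta == 0:
        raise ValueError("sd must be positive and delta non-zero")
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return math.ceil(2 * (z_sum * sd / delta) ** 2)


# Hypothetical inputs: the difference of interest is fixed, and the SD is
# varied over a plausible range to see how strongly it drives the sample size.
delta = 5.0
for sd in (8.0, 10.0, 12.0, 15.0):
    print(f"assumed SD = {sd:4.1f} -> n per group = {n_per_group(sd, delta)}")
```

The same function can be reused at the interim look by plugging in the standard deviation observed so far. Because the required n scales with the square of the SD, a 50% underestimate of the SD more than doubles the sample size, which is one reason to err on the high side initially.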
