Sample size consideration for establishing a reference interval

Is there a formula for calculating the sample size needed to establish a reference interval? Reference levels of thyroid function tests differ between ethnic groups, and we would like to establish reference levels for each trimester of pregnancy in our population.

Because reference intervals are sample-based rather than risk-based, they depend heavily on how the sample is chosen, and that matters more than anything else. Sample size formulas exist; I just don't know the references. If you want to be distribution-free, the Harrell-Davis quantile estimator is recommended (shameless advertisement), and this requires something like 300-400 subjects per homogeneous sample. The sample size will depend on the acceptable margin of error in estimating the quantiles of interest.
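One distribution-free way to see how the margin of error depends on n is the classical order-statistic confidence interval for a quantile. The number of observations falling below the true p-quantile is Binomial(n, p), so a pair of sample ranks (r, s) can be chosen to bracket that quantile with a stated confidence. The Python sketch below is only an illustration, not a published sample size formula. It assumes a continuous measurement and a single homogeneous sample (e.g., one trimester), the function name `quantile_ci_ranks` is hypothetical, and it relies on `scipy.stats.binom`.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci_ranks(n, p, conf=0.90):
    """Return ranks (r, s) such that [X_(r), X_(s)] is an equal-tailed,
    distribution-free confidence interval for the p-quantile, or None
    if n is too small for the requested confidence.

    B ~ Binomial(n, p) counts observations below the true p-quantile; for
    continuous data, X_(r) <= Q_p <= X_(s) holds exactly when r <= B <= s - 1.
    """
    alpha = 1.0 - conf
    cdf = binom.cdf(np.arange(n + 1), n, p)  # cdf[k] = P(B <= k)
    # Lower rank: largest r with P(B <= r - 1) <= alpha / 2.
    lower = np.nonzero(cdf <= alpha / 2)[0]
    # Upper rank: smallest s with P(B <= s - 1) >= 1 - alpha / 2.
    upper = np.nonzero(cdf >= 1 - alpha / 2)[0]
    if lower.size == 0 or upper.size == 0:
        return None
    r, s = int(lower[-1]) + 1, int(upper[0]) + 1
    if s > n:
        return None
    return r, s


# How the interval for the 2.5th percentile narrows as n grows.
for n in (60, 120, 240, 400):
    ranks = quantile_ci_ranks(n, 0.025)
    if ranks is None:
        print(n, "too small for a 90% CI")
    else:
        r, s = ranks
        # X_(k) sits near the 100*k/(n+1) percentile of the population.
        print(n, ranks, round(100 * r / (n + 1), 2), round(100 * s / (n + 1), 2))
```

For n = 120 and p = 0.025 this returns ranks (1, 7), so the 90% interval for the lower limit runs from the smallest to the 7th smallest value. For n = 60 it returns None because no such interval exists. By symmetry, the ranks for the 97.5th percentile are n + 1 - s and n + 1 - r.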

Because they are not risk-based, reference intervals are not consistent with medical decision making. One of the many ways this causes problems is that a patient whose lab value is near the upper (or lower) limit of normal can have elevated disease risk, unbeknownst to the physician or the patient.


I agree with Frank that reference intervals are sample-based. The Clinical and Laboratory Standards Institute (CLSI) recommends a minimum of 120 subjects per homogeneous sample. This sample size is based on the assumption that your data follow a normal distribution.
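Under a normality assumption like the one described above, the usual parametric estimate of a central 95% reference interval is the sample mean ± 1.96 standard deviations, computed separately within each homogeneous partition (e.g., each trimester). The minimal Python sketch below uses only the standard library, and the function name `parametric_reference_interval` is hypothetical. In practice, skewed analytes are often transformed (e.g., log) before this step, and normality should be checked rather than assumed.

```python
from statistics import NormalDist, mean, stdev


def parametric_reference_interval(values, coverage=0.95):
    """Mean +/- z * SD limits; only meaningful if the values are roughly normal."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95% coverage
    m, sd = mean(values), stdev(values)  # stdev uses the n - 1 denominator
    return m - z * sd, m + z * sd
```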

The introduction to reference interval analysis that I am most familiar with is CLSI's "EP28-A3c: Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory". This is the resource that my company (IDEXX, biomarker discovery / medical diagnostics) uses.

I have found some of the recommendations in the CLSI document a little concerning (e.g., how outliers are handled, how reference intervals are computed from small samples, and a suggested "robust" calculation method that is not really that robust).


Thanks, Prof. We will read more about the Harrell-Davis quantile estimator and implement it. Thanks again.

Thanks. Let me go through those articles.

Please edit your earlier reply instead of adding separate consecutive replies.

Donald's advice is excellent. I'll just add that it's a good idea not to assume normality of lab measurements, hence the need for sample sizes larger than 120. The Harrell-Davis estimator (see the hdquantile function in the R Hmisc package) is a little more efficient than the ordinary nonparametric sample quantile estimator, and converges to it as $n \to \infty$.
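For readers not using R, here is a minimal Python sketch of the Harrell-Davis estimator as it is usually defined: a weighted average of all order statistics, with weights given by the probability mass of a Beta(p(n+1), (1-p)(n+1)) distribution on each interval ((i-1)/n, i/n]. It is not a port of Hmisc's hdquantile. The function name `harrell_davis` is hypothetical, the code relies on `scipy.stats.beta`, and the example data are simulated, not real measurements.

```python
import numpy as np
from scipy.stats import beta


def harrell_davis(x, p):
    """Harrell-Davis estimate of the p-quantile of x, for 0 < p < 1."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    a, b = p * (n + 1), (1 - p) * (n + 1)
    # Beta CDF evaluated at 0, 1/n, ..., 1; consecutive differences are the
    # weights on the order statistics and sum to 1.
    edges = beta.cdf(np.arange(n + 1) / n, a, b)
    weights = np.diff(edges)
    return float(weights @ x)


# Simulated right-skewed sample, compared with the ordinary sample quantile.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.3, sigma=0.5, size=400)
for p in (0.025, 0.975):
    print(p, harrell_davis(sample, p), np.quantile(sample, p))
```

Because every observation gets some weight, the estimate varies more smoothly from sample to sample than a single order statistic does, which is the source of the modest efficiency gain mentioned above.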


Hi Prof, I couldn't find any option to edit my first reply, so I deleted the second one. I see the edit option only on the original first post, and now the whole post appears washed out. Kindly forgive my ignorance.

FH: Sorry, the edit privilege may only start once you have more posts under your belt.