Power and Sample Size Calculations in Pilot Studies



This happens from time to time: I work with someone who is putting together an application for a pilot study, usually under a funding mechanism or budget that will support enrollment of only a few dozen subjects and is clearly meant as a stepping stone. Yet the PI often feels obligated to write something about “power,” even though I feel a better tack would be simply to justify what we will be able to learn from the proposed sample size. That will almost never be anything related to the primary endpoint; rather, we will typically learn about the feasibility of scaling up to a larger study, enrollment challenges, and how well our forms, questionnaires, and visit schedule work. I am curious what other people think about this, or what others have done in similar situations.


I’ve had several such examples recently. The first was nominally labelled a ‘pilot study’ and there was no desire for a power calculation. I work in pediatrics, where small samples are not unusual; despite having only a dozen or so patients, they expected the pilot to be publishable. I think we edged away from statistical testing, so there would have been nothing to base a power calculation on anyway (it was clear in their minds that assessing feasibility was the point).

In the second example I was approached after they had submitted to a journal (or were about to submit). It wasn’t a pilot, but the same issue existed: they were required to say something about power. Their paper reported many Spearman correlations, so I simply determined the CI width obtainable given the sample size and then declared ‘this is reasonable’ (I think I also suggested the term ‘convenience sample’).

The third case was also a pilot study, although it wasn’t presented that way; I think we called the findings ‘preliminary results’, and we wanted data to inform a power calculation for a subsequent study. The conference organisers were demanding a retrospective power calculation. I told them to refuse and how to word the refusal.

There is very little consistency across these examples, and I’m not saying I handled them perfectly. It depends on who is demanding the words about power. In your case, because it’s the researcher, I’d edge away from it; I’d share with them another proposal or paper you’ve done for a pilot where no power calculation was described. There’s a tendency to do what was done before.

Edit: the other idea is to do the power calculation for them, show them that the power is inadequate, and then ask whether they still want to report it in the proposal.
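
For anyone wanting to try the CI-width approach mentioned for the Spearman correlations, here is a minimal sketch. It is not necessarily what was done in that paper. It assumes a Fisher z-transform interval with the variance inflated by (1 + r²/2), which is one commonly used approximation for Spearman's rho; the function name and the example values of `n` and `r` are hypothetical.

```python
import math
from statistics import NormalDist


def spearman_ci(r, n, conf=0.95):
    """Approximate CI for a Spearman correlation r from n pairs.

    Assumption: Fisher z-transform with approximate variance
    (1 + r**2 / 2) / (n - 3). Other approximations exist.
    """
    if n <= 3:
        raise ValueError("need n > 3 for this approximation")
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    z = math.atanh(r)                          # Fisher transform of r
    se = math.sqrt((1 + r**2 / 2) / (n - 3))   # approximate SE on the z scale
    lower = math.tanh(z - z_crit * se)         # back-transform to the r scale
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Hypothetical planning values: how wide is the interval at each sample size?
for n in (12, 30, 60):
    lo, hi = spearman_ci(r=0.5, n=n)
    print(f"n={n}: CI width = {hi - lo:.2f}")
```

The point of a table like this is to let the reader judge whether the achievable precision is "reasonable" rather than to claim any particular power.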


In such situations, I specifically focus on feasibility metrics given that the pilot study is being conducted to assess the feasibility of the project and to scale to a bigger study. Some FOAs specifically instruct the reviewers to ‘take off points’ if a power analysis is provided. The important thing is to clearly define the feasibility metric and define success of the study based on those metrics.


I think there is some value in power calculations generated by simulation, even in the setting of a pilot study. The value is NOT the actual power or sample-size numbers (which are often hogwash for larger studies as well). The value of constructing a simulation is that it helps flesh out the details of the analysis plan. It requires the research team to make decisions about the type of model, the form of the covariates, and the method of inference. Constructing a simulation also helps the research team understand the best- and worst-case scenarios.

Should power calculations be included as part of a pilot study grant application? Probably not. However, I think constructing a simulation is a helpful tool for thinking through the details of the analysis plan.
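
To make the idea concrete, here is a minimal sketch of a simulation of this kind. Everything in it is an assumption for illustration: two equal arms, a normally distributed outcome, and a Welch-type statistic compared against a normal critical value. With a few dozen subjects the normal reference slightly inflates the type I error, so a real analysis plan would likely swap in a t-test or the model the team actually intends to fit.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simulate_power(n_per_arm, effect, sd=1.0, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power by simulating a two-arm study many times.

    Assumptions (all hypothetical): normal outcomes with common SD `sd`,
    true mean difference `effect`, large-sample z reference distribution.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        # Standard error of the difference in means (unequal-variance form)
        se = math.sqrt(stdev(control) ** 2 / n_per_arm
                       + stdev(treated) ** 2 / n_per_arm)
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# Best- and worst-case scenarios for a pilot-sized study (hypothetical values)
for effect in (0.2, 0.5, 0.8):
    print(f"effect={effect} SD, n=15 per arm: power ~ {simulate_power(15, effect):.2f}")
```

Writing even this much forces decisions about the outcome distribution, the test, and the plausible range of effects, which is where the value lies.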


Besides fully emphasizing the feasibility aspects, if statistical details are really required I try to avoid power calculations for pilot studies and just do precision calculations. For example I compute for the main response variable the likely margin of error (half-width of 0.95 confidence interval) that would result with the planned sample size. See here for example.
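
As a rough illustration of such a precision calculation, here is a minimal sketch for a continuous response. It assumes a guessed standard deviation and uses a normal quantile; a t quantile would give a slightly wider interval at pilot-study sample sizes. The function name and the planning values are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sd, n, conf=0.95):
    """Half-width of an approximate confidence interval for a mean.

    Assumption: normal approximation with an assumed standard deviation `sd`.
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return z * sd / math.sqrt(n)


# Hypothetical planning values: assumed SD of 10 units, a range of pilot sizes
for n in (12, 24, 36, 48):
    print(f"n={n}: margin of error ~ {margin_of_error(sd=10.0, n=n):.2f}")
```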


Thanks very much to all that have responded.

Good for them! If the PI well and truly understands this, the problem I describe seems fairly moot.

Ah, this old chestnut. I have also seen journals with well-meaning requirements that the authors must make a statement about the “power” that have enforced this in situations where it really ought not to apply. Such as:


Neither do I claim to have handled just about anything perfectly. In many cases, I simply hope that my contribution kept things from being worse than they would have been without it.

The “tendency to do what was done before” is very real, and quite harmful to progress (because it seems to discourage actual thought about what’s appropriate for the current proposal). In some cases I simply don’t have a “what was done before” example to use (I’m only five years post-PhD, and so may not have quite the portfolio that others have amassed).

Rameela, very glad to see you contributing here! I hope you will be around for further discussions.

Completely agreed that this is the most appropriate solution for pilot study applications. It is interesting that some instruct reviewers to penalize applications that provide an inappropriate power analysis.

I cannot disagree with any of that!

I will start pushing a bit harder for this on any future applications.

Many thanks to all discussants.


The journal wants something about power

This may be the time to tell the journal what a pilot study is. The power to detect what it’s intended to detect, in the case of a pilot study, is the power to detect errors, omissions, and potential improvements in the protocol. So its power depends on the methods used to do these things.

On the other hand, I have seen investigators try to do a small, underpowered study and excuse it as a ‘pilot’.