Power analysis in observational studies

Does anyone have a good reference about power analysis in observational studies, mostly retrospective ones?
I have read conflicting opinions on this topic.

Sorry, but that makes almost no sense. Why exactly do you want to do a power analysis on retrospective observational data? Power analyses can be useful in certain contexts, such as designing an experiment or trial where there is random assignment and the ability to replicate studies. But I don’t understand why anyone would do one for retrospective data. That just sounds like calculating observed power; for that discussion, and why it’s a bad idea, please see here.

Hi,

I can tell you, from the perspective of one who has been involved in observational studies for the better part of 40 years, I have never conducted an a priori power/sample size calculation for any of them.

For a minority of them, we pre-defined the precision of the estimates that we would likely achieve, at given prevalences, using the width of 95% confidence intervals for binary variables. Those are easy calculations to make, once you have a sense of the likely sample size that you can reasonably obtain.
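As a rough illustration of that kind of precision calculation, here is a minimal Python sketch. It assumes the simple normal-approximation (Wald) interval, which is only one of several possible choices and behaves poorly at low prevalence or small n (a Wilson interval is a common alternative). The function name and the grid of sample sizes and prevalences are made up for the example, not taken from any actual study.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_width(prevalence: float, n: int, level: float = 0.95) -> float:
    """Full width of the normal-approximation (Wald) CI for a proportion.

    Assumption: Wald interval, chosen only for simplicity.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return 2 * z * sqrt(prevalence * (1 - prevalence) / n)


# Hypothetical planning grid: feasible sample sizes x plausible prevalences.
for n in (100, 200, 400):
    for p in (0.05, 0.20, 0.50):
        print(f"n={n:4d}  prevalence={p:.2f}  CI width={wald_ci_width(p, n):.3f}")
```

Reading across the grid shows how much narrower the interval gets as the obtainable sample size grows, which is usually all that is needed for planning a descriptive study.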

Sample sizes in observational studies are typically based upon “convenience samples”. That is, given the various constraints of the prevalence of the disease and/or treatment that you are focused upon, the inclusion/exclusion criteria, the timeline for enrollment (in the prospective setting), whether you are restricted to a single study site or multiple sites, and the budget for the study, how many patients can be reasonably included?

In the retrospective setting, you have the additional challenge of dealing with the quality and completeness of the data that you can obtain, since you are entirely dependent upon what was or was not recorded in your available data sources in the past. There may be other per-patient variations in terms of diagnostic testing that was or was not done, based upon clinical practice at the time. Further, you may also have to deal with inter-observer variability in how qualitative data may have been recorded, in the absence of clear standards and definitions for these, as would typically be the case in a prospective study.

Each of those factors can impact your available sample size, if your inclusion criteria require that certain data be available for inclusion and analysis.

You may wish to consider conducting a small pilot study, to get a sense for those issues and how they may impact your study and any relevant design changes, before engaging in the full study.

The terminology here may be the problem. The original post really isn’t very clear about the intention so it’s hard to give useful advice about such generic terms. But I can think of one place where some degree of power calculation or sample size planning could be performed but is often not.

Suppose that a cardiology fellow is about to embark on a chart review project. Most would consider this “retrospective” even if the data collection process will be prospective, e.g. we will sit down and outline what to collect and then the person will go and collect the data. In many cases, as alluded to by another poster on the thread, this just turns into some form of convenience sample, e.g. “we abstracted data for all the cases we had available” or “we abstracted the cases from the last 5 years.”

In this setting, I often advocate for some thought about the question(s), effect size(s) of interest, and number of cases that would be needed to detect those effects before even setting off on the project, for two reasons. The first is that if it’s completely hopeless to find anything useful (e.g. would need 500 cases and there are only 50 available) they can try to redirect their efforts to another, potentially better project; the second is that they might be able to efficiently allocate their efforts (e.g. if we determine that 500 is a good sample size for the project, they can review exactly 500 charts and no more rather than trying to do all 4,000 cases available).
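To make the "is it hopeless?" check concrete, here is a minimal Python sketch. It assumes a comparison of two proportions with equal group sizes and uses the standard normal-approximation sample size formula; the event rates, the function name, and the number of available charts are hypothetical and only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate cases per group to detect p1 vs p2 (two-sided test).

    Assumptions: equal group sizes, unpooled normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical chart review: 20% vs 30% event rate, 50 charts available in total.
needed_total = 2 * n_per_group(0.20, 0.30)
available = 50
print(f"needed: {needed_total}, available: {available}")
if available < needed_total:
    print("Probably not feasible for this effect size; consider another question.")
```

The same number works in the other direction: if far more charts are available than needed, the fellow can stop abstracting once the target is reached.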

Again, hard to answer the initial query since it is so vague, but I don’t entirely dismiss the idea of power in “retrospective studies” because it’s dependent on, well, this kind of stuff.

I see your point. I just read the attached post and I have to say… what a fruitful discussion! I learned a lot, thank you.

Those are great expert insights. I’m currently on the other side: preparing for my first clinical research project, which will probably be a retrospective cohort. Thank you so much!

Sorry, I was indeed very vague. Your example is interesting, but I see it as a fully prospective study. In that case, a power analysis is essential, because as you said, investigators need to allocate time and money in a more efficient way.

If you are planning your first observational study, the EMA recently issued a very detailed draft guidance on registry studies, which is worth reading (https://www.ema.europa.eu/en/guideline-registry-based-studies#draft-under-public-consultation-section). AHRQ also publishes a periodically updated guideline for registries (https://effectivehealthcare.ahrq.gov/products/registries-guide-4th-edition/users-guide). It is important to understand that a prospective, observational registry study doesn’t generally have a sample size based on power calculations. Many registries are descriptive, reporting on “what are the outcomes of this type of patients”, “how are patients with this disease treated,” or “what are the demographics of people diagnosed with X.” You still might want to do some sample size calculations in order to understand how many patients will be required and how long the study will need to run.

These look great! I’ll definitely have a look, thank you

I am transitioning from epidemiology to biostatistics and this is my take: All epidemiologic investigation proposals must have a power analysis. Period.

Unfortunately, I have observed that this is not the usual practice in the pharma/industry arena, where I sense the “observational study bucket” is meant to broadly capture general descriptive studies. Most often, these studies are conducted without thorough consideration of methodologic issues and best practices.

It will be of limited value to execute an observational/epidemiologic study without assessing the sample size or power needed to appropriately observe the direction and size of the effect of interest before sinking in resources.

I’ll have to respectfully disagree. Power is problematic even for randomized clinical trials because it is dependent on (1) null hypothesis tests with artificial thresholds and (2) only one possible effect size to detect. And that is the best situation for power. In observational studies power is even more problematic. In either type of study it is more relevant to either (1) quantify the total information content in the data, e.g. effective sample size, or (2) estimate the margin of error in estimating the main quantity of interest (e.g. an exposure odds ratio).

Before a study is done or the sample is completed we can estimate an expected margin of error. After the sample is complete we just compute the actual margin of error. I take the term margin of error a bit loosely to be half of the width of a 0.95 compatibility interval or Bayesian highest posterior density interval.
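For an odds ratio, one way to sketch that expected margin of error in Python is shown below. It assumes a simple 2x2 table and the usual large-sample standard error of the log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d); the cell counts and the function name are hypothetical. The margin is computed on the log scale and then exponentiated, so it reads as a multiplicative "fold" factor around the estimated odds ratio.

```python
from math import exp, sqrt
from statistics import NormalDist


def or_margin_of_error(a: float, b: float, c: float, d: float, level: float = 0.95) -> float:
    """Multiplicative margin of error for an odds ratio from a 2x2 table.

    a, b: exposed and unexposed counts among cases
    c, d: exposed and unexposed counts among controls
    Assumption: large-sample standard error of the log odds ratio.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(z * se_log_or)  # half-width on the log scale, back-transformed


# Before the study: plug in expected cell counts (hypothetical values here).
factor = or_margin_of_error(100, 100, 100, 100)
print(f"The interval would run from OR / {factor:.2f} to OR * {factor:.2f}")
```

After the sample is complete, the same function applied to the observed counts gives the actual margin of error rather than the expected one.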
