# Meta-analysis with incomplete data

Hello Statisticians,

I’m doing a meta-analysis comparing infection rates in 2 sub-types of liver operations (say, Big surgery vs. Small surgery). I have a few studies that compare the 2, but other studies only report infection rates in one sub-type, either Big or Small.

I want to do a meta-analysis where I sum the infection numbers in each sub-type and calculate a cumulative OR with overall p-value. When adding these absolute values, is it ok/correct to include those studies that don’t compare and only report on one sub-type?

For example:

Then calculate OR, CI and p-value on the totals.


Good question.

If I understand what you want to do, you want to sum the values in each column (i.e. pooling), treating the data as if it were one large study. Is that correct?

That is a seductive method, but probably wrong according to this article.

Simple Pooling versus Combining in Meta-Analysis
Dena M. Bravata, Ingram Olkin (2001), Evaluation & the Health Professions (link)
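For intuition, here is a toy numeric example (entirely made-up counts) of the kind of reversal simple pooling can produce: within each study the Big-surgery odds of infection are lower, yet the pooled table points the other way.

```python
def odds_ratio(a, b, c, d):
    """OR of infection, Big (a infected, b not) vs Small (c infected, d not)."""
    return (a * d) / (b * c)

# study:            Big inf, Big not, Small inf, Small not
study1 = (1, 99, 10, 890)      # small Big arm, large Small arm
study2 = (90, 110, 9, 10)      # large Big arm, small Small arm

or1 = odds_ratio(*study1)      # ≈ 0.90: Big looks protective
or2 = odds_ratio(*study2)      # ≈ 0.91: Big looks protective
pooled = odds_ratio(*(sum(col) for col in zip(study1, study2)))
print(or1, or2, pooled)        # pooled OR ≈ 20.6: direction reversed
```

The arm sizes are confounded with baseline risk here, which is exactly the kind of imbalance summing the columns cannot protect you from.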

I am assuming that all of the studies are reasonably homogeneous, and that combining by meta-analysis is appropriate.

(For the issues related to odds ratios, logistic regression, and heterogeneity, I strongly recommend checking out the discussion in Dr. Harrell’s Biostatistics for Biomedical Research (aka BBR) specifically Ch 6.9.4 pages 188-199, and Ch. 13.2.2, pages 343-344.)

After studying this 2017 article by Dr. David Hoaglin and Bei-Huang Chang again, I would attempt to estimate the Big surgery odds ratio and the Small surgery odds ratio independently, and use those estimates when I had incomplete data.

The procedure roughly is:

1. For studies where there is complete data, calculate an odds ratio of Infected/Not Infected for Big surgeries and Small surgeries independently.

2. If a study is missing an OR for the Big surgery, substitute the average OR from the Big surgeries in step 1. Likewise, substitute your average for Small surgeries if you are missing an OR from a Small surgery.

3. Combine all the studies. They advocate the technique of compare (calculate individual effect sizes at the study level) then combine (aggregate the results). I’d do that via the logits of the individual studies, \sum_{i=1}^{N} \mathrm{logit}(S_i).

The authors recommend using Mixed Effects Logistic Regression over the conventional DerSimonian-Laird approach. Their approach avoids the problem of underestimating the between-study variance.
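For illustration, here is a minimal sketch of the conventional compare-then-combine calculation (inverse-variance weights with a DerSimonian-Laird between-study variance). The function names and the 2x2 tables are my own, and this is the conventional approach the authors argue against, shown only to make the mechanics concrete:

```python
import math

def log_or(a, b, c, d):
    """Log odds ratio and its SE for one study's 2x2 table:
    a/b = infected/not infected after Big surgery,
    c/d = infected/not infected after Small surgery.
    Applies the usual 0.5 correction if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log(a * d / (b * c)), math.sqrt(1/a + 1/b + 1/c + 1/d)

def dl_pool(tables):
    """Compare-then-combine with DerSimonian-Laird random effects:
    returns pooled OR, 95% CI, and a two-sided p-value."""
    y, se = zip(*(log_or(*t) for t in tables))
    w = [1 / s**2 for s in se]                       # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar)**2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)          # DL between-study variance
    w_re = [1 / (s**2 + tau2) for s in se]           # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(mu / se_mu) / math.sqrt(2))    # two-sided normal p-value
    lo, hi = math.exp(mu - 1.96 * se_mu), math.exp(mu + 1.96 * se_mu)
    return math.exp(mu), (lo, hi), p

# Hypothetical comparative studies (Big inf, Big not, Small inf, Small not):
or_pooled, ci, p = dl_pool([(10, 90, 20, 80), (8, 42, 12, 38)])
print(or_pooled, ci, p)
```

The key point is that each study contributes its own within-study log OR before anything is aggregated, rather than its raw counts.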

I think it would be possible to use an Empirical Bayes technique to incorporate the incomplete studies. Whether that is more informative than analysing only the complete studies remains to be seen.


I generally agree with @R_cubed re: the importance of looking at within trial relative effects vs adding counts. A couple of quick additional questions for clarification:

1. Is your intervention of interest here the surgery approach itself (I assume open vs lap or something similar)? Or are you wondering how a separate intervention (e.g. prophylactic antibiotic) might vary in effectiveness when patients require a major vs minor surgery?
2. Are your comparative studies randomized?
3. Are patients actually eligible for either surgery, or is there anything systematically different? If you’re comparing two approaches for the same indication then a few thoughts come to mind:
• Are surgeons choosing the big surgery for riskier cases (single arm studies or if they are not RCTs)?
• Does access to technology needed for small surgery coincide with measures of quality (e.g. academic institutions, centres of excellence, etc…?)
• Is there a learning period for whatever the newer technique is?

A general comment here is that you’re probably looking at a small effect (Studies 1 and 2 disagree on magnitude and direction) and you have very small numbers. In addition to the questions above, I would be interested in whether these studies all measure the outcome in the same way (culture confirmed? Clinical signs? Are these of equal severity?), whether pre-/post-op care is similar across them, whether duration of follow-up is similar, and any other clinically relevant questions you can think of. If these aren’t RCTs then you will have some additional problems to consider.

I have two concerns that come to mind:

1. I’m not clear that the data is actually missing in the sense that patients were treated but infection just wasn’t reported. What would you recommend he input as a standard error?
2. He really only has two studies from which to estimate the random effect for the LOR, but this method will make it seem like he has 4 with two of them just being mean imputations which would bias heterogeneity down.

This paper might be interesting if you can access it. It is focused on network meta-analysis but that’s really just an extension of what you’re trying here.


Excellent question. I’m going to have to do some more reading on this, but I think the use of an EB prediction interval would be helpful. I have to study this paper more closely:

The problem in this particular case is the small amount of data. It might be easier to simply state a prior rather than trying to estimate one.

Thanks for all the responses. I must admit, much of it is over my head as I only have a background of rudimentary undergrad stats behind me.

The example I gave was fictitious data to simplify the question. I am working on a systematic review of risk factors for post-surgical infection. All the studies I have found are retrospective analyses.

This differs from a ‘standard’ systematic review, which usually looks at one intervention and a primary outcome. My review seeks to analyse a number of risk factors across many studies (e.g. Age, Sex, pre-operative chemo, pre-op nutrition, smoking, diabetes etc) and report their overall effect size on the rate of post-op infection.

> After studying this 2017 article by Dr. David Hoaglin and Bei-Huang Chang again, I would attempt to estimate the Big surgery odds ratio and the Small surgery odds ratio independently, and use those estimates when I had incomplete data

‘Big’ and ‘Small’ are the 2 options in the categorical variable ‘size of surgery’, so I can only calculate an OR if both are present (if I’m understanding your comment correctly). Those studies that only report on either ‘Big’ or ‘Small’ give only a rate of infection for their entire group; there is nothing to compare against, so no OR. Hence my attempt to calculate an ‘overall’ OR by pooling the absolute values.

> Simple Pooling versus Combining in Meta-Analysis
> Dena M. Bravata, Ingram Olkin (2001), Evaluation & the Health Professions (link)

I’ll have to do a lot more reading into this, but conceptually it doesn’t make sense to me yet. Isn’t the point of a meta-analysis to increase the power of a result by pooling multiple small studies? For example, if I combine the weighted ORs of multiple under-powered studies, wouldn’t that just give me an average OR with a high p-value?
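On the power question, combining on the within-study effect scale does increase power rather than just averaging high p-values. A toy calculation with hypothetical numbers: if several small studies each estimate the same log OR of 0.4 with SE 0.5, inverse-variance pooling of k such studies shrinks the pooled SE like 1/sqrt(k), so the p-value of the combined estimate falls as studies accumulate.

```python
import math

# Hypothetical: k small studies, each estimating log OR = 0.4 with SE = 0.5.
# Inverse-variance pooling of k equal-SE studies gives SE = 0.5 / sqrt(k).
lor, se_single = 0.4, 0.5
for k in (1, 4, 9):
    se_pooled = se_single / math.sqrt(k)
    z = lor / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    print(k, round(p, 4))                 # p shrinks as k grows
```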

This is about as challenging as meta-analyses get from a conceptual/methodological standpoint so I think you have a lot of reading ahead of you, but I’m happy to help where I can. I guess one question would be whether this is for a course or a thesis project, or just something you’re working on out of interest?

Some more follow-up:

1. When you say you are analyzing risk factors, what would you describe your ultimate goal as being? Are you just interested in describing relationships you’ve observed, making causal claims, informing a separate analysis?
2. Are you mostly relying on between trial information, or do you have a lot of within trial information as well (e.g. the study reports outcomes in males vs females)?
3. Do you have a clinical background?

There are many ways to pool. Typically we want to preserve as much within trial information as possible and simply summing up events doesn’t accomplish that. The simple pooling paper that was linked has a great example of this (Table 2). It also does a great job of introducing the concept of ecological bias which will be important for you to consider when interpreting any results that rely on between trial information (e.g. average age varying across studies).

If you are able to access it, this prognostic review does a pretty good job of discussing all the extra challenges associated with meta-analysis of risk factors.

Thank you so much for your input.

> 1. When you say you are analyzing risk factors, what would you describe your ultimate goal as being? Are you just interested in describing relationships you’ve observed, making causal claims, informing a separate analysis?

The background to this review is the risk factors that predict post-operative infection in pancreatic and liver surgery. I am trying to make a case that the risk factor profile in pancreas vs liver surgery is different. So, for example, perhaps Male sex has a greater effect on the rate of infection in pancreatic surgery than in liver surgery.

> 2. Are you mostly relying on between trial information, or do you have a lot of within trial information as well (e.g. the study reports outcomes in males vs females)?

I’m not sure what you mean by “between trial information”, but I am extracting the data from individual studies and reporting on some of the specific measures each study uses. For example, different studies have different definitions of ‘advanced age’.
> 3. Do you have a clinical background?

Yes, I am a surgeon with some research experience, but I’ve never done a systematic review of this nature before.