Sample size considerations for interaction covariate adjustment


I am working with a clinician who is interested in running a relatively simple RCT comparing the pain reduction achieved by a treatment with that of a placebo during a painful gynecologic procedure (endometrial ablation). The main outcome will likely be an ordinal VAS score (0-10). Treatment or placebo will be allocated 1:1. Based on a preliminary sample size calculation, approximated by a t-test power calculation with a specified minimal clinically important difference, we’ll need about 100 participants in total. I asked my clinical collaborator for possible variables that may explain variation in our pain outcome. He believes increased parity will increase pain tolerance and having had a caesarean section will decrease it. To me, it seems these two variables will interact, and this seems important enough to adjust for. I am not sure how large the effect of the interaction would be, but ultimately we are not so much interested in estimating the interaction effect; we just want to include it as a model covariate. To properly estimate the interaction, we likely need much more data (as much as 16 times more), as shown by Andrew Gelman (1). To simplify, let’s say 16 times our original sample size is needed to estimate the interaction, which is 1600 observations. I cannot see many people being cool with that large a jump in sample size. Would keeping the original sample size of ~100 and adjusting for the interaction anyway be a good enough approach to estimating treatment efficacy?

TL;DR version: is there an issue with adjusting for a covariate interaction when you just won’t have nearly enough data in an RCT?


Edit: This seems like a job for penalization



The interaction Gelman refers to would be e.g. treatment x parity (i.e. a subgroup analysis), not caesarean x parity. I don’t think you could stratify randomisation for those factors either (impractical/impossible), but your n is not too small, so that’s OK I guess. A power calculation can be rough; hope that it’s conservative (n is an over-estimate), which I guess it is if based on a t-test, i.e. no adjustment.
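For anyone wondering where the factor of 16 comes from, here is a rough sketch of the arithmetic behind the treatment x subgroup case. It assumes two equal-sized subgroups, 1:1 allocation, a common outcome SD (`sigma`, set to 1 purely for illustration), and an interaction half the size of the main effect. The function names are made up for this example.

```python
import math

def se_main_effect(sigma, n):
    # Difference in means between two arms of n/2 participants each.
    return sigma * math.sqrt(1 / (n / 2) + 1 / (n / 2))

def se_interaction(sigma, n):
    # Treatment effect within one subgroup: two cells of n/4 participants each.
    se_subgroup = sigma * math.sqrt(1 / (n / 4) + 1 / (n / 4))
    # The interaction is the difference between two independent subgroup effects.
    return math.sqrt(2) * se_subgroup

sigma, n = 1.0, 100                     # illustrative values, not from the trial
se_main = se_main_effect(sigma, n)      # equals 2 * sigma / sqrt(n)
se_int = se_interaction(sigma, n)       # equals 4 * sigma / sqrt(n)
se_ratio = se_int / se_main             # interaction SE is twice the main-effect SE

relative_size = 0.5                     # assumption: interaction is half the main effect
# To reach the same signal-to-noise ratio, n must grow by (se_ratio / relative_size)^2.
n_multiplier = (se_ratio / relative_size) ** 2

print(f"SE ratio: {se_ratio:.1f}, sample size multiplier: {n_multiplier:.0f}")
```

Note that this argument concerns estimating the interaction itself to the same precision as the main effect; it says nothing about the sample size needed to estimate the treatment effect while merely adjusting for a covariate x covariate term.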



I’m not sure you’d have to adjust for the interaction term, but maybe I’m missing something yet…if your primary goal is to estimate the treatment effect, as long as you’ve stratified randomization according to parity and history of C-section and then you adjust for them in the primary analysis, that seems like it would do the job the RCT is setting out to do.

Why do you feel the need to also adjust for the interaction term if you know that you’re not going to be able to estimate it?



My primary goal is to estimate the treatment effect. Perhaps this is just my misunderstanding of ANCOVA: I thought that by adjusting for the interaction term (parity*C-section) the analysis of the treatment effect would be more powerful, and I was not clear whether there would be an issue doing this when there isn’t much data to estimate the interaction.



Much has been written about the benefits of covariate adjustment in RCTs (though uptake remains lower than we might like). In general, adjusting for a small number of prespecified covariates known to have strong associations with outcome is desirable as it will increase power; my slight stumbling block here is why you’d also need to adjust for the interaction term (even if you think interaction is present); and furthermore, even if you do ‘adjust’ for the interaction term, I don’t see why you’d have to design a study large enough to estimate the interaction term when your primary goal is accurately estimating the treatment effect.
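To make the power argument concrete, here is a small simulation sketch comparing the sampling variability of the treatment estimate under three models: unadjusted, adjusted for parity and C-section, and adjusted for both plus their interaction. Everything about the data-generating process is an assumption made up for illustration (the coefficients, the parity and C-section distributions, and treating the 0-10 pain score as continuous rather than ordinal), so it shows the mechanics rather than anything about the actual trial.

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_trial(n=100, effect=-1.0, b_parity=-0.8, b_cs=1.5, b_inter=0.5, sigma=1.0):
    """One hypothetical trial; all coefficients are illustrative assumptions."""
    treat = rng.permutation(np.repeat([0, 1], n // 2))   # 1:1 allocation
    parity = np.minimum(rng.poisson(1.5, n), 4)          # parity capped at 4
    cs = rng.binomial(1, 0.3, n) * (parity > 0)          # C-section only if parous
    pain = (5.0 + effect * treat + b_parity * parity + b_cs * cs
            + b_inter * parity * cs + rng.normal(0, sigma, n))
    return pain, treat, parity, cs

def treatment_estimate(y, treat, covariates):
    """OLS coefficient on treatment, with an intercept and the given covariates."""
    X = np.column_stack([np.ones(len(y)), treat] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

estimates = {"unadjusted": [], "main effects": [], "with interaction": []}
for _ in range(2000):
    y, treat, parity, cs = simulate_trial()
    estimates["unadjusted"].append(treatment_estimate(y, treat, []))
    estimates["main effects"].append(treatment_estimate(y, treat, [parity, cs]))
    estimates["with interaction"].append(
        treatment_estimate(y, treat, [parity, cs, parity * cs]))

means = {k: float(np.mean(v)) for k, v in estimates.items()}
sds = {k: float(np.std(v, ddof=1)) for k, v in estimates.items()}
for name in estimates:
    print(f"{name:>16}: mean = {means[name]:.3f}, empirical SE = {sds[name]:.3f}")
```

Under these assumptions, all three models target the same treatment effect; what differs is the spread of the estimates. How much the interaction term adds beyond the main effects depends entirely on how large it really is, which is exactly what is unknown here.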

If I’ve spoken in error, happy to be corrected.



I wish to suggest that your actual aim should be to inform clinical decision-making. Could you possibly provide more clinical context for this trial? What is the situation, exactly?

  1. Is this clinician generally dissatisfied with the level of pain control achieved for this procedure? Or is there some subset of patients for whom the pain control proves insufficient? Is this group to any extent predictable?
  2. Is it pain during the procedure that is at issue, or pain experienced after the procedure?
  3. What is the nature of the new intervention? Is it a drug administered before or during the procedure? A non-drug intervention like acupuncture or such? A modification to the procedure itself? A drug to be used on an as-needed basis after the procedure?
  4. What’s the benefit-risk calculus here? Can you sketch the clinical decision-theoretic setting for us? Why might one not want to use this new intervention? Does it have potential adverse effects? Other types of burdens or costs?

So far in the discussion, I don’t see any clear sign that anything beyond an opportunistic academic exercise is being planned here. That is, I don’t see anything like a sufficient clinical rationale being articulated. Without meaning to offend, I’d like to register the concern that I cannot tell whether this might be just a matter of having a convenient set of subjects (a steady flow of patients through the clinic) who create an opportunity to (a) run an RCT, (b) “estimate a treatment effect,” and (c) get a paper published.



If you really are concerned about the interaction being a potential confounder, or if you think it will “account for” enough variance in the outcome to be worth modeling, I think you could include it without worrying about power. Power is about hypothesis testing, and you’re not interested in testing a hypothesis about the interaction. It is also about precision, and a low-power estimate will be a poor estimate of the population interaction effect, but you don’t care about that either. It will, however, be an exact estimate of the interaction effect in your sample. Including it in your model removes the interaction effect from the effects you are interested in. The danger is that by adjusting for it, if it is not an important predictor or if it doesn’t vary between treatment groups, you risk introducing more bias than if you leave it out, because you are potentially unbalancing the randomization by adjusting for something that may be correlated with other variables that may be important and that are, asymptotically at least, balanced by the randomization. My guess is that it’s not likely to make much difference, but if I were concerned about it, I would specify the simpler model as my primary analysis, and run the model with interaction as part of a secondary sensitivity analysis, just to make sure it doesn’t change the results substantively. If it does, then you’ll need to be much more cautious in your conclusions, and try to figure out what’s going on, possibly addressing it in a larger future study. If you’re concerned about it but can’t trust your interaction estimate, you could also estimate, maybe through simulation, how big such an interaction would have to be to change your results; if it’s implausibly large, don’t worry about it.
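The simulation idea at the end of that post could be sketched roughly as follows: for a grid of hypothetical interaction sizes, simulate trials of n = 100, fit the primary model (main effects only) and the sensitivity model (with parity x C-section), and record how far apart the two treatment estimates land on average. As before, the data-generating values are assumptions chosen only for illustration, and pain is treated as continuous.

```python
import numpy as np

rng = np.random.default_rng(7)

def ols_treatment_effect(y, treat, covariates):
    """OLS coefficient on treatment, with an intercept and the given covariates."""
    X = np.column_stack([np.ones(len(y)), treat] + covariates)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def mean_shift(b_inter, n=100, n_sims=500):
    """Average absolute gap between primary and sensitivity treatment estimates."""
    shifts = []
    for _ in range(n_sims):
        treat = rng.permutation(np.repeat([0, 1], n // 2))   # 1:1 allocation
        parity = np.minimum(rng.poisson(1.5, n), 4)          # hypothetical parity
        cs = rng.binomial(1, 0.3, n) * (parity > 0)          # hypothetical C-section
        y = (5.0 - 1.0 * treat - 0.8 * parity + 1.5 * cs
             + b_inter * parity * cs + rng.normal(0, 1.0, n))
        primary = ols_treatment_effect(y, treat, [parity, cs])
        sensitivity = ols_treatment_effect(y, treat, [parity, cs, parity * cs])
        shifts.append(abs(primary - sensitivity))
    return float(np.mean(shifts))

# Grid of assumed true interaction sizes, in pain-score points.
shift_by_size = {b: mean_shift(b) for b in (0.0, 0.5, 1.0, 2.0)}
for b, shift in shift_by_size.items():
    print(f"true interaction = {b:.1f}: mean |primary - sensitivity| = {shift:.3f}")
```

Comparing the printed gaps with the minimal clinically important difference gives a feel for how large the interaction would have to be before the choice of model could plausibly change the conclusion.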



Would it have helped if he’d said “my primary statistical goal is to estimate the treatment effect”? All of your questions are important research questions, but I don’t think the original poster needs to attach a grant proposal to justify asking a perfectly reasonable stats question. It is illogical, not to mention rude, to insinuate that he/she is engaged in an “opportunistic academic exercise” because he/she hasn’t justified the study to your satisfaction when asking the question. It wouldn’t matter if the whole study was a made-up text book example. I apologize if I’m missing your point, but I don’t see how the answer to any one of your questions would affect how one would answer the question about including an underpowered interaction term in a statistical model. Remember, this is called the “Datamethods Discussion Forum”.



Thank you for your thoughtful and helpful reply! I think my initial question was clumsy. I wanted an analysis with added precision from adjusting for covariates that account for outcome variance, and thought adjusting for their interaction would account for still more. In doing so I awkwardly brought in what may be a too-many-variables, not-enough-data issue, and maybe a query about other potential issues.



I attempted to keep my example simple as it is there to aid in my statistical question. Ultimately, yes the goal is to inform clinical decision-making, but for my question I am trying to get the best treatment effect estimate.



I do not think it possible to abstract away ‘purely’ statistical questions from clinical research. For example, if the intervention were to be used as-needed after the procedure (see point #3 in my message above), then this would create opportunities for a crossover design—thereby questioning a basic premise of the question. I should certainly hope that questioning premises is in-scope here!

If one wishes to ask a purely statistical question, then by all means one can—and certainly we’re all trained to use pure symbolism to this end. But I assume that clinical details were brought in because they were thought substantively important, and appropriate to address in answers.

Finally, if you think my comment was rude, just wait until Dr. Jen Gunter happens to stumble upon a bunch of men (we all have male given names, so far in this convo) blithely discussing the study of a painful gynecologic procedure as a pure statistical abstraction!



Is “the best treatment effect estimate” invariant to your model of the DGP? If not, what models of the DGP are you considering? Do any candidate models include latent, patient-specific factors such as subjective pain thresholds? Have you considered that the indication for caesarean section (CS) might be informative about such latent variables, apart from whatever anatomical factors (uterine scar) the history of CS represents?