I am working with a clinician who is interested in running a relatively simple RCT comparing the pain reduction of a treatment against placebo during a painful gynecologic procedure (endometrial ablation). The main outcome will likely be an ordinal VAS scale (0-10). Allocation will be 1:1 to treatment or placebo. Based on a preliminary sample size calculation, approximated by a t-test power calculation with a specified minimal clinically important difference, we'll need about 100 participants in total. I asked my clinical collaborator for possible variables that may explain variation in our pain outcome. He believes increased parity will increase pain tolerance and having had a caesarean section will decrease it. To me, it seems these two variables will interact, and this seems important enough to adjust for. I'm not sure how large the effect of the interaction would be, but ultimately we are not so much interested in estimating the interaction effect; we just want to include it as a model covariate. To properly estimate the interaction, we would likely need much more data (as much as 16 times more), as shown by Andrew Gelman (1). To simplify, let's say 16 times our original sample size is needed to estimate the interaction, which is 1600 observations. I cannot see many people being cool with that large a jump in sample size. Would keeping the original sample size of ~100 and adjusting for the interaction anyway be a good enough approach to estimating treatment efficacy?

TL;DR version: is there an issue in adjusting for a covariate interaction when you just won't have nearly enough data in an RCT?
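For readers who want to see where a figure of roughly 100 participants can come from, here is a minimal sketch of the t-test-style approximation mentioned in the question. It uses the normal approximation rather than the exact noncentral t calculation, so it will come out slightly below what dedicated power software reports. The MCID of 1.5 VAS points and the SD of 2.6 points are hypothetical inputs chosen only for illustration; they are not taken from the trial.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(mcid, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided,
    two-sample comparison of means with equal allocation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    d = mcid / sd                       # standardized effect size
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Hypothetical inputs (assumptions, not from the original post).
per_arm = n_per_arm(mcid=1.5, sd=2.6)
print(f"per arm: {per_arm}, total: {2 * per_arm}")
```

Halving the MCID roughly quadruples the required n, which is the same arithmetic that drives the "16 times" figure for interactions.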

The interaction Gelman refers to would be e.g. treatment x parity (i.e. a subgroup analysis), not caesarean x parity. I don't think you could stratify randomisation on those factors either (impractical/impossible), but your n is not too small, so that's OK I guess. A power calculation can be rough; hope that it's conservative (n is an over-estimate), which I guess it is if it's based on a t-test, i.e. no adjustment.
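The arithmetic behind the "16 times" figure can be sketched as follows. This is my reading of Gelman's argument under simplifying assumptions: a binary treatment crossed with a binary subgroup, equal cell sizes, and a common outcome SD. The function names are made up for this illustration.

```python
from math import sqrt


def se_main_effect(sd, n):
    """SE of a difference in means between two arms of n/2 each."""
    return sd * sqrt(1 / (n / 2) + 1 / (n / 2))


def se_interaction(sd, n):
    """SE of a difference of differences across four cells of n/4 each."""
    return sd * sqrt(4 / (n / 4))


def sample_size_multiplier(relative_size, sd=1.0, n=100):
    """Factor by which n must grow for the interaction to be estimated with
    the same power as the main effect, when the interaction is
    `relative_size` times as large as the main effect."""
    se_ratio = se_interaction(sd, n) / se_main_effect(sd, n)  # equals 2
    return (se_ratio / relative_size) ** 2


print(round(sample_size_multiplier(0.5), 6))  # interaction half the main effect
print(round(sample_size_multiplier(1.0), 6))  # interaction as large as the main effect
```

The two calls give 16 and 4: a factor of 4 comes from the doubled standard error, and a further factor of 4 from assuming the interaction is half the size of the main effect. None of this applies to a covariate-by-covariate term such as caesarean x parity that is included only as an adjustment.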

I'm not sure you'd have to adjust for the interaction term, but maybe I'm missing something yet... If your primary goal is to estimate the treatment effect, as long as you've stratified randomization according to parity and history of C-section and then you adjust for them in the primary analysis, that seems like it would do the job the RCT is setting out to do.

Why do you feel the need to also adjust for the interaction term if you know that you're not going to be able to estimate it?

My primary goal is to estimate the treatment effect. Perhaps this is just my misunderstanding of ANCOVA: I thought that by adjusting for the interaction term (parity*C-section) the test of the treatment effect would be more powerful, and I was not clear whether there would be an issue doing this when there isn't much data to estimate the interaction.

Much has been written about the benefits of covariate adjustment in RCTs (though uptake remains lower than we might like). In general, adjusting for a small number of prespecified covariates known to have strong associations with the outcome is desirable, as it will increase power. My slight stumbling block here is why you'd also need to adjust for the interaction term (even if you think an interaction is present). Furthermore, even if you do "adjust" for the interaction term, I don't see why you'd have to design a study large enough to estimate it when your primary goal is accurately estimating the treatment effect.
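As a rough rule of thumb for a continuous outcome analysed by ANCOVA, the variance of the adjusted treatment estimate is about (1 - R²) times the unadjusted variance, where R² is the share of outcome variance explained by the baseline covariates. The sketch below just does that arithmetic. It ignores the degrees of freedom spent on each covariate and any chance imbalance, and the R² values are hypothetical.

```python
from math import sqrt


def relative_se(r_squared):
    """Approximate ratio of adjusted to unadjusted SE of the treatment effect."""
    return sqrt(1 - r_squared)


def equivalent_n(n_unadjusted, r_squared):
    """Approximate n at which an adjusted analysis matches the precision
    of an unadjusted analysis with n_unadjusted participants."""
    return n_unadjusted * (1 - r_squared)


# Hypothetical R² values for parity and C-section history (assumptions).
for r2 in (0.05, 0.20, 0.40):
    print(f"R2 = {r2:.2f}: relative SE = {relative_se(r2):.3f}, "
          f"equivalent n = {equivalent_n(100, r2):.0f}")
```

On this view, an extra interaction column only helps to the extent that it raises R² beyond what the two main effects already achieve, and it costs one degree of freedom either way.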

I wish to suggest that your actual aim should be to inform clinical decision-making. Could you possibly provide more clinical context for this trial? What is the situation, exactly?

Is this clinician generally dissatisfied with the level of pain control achieved for this procedure? Or is there some subset of patients for whom the pain control proves insufficient? Is this group to any extent predictable?

Is it pain during the procedure that is at issue, or pain experienced after the procedure?

What is the nature of the new intervention? Is it a drug administered before or during the procedure? A non-drug intervention like acupuncture or such? A modification to the procedure itself? A drug to be used on an as-needed basis after the procedure?

What's the benefit-risk calculus here? Can you sketch the clinical decision-theoretic setting for us? Why might one not want to use this new intervention? Does it have potential adverse effects? Other types of burdens or costs?

If you really are concerned about the interaction being a potential confounder, or if you think it will "account for" enough variance in the outcome to be worth modeling, I think you could include it without worrying about power. Power is about hypothesis testing, and you're not interested in testing a hypothesis about the interaction. It is also about precision, and a low-power estimate will be a poor estimate of the population interaction effect, but you don't care about that either. It will, however, be an exact estimate of the interaction effect in your sample. Including it in your model removes the interaction effect from the effects you are interested in. The danger is that by adjusting for it, if it is not an important predictor or if it doesn't vary between treatment groups, you risk introducing more bias than if you leave it out, because you are potentially unbalancing the randomization by adjusting for something that may be correlated with other important variables that are, asymptotically at least, balanced by the randomization. My guess is that it's not likely to make much difference, but if I were concerned about it, I would specify the simpler model as my primary analysis and run the model with the interaction as a secondary sensitivity analysis, just to make sure it doesn't change the results substantively. If it does, then you'll need to be much more cautious in your conclusions and try to figure out what's going on, possibly addressing it in a larger future study. If you're concerned about it but can't trust your interaction estimate, you could also estimate, maybe through simulation, how big such an interaction would have to be to change your results; if it's implausibly large, don't worry about it.
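A simulation along the lines suggested above could look like the following minimal sketch. Everything numeric in it is an assumption made up for illustration: the effect sizes, the parity distribution, the 30% C-section rate among parous women, and a continuous (rather than ordinal 0-10) outcome. The small `ols` helper is a plain normal-equations solver so that the example needs only the standard library; in practice you would use your usual regression software.

```python
import random
from statistics import mean, stdev


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def simulate_trial(rng, n=100, effect=-1.5, b_parity=-0.8, b_cs=1.5,
                   b_inter=0.5, sd=2.0):
    """One hypothetical 1:1 trial; rows are (treatment, parity, cs, pain)."""
    arms = [0, 1] * (n // 2)
    rng.shuffle(arms)
    rows = []
    for t in arms:
        parity = rng.choices([0, 1, 2, 3], weights=[3, 3, 3, 1])[0]
        # A prior C-section is only possible after at least one birth.
        cs = 1 if parity > 0 and rng.random() < 0.3 else 0
        y = (5.0 + effect * t + b_parity * parity + b_cs * cs
             + b_inter * parity * cs + rng.gauss(0, sd))
        rows.append((t, parity, cs, y))
    return rows


def treatment_estimates(rows):
    """Treatment coefficient under three candidate analysis models."""
    y = [r[3] for r in rows]
    designs = {
        "unadjusted": [[1, t] for t, p, c, _ in rows],
        "main effects": [[1, t, p, c] for t, p, c, _ in rows],
        "with interaction": [[1, t, p, c, p * c] for t, p, c, _ in rows],
    }
    return {name: ols(X, y)[1] for name, X in designs.items()}


def run_simulation(n_sims=300, seed=2024, **trial_kwargs):
    """Mean and empirical SE of the treatment estimate for each model."""
    rng = random.Random(seed)
    fits = [treatment_estimates(simulate_trial(rng, **trial_kwargs))
            for _ in range(n_sims)]
    return {name: (mean([f[name] for f in fits]), stdev([f[name] for f in fits]))
            for name in fits[0]}


summary = run_simulation()
for name, (avg, se) in summary.items():
    print(f"{name:>16}: mean estimate = {avg:6.2f}, empirical SE = {se:.2f}")
```

Rerunning with different values of `b_inter` (for example `run_simulation(b_inter=2.0)`) gives a feel for how large the parity x C-section interaction would need to be before the model that includes it noticeably changes the precision of the treatment estimate.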

Would it have helped if he'd said "my primary statistical goal is to estimate the treatment effect"? All of your questions are important research questions, but I don't think the original poster needs to attach a grant proposal to justify asking a perfectly reasonable stats question. It is illogical, not to mention rude, to insinuate that he/she is engaged in an "opportunistic academic exercise" because he/she hasn't justified the study to your satisfaction when asking the question. It wouldn't matter if the whole study was a made-up textbook example. I apologize if I'm missing your point, but I don't see how the answer to any one of your questions would affect how one would answer the question about including an underpowered interaction term in a statistical model. Remember, this is called the "Datamethods Discussion Forum".

Thank you for your thoughtful and helpful reply! I think my initial question was clumsy. I wanted an analysis with added precision from adjusting for covariates that account for outcome variance, and I thought adjusting for their interaction would account for even more. In doing so I awkwardly brought in what may be a "too many variables, not enough data" issue, and perhaps a query about other potential issues.

I attempted to keep my example simple, as it is there only to support my statistical question. Ultimately, yes, the goal is to inform clinical decision-making, but for my question I am trying to get the best treatment effect estimate.

I do not think it possible to abstract away "purely" statistical questions from clinical research. For example, if the intervention were to be used as-needed after the procedure (see point #3 in my message above), then this would create opportunities for a crossover design, thereby questioning a basic premise of the question. I should certainly hope that questioning premises is in-scope here!

If one wishes to ask a purely statistical question, then by all means one can, and certainly we're all trained to use pure symbolism to this end. But I assume that clinical details were brought in because they were thought substantively important, and appropriate to address in answers.

Finally, if you think my comment was rude, just wait until Dr. Jen Gunter happens to stumble upon a bunch of men (we all have male given names, so far in this convo) blithely discussing the study of a painful gynecologic procedure as a pure statistical abstraction!

Is "the best treatment effect estimate" invariant to your model of the data-generating process (DGP)? If not, what models of the DGP are you considering? Do any candidate models include latent, patient-specific factors such as subjective pain thresholds? Have you considered that the indication for caesarean section (CS) might be informative about such latent variables, apart from whatever anatomical factors (uterine scar) the history of CS represents?