I am working with a clinician who is interested in running a relatively simple RCT comparing the pain-reducing efficacy of a treatment against a placebo during a painful gynecologic procedure (endometrial ablation). The main outcome will likely be an ordinal VAS scale (0-10), with 1:1 allocation to treatment or placebo. Based on a preliminary sample size calculation, approximated by a t-test power calculation with a specified minimal clinically important difference, we'll need about 100 participants in total.

I asked my clinical collaborator for variables that may explain variation in our pain outcome. He believes increased parity will increase pain tolerance and that a prior caesarean section will decrease it. To me, these two variables seem likely to interact, and that seems important enough to adjust for. I'm not sure how large the interaction effect would be, but ultimately we are not interested in estimating the interaction itself; we just want to include it as a model covariate. To properly estimate the interaction, we would likely need much more data, as much as 16 times more, as shown by Andrew Gelman (1). To simplify, say we need 16 times our original sample size, i.e. 1600 observations; I cannot see many people being comfortable with that large a jump. Would keeping the original sample size of ~100 and adjusting for the interaction anyway be a good enough approach to estimating treatment efficacy?
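For context, the kind of t-test power calculation behind the ~100 figure can be sketched with a standard normal approximation. The standardized effect size of 0.57 below is a made-up placeholder (roughly what reproduces ~100 total), not our actual MCID/SD ratio:

```python
from statistics import NormalDist

def n_per_arm(d, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-sample t-test,
    given standardized effect size d = MCID / SD."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

# hypothetical values for illustration: MCID / SD ~ 0.57
print(n_per_arm(0.57))  # ~48 per arm, so ~100 participants total
```

The exact t-based calculation (e.g. `statsmodels.stats.power.TTestIndPower`) adds a few participants on top of this approximation.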
TL;DR version: is there an issue in adjusting for a covariate interaction when you won't have anywhere near enough data in an RCT?
Edit: This seems like a job for penalization
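A minimal sketch of that penalization idea, on simulated data with made-up effect sizes (treating VAS as continuous purely for illustration): a ridge penalty shrinks the parity-by-caesarean interaction toward zero while leaving the intercept and the treatment coefficient unpenalized, so the treatment effect estimate is not biased by the shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
treat = rng.integers(0, 2, n)    # 1:1 allocation
parity = rng.poisson(1.5, n)     # hypothetical parity distribution
csec = rng.integers(0, 2, n)     # prior caesarean section (0/1)

# simulated VAS outcome under assumed (made-up) effects
vas = (5 - 1.5 * treat - 0.3 * parity + 1.0 * csec
       + 0.4 * parity * csec + rng.normal(0, 2, n))

# design matrix: intercept, treatment, parity, caesarean, interaction
X = np.column_stack([np.ones(n), treat, parity, csec, parity * csec])

# per-coefficient ridge penalties: none on intercept or treatment,
# heaviest on the interaction term
lam = np.array([0.0, 0.0, 1.0, 1.0, 5.0])
beta = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ vas)
print(beta[1])  # penalized estimate of the treatment effect
```

The same idea carries over to an ordinal model for the VAS outcome, e.g. penalized proportional-odds regression.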