Covariate imbalance in RCT

We ran an RCT in which the treatment groups were stratified by age, sex, BMI, and center.
It turned out, however, that the proportion of males was just 0.9 in one treatment group and 0.12 in the other.
The primary model should normally include all the variables used for stratification (following the Guideline on adjustment for baseline covariates in clinical trials), but in this case, given the imbalance of males and females, adjusting for sex does not seem to make much sense.

Would it be correct/acceptable to exclude the sex variable from the model? The response is a continuous variable.

Many thanks in advance!

If you believe that biological sex influences the outcome (which is the reason why one would stratify for a covariate) then the observed imbalance makes adjustment even more valuable to reduce outcome heterogeneity and increase the precision of your estimate, along with all the other benefits that such an adjustment yields.
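To illustrate why the imbalance strengthens the case for adjustment, here is a minimal simulation sketch. Everything in it is an assumption for illustration only: the sample size, the true treatment effect `tau`, the sex effect `beta_sex`, and the noise level are hypothetical, and only the 0.9 / 0.12 male proportions come from the question. It compares the unadjusted difference in means with the treatment coefficient from a linear model that also includes sex.

```python
import numpy as np

def simulate(n_per_arm=100, p_male=(0.9, 0.12), tau=1.0, beta_sex=2.0,
             sigma=1.0, n_sims=2000, seed=0):
    """Compare unadjusted and sex-adjusted treatment estimates.

    The design (arm and sex of each patient) is held fixed at the observed
    imbalance; only the outcome noise is redrawn in each simulation.
    All numeric defaults except p_male are hypothetical.
    """
    rng = np.random.default_rng(seed)
    treat = np.repeat([0.0, 1.0], n_per_arm)
    n_male = [round(p * n_per_arm) for p in p_male]
    male = np.concatenate([
        np.repeat([1.0, 0.0], [n_male[0], n_per_arm - n_male[0]]),
        np.repeat([1.0, 0.0], [n_male[1], n_per_arm - n_male[1]]),
    ])
    # Design matrix for the adjusted model: intercept, treatment, sex
    X_adj = np.column_stack([np.ones_like(treat), treat, male])

    unadj = np.empty(n_sims)
    adj = np.empty(n_sims)
    for i in range(n_sims):
        y = tau * treat + beta_sex * male + rng.normal(0.0, sigma, treat.size)
        # Unadjusted estimate: plain difference in arm means
        unadj[i] = y[treat == 1].mean() - y[treat == 0].mean()
        # Adjusted estimate: treatment coefficient from least squares
        coef, *_ = np.linalg.lstsq(X_adj, y, rcond=None)
        adj[i] = coef[1]
    return unadj, adj

unadj, adj = simulate()
print("true effect:      1.0")
print("unadjusted mean: ", unadj.mean().round(3))
print("adjusted mean:   ", adj.mean().round(3))
```

Under these assumed values, the unadjusted difference absorbs part of the sex effect because the arms differ so much in their sex mix, while the adjusted coefficient stays centered on the true treatment effect.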

There needs to be a written pre-specified statistical analysis plan for the study. Covariate adjustment must be pre-specified. And it is always necessary to adjust for all stratification variables (or generalizations of them). It doesn’t pay to look for actual imbalances - see other topics related to that.

Hi, just to add to the comments above, bear in mind that randomization, whether it is simple, stratified or covariate adaptive, even if done properly, by itself, does not guarantee absolute balance on all relevant baseline variables. It only ensures that any imbalance that may occur, does so by chance, and not due to patient selection bias. This is why performing formal inter-arm, null hypothesis test based comparisons of baseline characteristics (e.g. so-called table 1) is pointless.

That being said, if there is not a typo in your post and there was indeed 0.9 (90%) males in one arm, versus 0.12 (12%) males in the other, it would be worthwhile to review your randomization process to be sure that it was implemented properly. That is a significant imbalance, and if your study had a DSMB monitoring it, this should have been noted much earlier in the study timeline for review and any correction in the randomization process that may have been apropos at the time.

Depending upon how your randomization schedule was created, vis-a-vis block sizes and related considerations, partial block use can also contribute to an imbalance. This is why block size, and whether that is fixed (e.g. all 4), or permuted (e.g. 2 and 4 in random sequences), is an important consideration in implementing the randomization schedule.
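As a rough illustration of the block-size point, here is a minimal sketch of a blocked allocation schedule for one stratum, with block sizes drawn at random from 2 and 4. The function name `blocked_schedule`, the arm labels, and the truncation at `n` patients are assumptions for illustration; this is not how any particular trial system generates its schedule.

```python
import random

def blocked_schedule(n, block_sizes=(2, 4), arms=("A", "B"), seed=None):
    """Return an allocation list of length n for a single stratum.

    Each block contains every arm equally often, in shuffled order.
    Block sizes are picked at random from block_sizes, and each must be
    a multiple of the number of arms. If enrollment stops mid-block,
    the last block is only partially used.
    """
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n]  # a partial final block can leave the arms unequal

schedule = blocked_schedule(11, seed=1)
print(schedule)
print("A:", schedule.count("A"), "B:", schedule.count("B"))
```

With many strata (age, sex, BMI, and center combined), each stratum has its own schedule, so several partially used blocks can add up to a noticeable overall imbalance.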

There is a general phrase that is apropos which is, “Analyze as you randomize”, and which I believe is attributed to Fisher, though somebody will correct me. As Frank notes, this should all be pre-defined in an SAP for the study.

Thus, there is no motivation to leave gender out of your multivariable model; indeed, as Pavlos indicated, the imbalance increases the need to keep it in.


Thank you for making it clear!

Yes, the covariate adjustment was pre-specified in a statistical analysis plan. Does it also hold for secondary endpoints? Or would it be fine to report results from a reduced model (if it shows a better fit)?

My concern about the sex variable is that it seems impossible to get insight into whether or how sex influences the response. The uncertainty coming from males is just too big.

Thank you again.

Hi, I have a relatively naive question here. Suppose in a clinical trial the overall HR of treatment A compared to treatment B is 0.66 (95% CI: 0.61 to 0.73). Now we perform a subgroup analysis of the treatment effect for two subgroups - based on a prognostic variable (subgroup C and subgroup C’). Suppose, in group C the treatment effect in terms of HR is 0.72 (95% CI: 0.51 to 0.79). Is there a way that I can calculate and extract the HR for subgroup C’?

Thank you very much in advance.
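
One rough way to approach this, offered as a sketch rather than a definitive method: if the overall log HR is treated as approximately the inverse-variance weighted average of the two subgroup log HRs, the missing subgroup can be backed out. This assumes the two subgroups are disjoint and cover the whole trial, that the intervals are Wald intervals on the log scale, and that the overall and subgroup models are otherwise comparable. The function names below are hypothetical, and the standard errors are taken from the interval widths, so an interval that is not symmetric around its point estimate on the log scale will only be approximated.

```python
import math

Z = 1.959964  # normal quantile for a 95% interval

def log_hr_and_se(hr, lo, hi, z=Z):
    """Log hazard ratio and its standard error, recovered from a CI."""
    return math.log(hr), (math.log(hi) - math.log(lo)) / (2 * z)

def complement_subgroup(overall, subgroup, z=Z):
    """Approximate HR and CI for the subgroup that was not reported.

    overall and subgroup are (hr, ci_low, ci_high) tuples.
    Assumes overall log HR = inverse-variance weighted mean of the two
    subgroup log HRs, which is an approximation.
    """
    b_all, se_all = log_hr_and_se(*overall, z)
    b_c, se_c = log_hr_and_se(*subgroup, z)
    w_all, w_c = 1 / se_all**2, 1 / se_c**2
    w_rest = w_all - w_c  # weight left over for the other subgroup
    if w_rest <= 0:
        raise ValueError("subgroup is more precise than the overall estimate")
    b_rest = (w_all * b_all - w_c * b_c) / w_rest
    se_rest = math.sqrt(1 / w_rest)
    return (math.exp(b_rest),
            math.exp(b_rest - z * se_rest),
            math.exp(b_rest + z * se_rest))

hr, lo, hi = complement_subgroup((0.66, 0.61, 0.73), (0.72, 0.51, 0.79))
print(f"approximate HR in C': {hr:.2f} (95% CI: {lo:.2f} to {hi:.2f})")
```

The result is only as good as the weighted-average assumption; with access to the patient-level data, fitting the model in subgroup C’ directly would be preferable.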