I am trying to figure out how best to compare gestational age as an outcome in clinical trials.

In general, gestational age is almost always left-skewed, presumably because pregnancies rarely last far beyond 40 weeks (in part due to labour induction), whereas viable preterm birth is not uncommon and can occur as early as 23 weeks. Here is a histogram of a real gestational age dataset from a maternity hospital (n = 153):

For randomised controlled trials that aim to reduce preterm birth, I am trying to work out an optimal way to analyse this data. A brief review of previous trials shows that this is generally done in three ways:

1. Binomial comparison of births at <37 weeks versus ≥37 weeks. The clinical definition of preterm birth is gestational age <37 weeks. This is quite a common method of analysis.
2. Comparing mean gestational age. Also quite common, but I have rarely seen the data transformed first.
3. Survival analysis of time to birth. This is rather uncommon.
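To make the three options concrete, here is a minimal sketch in plain Python on simulated data. Everything in it is an assumption for illustration only: the helper names (`simulate_ga`, `preterm_risk`, `logrank_chi2`), the way left skew is generated (an upper limit of 42 weeks minus a gamma-distributed shortfall, floored at 23 weeks), the arm sizes and the effect size. The log-rank test assumes every birth is observed, i.e. no censoring. It is not the hospital dataset described above, and it is not meant as a recommended analysis.

```python
import math
import random
import statistics


def simulate_ga(n, mean_shortfall, rng):
    """Hypothetical left-skewed gestational ages in weeks.

    Assumption: age = 42 minus a gamma-distributed shortfall, floored at 23.
    """
    shape = 2.0
    scale = mean_shortfall / shape  # gamma mean = shape * scale
    return [max(23.0, 42.0 - rng.gammavariate(shape, scale)) for _ in range(n)]


def preterm_risk(ga, cutoff=37.0):
    """Option 1: proportion of births before the cutoff (default 37 weeks)."""
    return sum(g < cutoff for g in ga) / len(ga)


def logrank_chi2(a, b):
    """Option 3: two-group log-rank chi-square (1 df), assuming no censoring."""
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted(set(a) | set(b)):
        n_a = sum(x >= t for x in a)  # still pregnant in arm a just before t
        n_b = sum(x >= t for x in b)
        d_a = sum(x == t for x in a)  # births in arm a at time t
        d = d_a + sum(x == t for x in b)
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var


rng = random.Random(1)  # fixed seed so the sketch is reproducible
control = simulate_ga(300, 3.0, rng)  # hypothetical control arm
treated = simulate_ga(300, 2.4, rng)  # hypothetical arm with a smaller shortfall

# Option 1: difference in the risk of birth before 37 weeks
risk_diff = preterm_risk(treated) - preterm_risk(control)

# Option 2: difference in mean gestational age (untransformed)
mean_diff = statistics.mean(treated) - statistics.mean(control)

# Option 3: log-rank test on time to birth; p-value from chi-square with 1 df
chi2 = logrank_chi2(treated, control)
p_logrank = math.erfc(math.sqrt(chi2 / 2))

print(f"risk difference (<37 wk): {risk_diff:+.3f}")
print(f"mean difference (weeks):  {mean_diff:+.2f}")
print(f"log-rank chi-square:      {chi2:.2f} (p = {p_logrank:.4f})")
```

The three summaries answer different questions (a risk, a mean, and an ordering of birth times), which is part of why the choice between them matters.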

Option 1 is probably not optimal due to the various issues associated with dichotomisation, which are extensively covered on this forum and in Frank Harrell’s BBR.

However, I’m trying to choose between Options 2 and 3. Both would yield interpretable estimates. Option 2 would likely require transforming the data towards normality (e.g. power transformation, see histogram below), but I fear that the relatively hard upper bound may still be problematic. Option 3 might offer some more flexibility, but the relatively hard upper bound may not play nicely with the proportional hazards assumption.

I would be very grateful for any advice on analysing this data in a clinical trial context. There are other models that I’m less familiar with that may perhaps suit (e.g. Gompertz?), but I’m a bit concerned that these may be less interpretable for clinicians.