Should I Exclude Patients Based on Treatment Criteria?

Louis_Martin · April 29, 2025, 5:33pm

Hi all,

I’m working on a retrospective analysis to determine whether a drug reduces the prevalence of brain bleeds among very premature babies in the NICU. Babies were supposed to receive treatment if they weighed less than 1000 g at birth, but this protocol was not always followed (and I’m not sure why). Based on previous literature, I expect that I need to adjust for gestational age at birth (i.e. level of prematurity) and relative size at birth. Although birth weight is often considered a good predictor of outcomes, it is often a proxy for gestational age (more premature babies weigh less), and what really matters is relative size (e.g. smaller than average for a given gestational age). I analyzed these data using an ordinal regression since brain bleeds are scored from 0 (no bleeding) to 4 (profound bleeding in both hemispheres). After creating a DAG, I decided to include treatment, gestational age, and Fenton Birth Weight Z Score (relative size for level of prematurity) as predictors: lrm(data = d, brain_bleed ~ treatment + gestational_age + birth_weight_z).

Below is a modified version of the data showing the relationship between gestational age, birth weight, and treatment. You can see that only two individuals born below 1000 g did not receive treatment, but many above 1000 g did get treatment. Also, there are only a couple of babies born before 27 weeks who did not receive the treatment, so most of the variability in treatment is occurring in the top right corner. Despite these limitations, I thought my approach was a decent way to determine the evidence for the effectiveness of treatment.

A statistician colleague said that I need to remove all babies born below 1000 grams and failing to do so would make this study difficult to publish. My recollection of their rationale is that because all babies born < 1000 g were supposed to get the treatment (and the vast majority did), it does not make sense to include those patients as they would skew the comparison. I think that excluding patients is a bad idea since it would severely reduce the sample size while removing the majority of patients with brain bleeds. Additionally, birth weight is not a factor that needed to be controlled for according to the DAG, so I’m not sure why it would need to be taken into account in the analysis.

Is removing patients < 1000 g as bad an idea as I think it is? Does the model/approach I used seem reasonable?

f2harrell · April 29, 2025, 7:11pm

I think the underlying reason some feel that exclusion of < 1 kg is warranted is that they are not comfortable with statistical model assumptions. Personally I would take a little risk about model fit and use a restricted cubic spline for birth weight with, say, 4 knots (which gives it 3 d.f. in all; ignore the z-score, which makes unwarranted assumptions). You might also include gestational age with 3 knots so that it gets only 2 d.f.
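A minimal sketch of what that model specification might look like with the rms package, which provides the `lrm` function used in the original post. The data here are simulated purely so the example runs. The column name `birth_weight` (raw grams), the sample size, and every coefficient in the simulation are assumptions made for illustration, not values from the actual study.

```r
# Sketch only: simulated data stand in for the real NICU data set.
library(rms)

set.seed(1)
n <- 400
gestational_age <- runif(n, 23, 32)   # weeks (hypothetical range)
# Birth weight in grams, rising with gestational age (hypothetical relationship)
birth_weight <- pmax(350, 550 + 100 * (gestational_age - 23) + rnorm(n, 0, 150))
# Treatment mostly follows the < 1000 g protocol, with deviations in both directions
treatment <- rbinom(n, 1, ifelse(birth_weight < 1000, 0.95, 0.35))

# Latent severity cut into an ordinal bleed grade from 0 to 4
lp <- -0.4 * (gestational_age - 27) - 0.002 * (birth_weight - 1000) - 0.5 * treatment
latent <- lp + rlogis(n)
brain_bleed <- cut(latent, c(-Inf, 1, 2, 3, 4, Inf), labels = FALSE) - 1

d <- data.frame(brain_bleed, treatment, gestational_age, birth_weight)

# Proportional odds model: 4-knot spline for birth weight (3 d.f.),
# 3-knot spline for gestational age (2 d.f.), all babies retained
fit <- lrm(brain_bleed ~ treatment + rcs(birth_weight, 4) + rcs(gestational_age, 3),
           data = d)
print(fit)
print(anova(fit))   # Wald tests, including tests of nonlinearity for each spline
```

With `rcs(x, k)` the predictor contributes k - 1 coefficients, which is where the 3 d.f. and 2 d.f. counts come from. The treatment coefficient is then estimated from the whole cohort rather than only from babies above 1000 g.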

s_doi · April 29, 2025, 7:23pm

Agree with Frank … if the DAG assumptions are correct then BW is the variable of interest and the others can be ignored (in terms of bias but risk magnification is another consideration).

Louis_Martin · April 29, 2025, 7:36pm

Thank you for your reply, Frank. I did compare a model with and without restricted cubic splines for gestational age based on AIC (and the linear model was better). The effective sample size was about 100 (since most patients did not have a brain bleed), so I didn’t try additional comparisons.

I’m curious about your dislike of the Z-score. The Fenton Z-score is calculated from a meta-analysis of millions of births and indicates the size of a baby relative to other babies born that prematurely. My thought was that birth weight and brain bleeds may be statistically related, but this is likely due to birth weight being a proxy for prematurity (which is better captured by gestational age) and very low birth weights relative to gestational age causing poor outcomes. Studies often use “Small for gestational age” (bottom 10th percentile) as a predictor because these babies have bad outcomes, but this dichotomous variable has much less information than Fenton Z-scores.

My other concern is that gestational age and birth weight are highly correlated. If you add both to a model, one of them will show up as uninformative. Would that matter if I’m just interested in the effect of treatment?

f2harrell · April 29, 2025, 9:52pm

Z-scores usually assume linearity in the predictor and they always assume symmetry in the distribution of the raw variable. Otherwise SD is not an appropriate dispersion measure.

To me it seems rather odd to score one infant’s size relative to other babies. Physics and physiology operate on the basis of one person’s characteristics.

Right, the collinearity will not affect treatment estimates as long as treatment doesn’t interact with one of the variables in question.
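The collinearity point can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not a reanalysis of the study: it uses an ordinary linear model instead of the ordinal model for simplicity, standardized simulated covariates named `ga` and `bw`, made-up effect sizes, and no treatment interaction. The helper `sim_se` is hypothetical.

```r
# Sketch only: simulated, standardized covariates and a linear outcome model.
sim_se <- function(rho, n = 5000) {
  ga <- rnorm(n)                                  # gestational age (standardized)
  bw <- rho * ga + sqrt(1 - rho^2) * rnorm(n)     # birth weight, correlation rho with ga
  treatment <- rbinom(n, 1, plogis(-bw))          # smaller babies more likely treated
  y <- 0.5 * ga + 0.5 * bw - 1 * treatment + rnorm(n)   # additive, no interaction
  fit <- lm(y ~ treatment + ga + bw)
  summary(fit)$coefficients[, "Std. Error"]       # standard errors of all coefficients
}

set.seed(2)
se_low  <- sim_se(rho = 0.10)   # ga and bw nearly uncorrelated
se_high <- sim_se(rho = 0.95)   # ga and bw highly collinear

print(round(rbind(low_collinearity = se_low, high_collinearity = se_high), 3))
```

In this setup the standard errors for `ga` and `bw` grow substantially when the two are highly correlated, while the standard error for `treatment` stays about the same. Treatment precision depends on how well treatment is predicted by the adjustment variables jointly, not on how correlated those variables are with each other.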