How to handle data requiring truncation to be valid

CP3 · January 23, 2025, 10:42pm

I am seeking advice on the appropriate statistical model for a situation involving right-truncated continuous data.

The outcome variable represents the duration (in weeks) that mothers fed their children formula after childbirth. The range includes responses like 0 weeks, 10 weeks, or 35 weeks.

However, the data contain measurement error: some mothers reported durations that exceed the child’s age at the time of the interview. For instance, one mother was interviewed when her child was 2 weeks old but reported feeding formula for 10 weeks, which is biologically implausible. To address this issue, we decided to truncate these responses to the child’s age. In this example, the response would be adjusted to 2 weeks.

Given this truncation process, what statistical models would be most appropriate for analyzing these data? While I understand that a simple linear regression could be applied if we exclude the erroneous observations, I am particularly interested in methods that explicitly account for truncation when the adjusted responses are included. I am unsure whether a survival model would be an appropriate fit, as there is no specific event being analyzed, only the duration of formula feeding.

Thanks.

f2harrell · January 24, 2025, 1:01pm

I changed the title to be more descriptive and re-categorized the topic.

This makes you wonder about the accuracy of data that did not violate the age criterion. Are you sure this study’s data collection procedures are adequate for the questions to be asked?

If the only major inaccuracy is on the right side, you can treat the affected values as right-censored, as with time-to-event models. See also R packages that handle interval censoring.
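To make the right-censoring idea concrete, here is a minimal sketch in plain Python, not a recommended analysis. It assumes that the end of formula feeding plays the role of the "event", that a report exceeding the child's age only tells us the true duration is at least the child's age, and that reports at or below the child's age are taken at face value. The function names `to_censored` and `kaplan_meier` and the example data are hypothetical, made up for illustration. The second function is a bare-bones Kaplan-Meier estimator, which is one standard way to summarize right-censored durations without a regression model.

```python
from collections import Counter


def to_censored(reported_weeks, age_weeks):
    """Recode one response as (time, observed).

    Assumption: a report longer than the child's age is capped at the
    age and flagged as right-censored (observed=False), meaning the true
    duration is only known to be at least that long.
    """
    if reported_weeks > age_weeks:
        return age_weeks, False
    return reported_weeks, True


def kaplan_meier(records):
    """Kaplan-Meier estimate from (time, observed) pairs.

    Returns a list of (time, survival) at each distinct time where at
    least one event was observed. Censored records leave the risk set
    without lowering the curve.
    """
    records = list(records)
    events = Counter(t for t, observed in records if observed)
    exits = Counter(t for t, _ in records)
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical (reported_weeks, child_age_weeks) pairs, for illustration only.
raw = [(0, 30), (10, 2), (10, 40), (35, 52), (4, 8), (20, 12)]
records = [to_censored(reported, age) for reported, age in raw]

for time, surv in kaplan_meier(records):
    print(f"week {time}: estimated share still formula feeding = {surv:.3f}")
```

The same (time, observed) coding is what time-to-event regression software expects as input, so covariates can be added later without recoding the outcome.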

CP3 · January 25, 2025, 1:08am

Thank you, Frank. I am not confident about the quality of the data, and I will highlight this as a significant concern.