Box-Cox transformation

I am reviewing a paper in which the outcome variable (a biomarker) was Box-Cox transformed. In addition, the authors removed 30% of the outcome values before the transformation because they considered them outliers. To me this suggests a problem with how the outcome variable was measured. Should that many outliers be removed, and is the Box-Cox transformation considered a valid approach for data that are not normally distributed?

Please categorize this post and add applicable tags too.

Box-Cox is very origin-dependent (what do you subtract before taking logs?) and is not necessarily robust to outliers. Because of all these uncertainties I prefer semiparametric models.
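To make the origin-dependence concrete, here is a minimal sketch using only the Python standard library. It estimates the Box-Cox parameter by a grid search over the usual profile log-likelihood, first on simulated lognormal data and then on the same data with a constant added. The simulated data, the shift of 10, the grid, and the helper names `boxcox_loglik` and `boxcox_lambda` are all assumptions made for illustration; nothing here comes from the paper under review.

```python
import math
import random


def boxcox_loglik(y, lam):
    """Profile log-likelihood of the Box-Cox parameter lam for positive data y."""
    n = len(y)
    if lam == 0:
        z = [math.log(v) for v in y]
    else:
        z = [(v ** lam - 1.0) / lam for v in y]
    mean_z = sum(z) / n
    var_z = sum((v - mean_z) ** 2 for v in z) / n
    # Normal log-likelihood with mean and variance profiled out,
    # plus the Jacobian term of the transformation.
    return -0.5 * n * math.log(var_z) + (lam - 1.0) * sum(math.log(v) for v in y)


def boxcox_lambda(y):
    """Grid-search estimate of lambda on [-3, 3] in steps of 0.01 (illustrative grid)."""
    grid = [i / 100 for i in range(-300, 301)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


random.seed(1)
# Simulated right-skewed outcome: log(y) is exactly normal here.
y = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
# Same data measured from a different origin (hypothetical shift of 10).
shifted = [v + 10.0 for v in y]

lam_raw = boxcox_lambda(y)
lam_shifted = boxcox_lambda(shifted)
print("lambda, original origin:", lam_raw)
print("lambda, shifted origin: ", lam_shifted)
```

The two estimates differ substantially even though the second data set is only the first plus a constant, so the "optimal" transformation depends on an origin that is often arbitrary. An analysis based on ranks does not have this problem, since adding a constant leaves the ranks unchanged.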


What is the most appropriate method for dealing with a biomarker outcome variable that is right-skewed? In this case the goal is to compare levels at various latitudes in the contiguous United States.

Unless there is a more-or-less predetermined optimal transformation to symmetry, I’d use a semiparametric model that doesn’t assume a specific distribution, or a transform-both-sides nonparametric additive regression model. Both are covered in