How to best handle limits of detection with continuous predictors?

Hello there :]

There is a vast literature on the problems of categorizing continuous predictors, and countless warnings against dichotomization and categorization, as neatly summarized in Frank's thread on this very forum.

In clinical research, laboratory parameters are often measured only up to certain limits of detection, in some cases leaving a sizeable proportion of study participants below or above such limits. Sometimes this problem can, I suspect, be ignored; in my field, coagulation factor proteins come to mind, which are often only measured to the nearest integer (on an IU/dL scale) anyway and come with a lower limit of detection of <1 IU/dL. Since values below zero are impossible, setting these to zero seems reasonable even if a sizeable proportion were to fall into that category. For most other laboratory parameters, however, the range beyond such limits typically encompasses a good chunk of plausible values. D-dimer (one of myriad examples) at my laboratory is measured to two decimal places down to a lower limit of detection of <.27, with many (or maybe most) values falling below that limit.

Now I know this always depends on the specific biomarker at hand, and one would ideally incorporate subject-matter knowledge in each case. Ignoring that for the moment (or assuming a case where we just do not know much that might help us here), what would be considered a generally good way to tackle this common issue with many laboratory measurements when considering them as predictors in some model? Very often the answer in my field of clinical research, unfortunately, is to just throw away the continuous information and define some threshold, use a median split, or engage in some other form of categorization. One thing that comes to mind would be to set values beyond the limit of detection to the value at the limit and to add an indicator (dummy) variable for the detection limit to the model. Would this be a reasonable approach? More generally, what would be an acceptable way to include such predictors without categorizing them?
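To make the idea concrete, here is a minimal sketch (not from any particular package) of the encoding described above: values below the lower limit of detection are clamped to the limit, and a companion indicator records that the true value was below it. The LOD of 0.27 mirrors the D-dimer example; all names and data are illustrative.

```python
# Hypothetical encoding of a predictor with a lower limit of detection (LOD):
# each raw value becomes (value clamped at the LOD, below-LOD indicator).
LOD = 0.27

def encode_lod(x, lod=LOD):
    """Return (value clamped at the LOD, 1 if below the LOD else 0)."""
    if x < lod:
        return lod, 1
    return x, 0

raw = [0.10, 0.35, 0.27, 0.05, 1.20]
encoded = [encode_lod(v) for v in raw]
values = [v for v, _ in encoded]   # continuous part, clamped at 0.27
below = [d for _, d in encoded]    # indicator for "measured as < LOD"
```

Both columns would then enter the model together, so the continuous information above the limit is kept while the below-limit observations get their own intercept shift.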

Thank you very much in advance for any help :slight_smile:


There is a large literature on this now; I just don't have the references. I favor your last approach, creating a discontinuity by adding an extra indicator variable. This is especially easy to do with the R rms package. The main problem with this is that in the continuous Y setting, where you usually assume constant residual \sigma^2, \sigma^2 needs to be larger for the observations for which X was truncated, because you are then using incomplete conditioning.
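A hedged sketch of the discontinuity approach on simulated data: regress Y on the clamped predictor plus a below-LOD indicator via ordinary least squares. The thread's actual suggestion uses the R rms package; plain numpy is used here only for illustration, and all data are made up.

```python
import numpy as np

# Simulate a predictor with a lower limit of detection (LOD) and a linear
# outcome, then fit y ~ intercept + clamped x + below-LOD indicator.
rng = np.random.default_rng(0)
lod = 0.27
x_true = rng.uniform(0.0, 2.0, 200)
y = 1.0 + 2.0 * x_true + rng.normal(0, 0.1, 200)

below = (x_true < lod).astype(float)
x_obs = np.where(below == 1, lod, x_true)   # clamp at the detection limit

X = np.column_stack([np.ones_like(x_obs), x_obs, below])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[2] shifts the intercept for below-LOD observations, creating the
# discontinuity described above. Note the residual variance is still
# treated as constant here, which (as the post points out) is not quite
# right for the truncated observations.
```

The indicator coefficient absorbs the average gap between the clamped value and the unobserved true values below the limit.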

I see there is a recent paper on arXiv with @f2harrell as a co-author which deals with this problem when it is the outcome variable that has a detection limit (or limits): [2207.02815] Addressing Detection Limits with Semiparametric Cumulative Probability Models

I'd be interested to hear whether @f2harrell thinks that approach to detection limits in an outcome variable could be modified to apply to predictor variables.

The way detection limits are handled for independent vs. dependent variables is vastly different. It’s easier for dependent variables because of all the censored data methods and ordinal models we have.

Thanks Frank. I will stick to an approach that adds an indicator variable (or two).

Just remember the added complication when Y is continuous: you really need to increase the residual variance for the incomplete observations.

Could something like the monotonic/ordinal constraints in brms help here? Estimating Monotonic Effects with brms

I’m not sure how those constraints would apply differently in this setting than in complete measurement settings.

I guess I was just thinking that if there were a reason to believe the relationship is monotonic, this would allow for an arbitrary decrease at the measurement bound; but I suppose the price you pay is a lack of flexibility in the rest of the relationship.
