How best to handle limits of detection with continuous predictors?

Hello there :]

There is a vast literature on the problems of categorizing continuous predictors, and countless warnings against dichotomization and categorization, as neatly summarized in Frank’s thread in this very forum.

In clinical research, laboratory parameters are often measured only up to certain limits of detection, in some cases leading to a sizeable proportion of study participants falling below or above such limits. In some cases this problem, I guess, could be ignored; in my field, for example, coagulation factor proteins come to mind, which are often only measured to the nearest integer (on an IU/dL scale) anyway and come with a lower limit of detection of <1 IU/dL; since values below zero are impossible, setting them to zero seems reasonable even if a sizeable proportion were to fall into that category. However, for most other laboratory parameters the range beyond such limits of detection typically encompasses a good chunk of possible values. D-dimer (one of myriad examples) at my laboratory is measured to two decimal places down to a lower limit of detection of <0.27, with many (or maybe most) values falling into that range.

Now I know this always has to depend on the specific biomarker at hand, and one would ideally incorporate subject matter knowledge in each case; ignoring that for the moment (or assuming a case where we just do not know much that might help us here), what would be considered a generally good way to tackle this common issue with many laboratory measurements when considering them as predictors in some model? Very often the answer in my field of clinical research is unfortunately to just throw away the continuous information and define some threshold, use a median split, or engage in some other form of categorization. One approach that comes to mind would be to set values beyond the limit of detection to the value at the limit and add an indicator (dummy) variable for the detection limit to the model. Would this be a reasonable approach? What would be a generally acceptable way to include such predictors beyond categorization?

Thank you very much in advance for any help :slight_smile:


There is a large literature on this now; I just don’t have the references. I favor your last approach, creating a discontinuity by adding an extra indicator variable. This is especially easy to do with the R rms package. The main problem with this is that in the continuous Y setting, where you usually assume constant residual \sigma^2, \sigma^2 needs to be larger for the observations for which X was truncated, because you are then using incomplete conditioning.
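A minimal sketch of the indicator-variable idea, using simulated data and an ordinary least-squares fit in plain Python (the biomarker, detection limit, and coefficients here are purely hypothetical; in R, the rms package lets you include such an indicator directly in the model formula):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
lod = 0.27  # hypothetical lower limit of detection

# Hypothetical true biomarker values and outcome
x_true = rng.lognormal(mean=-1.0, sigma=0.8, size=n)
y = 1.5 * x_true + rng.normal(scale=0.5, size=n)

# Values below the detection limit are set to the limit itself,
# and an indicator flags which observations were censored there.
below = (x_true < lod).astype(float)
x_obs = np.where(below == 1, lod, x_true)

# Design matrix: intercept, observed value, below-LOD indicator.
# The indicator creates a discontinuity at the limit, so the
# censored group gets its own offset instead of being forced
# onto the line fitted to the fully observed values.
X = np.column_stack([np.ones(n), x_obs, below])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The slope coefficient is then estimated essentially from the uncensored observations, while the indicator absorbs the mean shift of the censored group. Note the caveat from the reply above: residual variance is not really constant across the two groups, since conditioning is incomplete for the censored observations.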