Hello! I work on a linked health administrative dataset of all known people living with HIV in a given geographical area; an HIV clinical database has been linked to several administrative data sources. As is typically done in these studies, outcome variables (e.g., particular conditions/diseases) are defined by some (potentially validated) set of ICD-9/ICD-10 codes. The study that I work on also includes a random sample of the general (HIV-negative) population in the same area. This random sample is often used to do “HIV+ vs. HIV-” comparisons, usually by calculating incidence rate ratios (incidence rate [HIV+] / incidence rate [HIV-]).
Side note: Even after age- and sex-adjustment of the IRRs, I would not conclude that a higher incidence of condition X (e.g., emphysema) is being driven by HIV specifically, as any comparison by HIV serostatus (HIV+/HIV-) would undoubtedly be biased by additional confounding factors (unless some analytical wizardry was performed, and certain assumptions were clearly stated). Anyway…
Confounding aside (and also ignoring the myriad other issues that come with using admin data for research purposes), I find myself concerned about misclassification in this setting: specifically, differential misclassification of a hypothetical outcome (as defined by a set of ICD-9/ICD-10 codes) by “exposure” status (HIV+/HIV-).
Adapting Rothman’s words (from here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Bias/EP713_Bias6.html) to an HIV-related example:
“Suppose a follow-up study were undertaken to compare incidence rates of emphysema (as defined by ICD codes) among HIV+ and HIV- persons. Emphysema is a disease that may go undiagnosed without unusual medical attention. If HIV+ persons, because of concern about health effects of HIV, seek medical attention to a greater degree than HIV- persons, then emphysema might be diagnosed more frequently among HIV+ than among HIV- simply as a consequence of the greater medical attention. Unless steps were taken to ensure comparable follow-up, an information bias would result. An ‘excess’ of emphysema incidence would be found among HIV+ compared with HIV- that is unrelated to any biologic effect of HIV. This is an example of differential misclassification, since the underdiagnosis of emphysema, a misclassification error, occurs more frequently for HIV- than for HIV+.”
Essentially, a person can only be “measured” in administrative data if they present to care. Given that HIV- people are usually less engaged in the healthcare system than people living with HIV, you are almost always bound to find a higher incidence rate of everything in the HIV+ group; I would wager that there are several scenarios in which the excess is mostly or entirely due to misclassification rather than to any effect of HIV on the outcome.
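To make the concern concrete, here is a rough Python sketch with made-up numbers. Everything in it is an assumption for illustration, not an estimate from my data: the true incidence rate (10 per 1,000 person-years, identical in both groups), the outcome sensitivities (0.9 for HIV+, 0.6 for HIV-), perfect specificity, and the function names themselves.

```python
def observed_rate(true_rate, sensitivity):
    """Rate of *recorded* cases when only a fraction (sensitivity) of true
    cases ever receives the ICD code. Assumes perfect specificity, i.e. no
    false-positive codes (a simplifying assumption)."""
    return true_rate * sensitivity


def observed_irr(true_rate_pos, true_rate_neg, sens_pos, sens_neg):
    """IRR (HIV+ / HIV-) as it would appear in the administrative data."""
    return observed_rate(true_rate_pos, sens_pos) / observed_rate(true_rate_neg, sens_neg)


def corrected_irr(irr_observed, sens_pos, sens_neg):
    """Back out the true IRR from the observed one, given assumed
    sensitivities and perfect specificity."""
    return irr_observed * sens_neg / sens_pos


# Hypothetical inputs: the same true rate in both groups (true IRR = 1).
true_rate = 10 / 1000   # events per person-year
sens_pos = 0.9          # assumed outcome sensitivity among HIV+
sens_neg = 0.6          # assumed outcome sensitivity among HIV-

irr_obs = observed_irr(true_rate, true_rate, sens_pos, sens_neg)
irr_adj = corrected_irr(irr_obs, sens_pos, sens_neg)

print(f"Observed IRR:  {irr_obs:.2f}")   # 0.9 / 0.6 = 1.50
print(f"Corrected IRR: {irr_adj:.2f}")   # back to 1.00
```

Under these assumptions, the data would show a 50% “excess” in the HIV+ group even though the true rates are identical. The correction step only works if the group-specific sensitivities are known (e.g., from a validation study done separately by serostatus), which is precisely what I do not have.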
First question: does differential misclassification of ICD code-based outcome measures, by HIV serostatus, seem like a potentially major issue in this setting?
Second question: I know that the direction and size of the bias from differential misclassification are less predictable than with non-differential misclassification. Therefore, I am trying to track down some papers (i.e., admin data studies) that have attempted to mitigate this type of bias. Does anyone have any recommendations?
Thank you all so much!