In an AHIMA “Perspectives in Health Information Management” article, Verchinina et al. (2018) say this:
A phenotype is something that can be observed. The suffix -type suggests a typology, or classification scheme. Hence, a phenotype is a way of classifying organisms, including people, based on characteristics they have that are observable and measurable.
Building on the definition of phenotype above, the term computable phenotype refers to a shareable and reproducible algorithm precisely defining a condition, disease, complex patient characteristic, or clinical event using only data processed by a computer, principally EHR data. As a shareable and reproducible algorithm, a CP is a tool that can be used to identify patient cohorts with a specific medical condition of interest associated with a specific set of observable and measurable traits. Further, because they are shareable and provide formal specifications of diseases and events, CPs can serve as standards, allowing patient cohorts to be compared and combined more easily than they are today.
Geisinger says:
Defining traits of patients (disease, treatment, response, etc.) in an iterative and collaborative process between programmers, investigators, and/or clinicians.
And further adds that the computable phenotype should be validated by chart review.
So a phenotype is a definition of some cohort of patients of interest to someone (a clinician, a researcher/P.I., a clinical process improvement team, a pop health group, your boss, you), and the computable phenotype is basically the heuristic rules, or recipe, for identifying those patients of interest in the murky swamp that is your data environment. (Where “patient” is a human with a longitudinal history, rather than a single encounter.)
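To make that concrete, here is a minimal sketch of what a computable phenotype can look like as code. Everything in it (the field names, the med-class labels, the HbA1c threshold) is invented for illustration; treat it as the shape of the thing, not as any published rule set.

```python
# Toy sketch of a computable phenotype. All field names, code lists, and
# thresholds here are invented for illustration only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Patient:
    dx_codes: set = field(default_factory=set)     # every diagnosis code on file
    med_classes: set = field(default_factory=set)  # every med class ever dispensed
    max_hba1c: Optional[float] = None              # worst HbA1c we have, if any

def is_dm_case(p: Patient) -> bool:
    """Flag a patient as a (toy) diabetes case."""
    has_dx = any(c.startswith(("E10", "E11")) for c in p.dx_codes)  # ICD-10 DM codes
    on_med = bool(p.med_classes & {"insulin", "oral_antidiabetic"})
    abnormal_lab = p.max_hba1c is not None and p.max_hba1c >= 6.5   # % HbA1c
    # Demand two independent kinds of evidence, because any single kind
    # can be missing or wrong in real data.
    return sum([has_dx, on_med, abnormal_lab]) >= 2
```

Real phenotypes run over far messier inputs (encounters, dates, free text), but the shape is the same: a deterministic function from a patient’s longitudinal record to in/out of the cohort.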
The iterative part of the workup is there because some trial and error is usually needed to find the Goldilocks happy spot of Type I & Type II errors … i.e., not missing (many) patients the instigators think should be counted, and not counting (many) patients that the instigators think are not truly of interest … given that some of the “evidence for membership” (my term) may be fuzzy, missing, ambiguous, and/or conflicting.
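The tuning loop itself is easy to sketch (a hypothetical helper, not anyone’s published method): run the candidate rules over a chart-reviewed sample, count the misses and the false alarms, adjust the rules, repeat.

```python
def validate(phenotype, labeled_sample):
    """phenotype: callable(patient) -> bool.
    labeled_sample: (patient, truly_a_case) pairs from chart review."""
    tp = fp = fn = 0
    for patient, truly_a_case in labeled_sample:
        flagged = phenotype(patient)
        if flagged and truly_a_case:
            tp += 1
        elif flagged and not truly_a_case:
            fp += 1  # Type I error: counted someone not truly of interest
        elif not flagged and truly_a_case:
            fn += 1  # Type II error: missed someone who should be counted
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # share of true cases caught
    ppv = tp / (tp + fp) if (tp + fp) else 0.0          # share of flags that are real
    return sensitivity, ppv
```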
With medical stuff, no matter the source of your data, you are at best only going to see some slice of the patient’s true condition. If you are working with provider-side data, you probably see only the encounter records for visits to your system’s affiliated physicians and facilities. (For example, I’m in Florida … we have a significant number of patients who are snowbirds from northern states; half the year they get their medical care at other providers, a thousand miles or more away.) And you may or may not be harvesting 100% of the information stored in the EHR … can you guarantee your EHR text-parsing is correctly recognizing and categorizing every word, phrase, abbreviation, and “rule out”? On the insurance side, you only see what comes through on claims. If no claim is submitted … e.g., it was cheaper for the patient to pay cash out of pocket than their copay would have been … the encounter is invisible to you.
I think the key insights into understanding the rules are (1) that the rules have to work with the data you have, which is NOT the complete set of all the information you’d logically expect or like to have, and (2) that which parts are missing or unreliable varies from patient to patient.
So … on to the DM rules you’ve quoted. I’m not a diabetes expert by any means, but we can reverse-infer some of the logic.
Rule 1 says that a DM med confirmed by an abnormal lab result counts, even in the (apparent) absence of a diagnosis. Why didn’t we find the Dx for DM in the EHR? I can’t say offhand; maybe you don’t have records from a bunch of the primary care physicians in the time period you’re reviewing. It’s also possible this is a failsafe rule that only fires in 0.1% of the cases we should find.
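Rendered as code, Rule 1 might look something like the sketch below. The med-class names, code prefixes, and lab threshold are stand-ins I’ve invented; the real ones would come from the rule’s provenance documentation.

```python
from typing import Optional

def rule_1(med_classes: set, max_hba1c: Optional[float]) -> bool:
    # A DM medication corroborated by an abnormal lab counts as a case,
    # even with no DM diagnosis code anywhere in the data we hold.
    on_dm_med = bool(med_classes & {"insulin", "oral_antidiabetic"})
    abnormal_lab = max_hba1c is not None and max_hba1c >= 6.5
    return on_dm_med and abnormal_lab
```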
Rule 2 is another two-condition rule, this time saying that a Dx + abnormal lab result should be counted as qualifying. This could be patients who have not been previously identified as having diabetes, patients who have been identified as having DM but aren’t “under management” for some reason, or patients who are under DM management by a PCP or other providers whose routine records aren’t in your data universe. Or other data holes.
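Rule 2 gets the same treatment, with the same caveats about the invented names:

```python
from typing import Optional

def rule_2(dx_codes: set, max_hba1c: Optional[float]) -> bool:
    # A DM diagnosis code corroborated by an abnormal lab qualifies,
    # catching patients with no medication history in our data universe.
    has_dm_dx = any(c.startswith(("E10", "E11")) for c in dx_codes)
    abnormal_lab = max_hba1c is not None and max_hba1c >= 6.5
    return has_dm_dx and abnormal_lab
```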
Rule 3 says that, for some reason that is not made explicit in the rule itself, violations of the expected chronology of DM1 → DM2 don’t matter in the presence of a history of meds for both. There may be some other rules, not shown, that toss out some candidate patients with slightly different factors. Or perhaps some drugs are used for both DM1 and DM2 and thus don’t clearly differentiate the DM type.
[Edit: I’m not clinical. My thinking when I wrote the above was that DM Type 1 is relatively uncommon compared to DM2, but that most patients with DM1 progress to DM2 in their later years. That may not be true.]
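As code, my reading of Rule 3 is a waiver on a chronology check, something like the sketch below (my guess at the logic, not the actual rule):

```python
from datetime import date
from typing import Optional

def chronology_ok(dm1_dx: Optional[date], dm2_dx: Optional[date],
                  med_classes: set) -> bool:
    # A "backwards" chronology (DM2 recorded before DM1) is not
    # disqualifying when the record shows meds for both DM types.
    in_order = dm1_dx is None or dm2_dx is None or dm1_dx <= dm2_dx
    meds_for_both = {"insulin", "oral_antidiabetic"} <= med_classes
    return in_order or meds_for_both
```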
Rule 4 is a nuanced version of the other rules. Perhaps the intent of this rule is only clear if all the “Not a Case” rules are also presented, in hierarchical order of execution.
Note that, from what we can see in these rules, the phenotype definition doesn’t appear to be trying to separate DM1 from DM2. Perhaps at some point that was attempted, but it was deemed not successful enough to be usable. If that was the history, the rule stack could have some “relict rules” that were re-purposed late in the development of the rules for this phenotype.
Does that help? Sorry about the length.
If you are adopting phenotype rules invented elsewhere, you’ll probably want to root out provenance documents explaining exactly what those rules are doing, expressed in clinician-to-clinician language. Otherwise you’ll have to work with your clinical audience/users to validate the rules against your own data. (And even with doco, you may still have to do that.)
d.d