First time poster and I look forward to engaging with the community here. This post is an extension of a Twitter discussion related to individualizing risk estimation. By way of introduction, I am a final-year trainee in emergency medicine and recently completed an MSc in evidence-based health care, performing a systematic review/meta-analysis of the HEART score for my dissertation. Through this work, I have gained an interest in prognostic research. I have come to really like the idea of prognosis (i.e. what is likely to happen) as a guide for clinical practice in many scenarios (more reading: https://www.ncbi.nlm.nih.gov/pubmed/25637245), with a particular emphasis on patient-centred and system-centred outcomes. It is my hope that, with a shift from diagnostic to prognostic thinking for certain clinical scenarios (i.e. where the diagnostic reference standard is vague), we can begin to address issues related to over-testing (i.e. exposing a patient to the harms of a test when it is unlikely the result of the test will benefit the patient) and over-diagnosis (i.e. labeling a patient with a “disease” when this label does not benefit the patient or could even cause harm) in clinical medicine.
One of the things I struggled with throughout my dissertation was how the clinician can practically individualize a risk estimate to the patient in front of him or her to guide a clinical decision. There are many issues related to critical appraisal (i.e. external validity and internal validity, particularly spectrum, incorporation and verification biases) and the specifics of a clinical scenario that can make having confidence in a precise, individualized risk estimate challenging. As a result, in practice, I will typically present prognostic evidence to the patient as “population-level” evidence (i.e. “The best available evidence we have suggests patients just like you or similar to you have an approximately X% risk of the outcome of interest”).
I will use the HEART score as a potential springboard for discussion. The HEART score is interesting in many ways. Firstly, it was “derived” by a group of physicians in the Netherlands who believed the chosen prognostic factors and their chosen weights, based on clinical experience, would predict the risk of “major adverse cardiac events” or MACE, defined as death, myocardial infarction or need for coronary revascularization by cardiac catheterization or open-heart surgery, within 6 weeks. These authors did not “derive” the prognostic factors in a way I think many in this community would find robust, but it has nonetheless caught on with researchers, with over 30 external validation studies as well as a regression analysis suggesting the chosen factors and their weights were appropriate.
The HEART score was initially conceptualized as a “diagnostic tool”. The intent was that it might help the clinician more confidently rule out acute coronary syndrome in the patient presenting to the ED with chest pain. Acute coronary syndrome is an interesting diagnosis in medicine. It lacks a clear diagnostic reference standard, with the most common reference standards being future MACE (i.e. 30 days to 6 weeks) or cardiologist-adjudicated chart review, both of which are problematic and introduce important biases, namely spectrum, incorporation and verification. I think as a result of this, some authors of those 30 external validation studies have looked at the HEART score more through a prognostic lens, i.e. once I have “ruled out” acute coronary syndrome on my assessment, how might I then estimate the risk of a bad outcome for the patient and then use this risk estimate to guide certain clinical decisions (i.e. does this patient need a stress test? does this patient need a cardiology follow-up? should I start this patient on aspirin pending further testing?).
I became somewhat fascinated with the HEART score due to how much it has been studied and talked about in the blogosphere (i.e. “FOAMed”). Despite the shortcomings (i.e. not “properly” derived, foggy diagnostic reference standard with important biases to consider), clinicians and researchers alike have embraced it. I am unaware of any other diagnostic/prognostic model that has been studied to this extent in emergency medicine. I think clinicians have embraced the HEART score because it is accessible and reflects the information we typically gather and analyze in our decision making in the absence of a formal risk model.
A patient can score anywhere from 0 through 10. For reasons that are not entirely clear, over the years, patients have typically been grouped into the following strata: HEART score 0-3 (low risk, approx 2% MACE at 6 weeks), 4-6 (intermediate risk, approx 15% MACE at 6 weeks) and 7-10 (high risk, approx 50% MACE at 6 weeks). The idea was that patients in the low risk group would be appropriate for discharge from the ED, potentially without any additional follow-up aside from their primary care provider, whereas the others may warrant an admission and, if not admitted, an urgent stress test and cardiology follow-up.
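The conventional grouping above can be sketched as a simple lookup. A minimal sketch (the percentages are the approximate pooled figures quoted above, not patient-specific estimates):

```python
def heart_stratum(score: int) -> tuple[str, float]:
    """Map a HEART score (0-10) to its conventional risk stratum and the
    approximate pooled 6-week MACE risk commonly quoted in the literature."""
    if not 0 <= score <= 10:
        raise ValueError("HEART score must be between 0 and 10")
    if score <= 3:
        return ("low", 0.02)           # approx 2% MACE at 6 weeks
    if score <= 6:
        return ("intermediate", 0.15)  # approx 15%
    return ("high", 0.50)              # approx 50%

print(heart_stratum(3))  # -> ('low', 0.02)
print(heart_stratum(4))  # -> ('intermediate', 0.15)
```

Note how abruptly the quoted risk jumps at the 3/4 boundary, which foreshadows the stratification question below.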
A few issues with the HEART score that I hope generate some discussion:
1 - Some HEART score studies organize their data in 11 strata (i.e. HEART score 0, HEART score 1, HEART score 2 … all the way to HEART score 10), whereas the vast majority organize their data in the original 3 low, intermediate and high risk strata described above. What are the pros and cons of each approach in guiding decision-making at both the individual patient assessment and system levels (i.e. locally, we are going to implement a policy where all patients with HEART score 4 or greater will be prioritized for stress test and cardiology follow-up within 72 hours)? It seems to me very unlikely that a patient with a HEART score of 0 and a patient with a HEART score of 3 have the same risk, but unfortunately we do not have the data stratified to know the risk difference between 0 and 3. But, in the grand scheme of things, does it matter?
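To make the pooling concern concrete, here is a toy calculation. All per-score risks and group sizes below are invented for illustration, not taken from any HEART study; the point is only that a single pooled "0-3" figure can mask a several-fold risk gradient within the stratum:

```python
# Hypothetical per-score data for the "low risk" stratum:
# score: (n_patients, n_MACE)
per_score = {
    0: (400, 1),   # 0.25%
    1: (500, 4),   # 0.80%
    2: (450, 9),   # 2.00%
    3: (350, 14),  # 4.00%
}

n_total = sum(n for n, _ in per_score.values())   # 1700
events = sum(e for _, e in per_score.values())    # 28
pooled = events / n_total

print(f"pooled 0-3 risk: {pooled:.1%}")           # ~1.6%
for score, (n, e) in per_score.items():
    print(f"score {score}: {e / n:.2%}")          # 0.25% ... 4.00%
```

In this invented example, the patient scoring 3 carries sixteen times the risk of the patient scoring 0, yet both would be quoted the same pooled ~2% figure.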
2 - Unfortunately, HEART score studies do not provide time-to-outcome data. As an emergency clinician, I am more concerned about a patient I send home having a myocardial infarction that evening than about one who has a cardiac catheterization at 6 weeks, which was likely the result of an abnormal stress test that I arranged for the patient in the first place. As a clinician, how does one attempt to cope with a lack of time-to-outcome data? Knowing when the events occur is important, no?
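A toy sketch of why the timing matters (all numbers invented): two hypothetical cohorts can share exactly the same 6-week cumulative incidence while differing completely in when the events occur, which is precisely the information a single 6-week figure throws away:

```python
# Two hypothetical cohorts of 200 patients, each with 10 MACE by 6 weeks
# (identical 5% cumulative incidence), but very different event timing.
early = [1, 1, 2, 2, 3, 3, 4, 5, 6, 7]            # days to event: first week
late = [35, 36, 38, 39, 40, 40, 41, 41, 42, 42]   # weeks 5-6, post-testing

def cumulative_incidence(days_to_event, n_cohort, at_day):
    """Fraction of the cohort with an event on or before `at_day`
    (no censoring in this toy example)."""
    return sum(d <= at_day for d in days_to_event) / n_cohort

for label, events in [("early", early), ("late", late)]:
    print(label,
          f"7-day: {cumulative_incidence(events, 200, 7):.1%}",
          f"42-day: {cumulative_incidence(events, 200, 42):.1%}")
```

Both cohorts report "5% MACE at 6 weeks", but the early cohort is the one that should keep an emergency clinician up at night; a proper time-to-event analysis (e.g. Kaplan-Meier curves) would separate them immediately.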
3 - How does one summarize the prognostic data, both in research and when talking to the patient? Interestingly, a recent HEART score systematic review/meta-analysis opted to summarize “prognostic accuracy” in terms of sensitivity and specificity: https://www.ncbi.nlm.nih.gov/pubmed/30375097 What is the advantage of thinking about prognosis in terms of sensitivity and specificity? What is wrong with absolute and relative risks? I think this review reflects some of the confusion among the 30+ external validation studies, where there is variability in how the data are summarized.
It seems, in emergency medicine literature, there is a tendency to use sensitivity and specificity for prognostic clinical questions. I think it comes from the notion that, at least mathematically, sensitivity and specificity should not change with the prevalence of the outcome, and can be used to generate likelihood ratios, which are then applied to a patient’s pre-test probability to generate a post-test probability. But, isn’t the HEART score attempting to estimate a patient’s pre-test probability prior to stress test or coronary angiogram? Is it right to conceptualize the HEART score as a “test” that has false negatives and false positives? How do you explain the concept of a “false negative” HEART score to a patient?
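For readers less familiar with the mechanics being described, here is the standard odds-form Bayes calculation. The sensitivity and specificity below are illustrative placeholders, not figures from the meta-analysis above, treating "HEART score 0-3" as a negative test result:

```python
# Illustrative numbers only (not from any particular study):
# treating "HEART <= 3" as a negative test.
sens, spec = 0.96, 0.45

lr_neg = (1 - sens) / spec   # likelihood ratio of a "negative" score
lr_pos = sens / (1 - spec)   # likelihood ratio of a "positive" score

def post_test_prob(pretest, lr):
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    odds = pretest / (1 - pretest)
    post_odds = odds * lr
    return post_odds / (1 + post_odds)

pretest = 0.10  # assumed 10% baseline MACE risk in this ED population
print(f"LR-: {lr_neg:.2f}")
print(f"risk after 'negative' HEART score: {post_test_prob(pretest, lr_neg):.1%}")
```

The conceptual tension remains, though: this machinery presumes some pre-test probability exists upstream of the score, when the score itself is arguably what is generating that probability.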
4 - In prognostic accuracy systematic reviews/meta-analyses, it is my view that authors should be challenged to perform subgroup analyses to confirm the prognostic tool performs similarly in diverse populations with diverse baseline risks of major adverse cardiac events. In the absence of this analysis, how does one even begin to attempt to individualize the risk assessment of the patient in front of him or her? This is one way a review can attempt to address the external validity question. But practically speaking, this also raises the question of how a physician attempts to cope without knowing what his or her local event rate is. This is another challenge in individualizing risk estimation.
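Even granting the (questionable) premise that sensitivity and specificity are stable across settings, the residual risk after a "low" score still depends heavily on the local baseline event rate, which is exactly why the subgroup analyses and the unknown local event rate matter. A quick sketch, using the same illustrative (invented) 0.96/0.45 sensitivity/specificity pair:

```python
# A fixed sensitivity/specificity pair (illustrative: 0.96 / 0.45) implies
# very different residual risk after a "low" HEART score depending on the
# local baseline MACE rate.
sens, spec = 0.96, 0.45
lr_neg = (1 - sens) / spec

for baseline in (0.02, 0.05, 0.10, 0.20):
    odds = baseline / (1 - baseline) * lr_neg
    residual = odds / (1 + odds)
    print(f"baseline {baseline:.0%} -> risk after low score: {residual:.2%}")
```

A tenfold difference in baseline risk (2% vs 20%) produces roughly a tenfold difference in the post-"negative" risk, so quoting a single pooled miss rate to every patient in every ED is hard to defend.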
5 - How does one cope with biases in prognostic research? Can one “adjust” for these biases? For example, a patient with a higher HEART score is more likely to have multiple cardiac troponin levels and/or a stress test performed during a hospital observation period, whereas a patient with a lower HEART score is more likely to have just one troponin performed without any stress testing. As a result, the former patient is at increased risk of a MACE being detected solely due to more testing occurring, and the latter patient is at increased risk of a MACE being missed due to less testing occurring. Does this matter when a clinician attempts to precisely estimate the risk of a patient sitting in front of him or her?
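A deterministic toy model of this differential verification (all numbers invented) may help make the direction of the bias explicit: the observed event rate in each stratum mixes the true risk with how hard we looked, plus any "events" (e.g. revascularizations) that only occurred because more testing was done:

```python
# Hypothetical strata: (true 6-week MACE risk,
#                       probability an event is detected given usual testing,
#                       extra test-triggered "events")
strata = {
    "low":  (0.03, 0.60, 0.000),  # single troponin, no stress test
    "high": (0.40, 0.95, 0.030),  # serial troponins, stress test, cath
}

for name, (true_risk, p_detect, extra) in strata.items():
    observed = true_risk * p_detect + extra
    print(f"{name}: true {true_risk:.1%} -> observed {observed:.1%}")
```

Under these invented inputs, the low-risk stratum's observed rate (1.8%) understates its true risk (3%), while the high-risk stratum's observed rate is propped up by test-triggered events, so the apparent separation between strata is partly an artifact of the testing itself.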
A lengthy first post, but I look forward to any discussion that ensues. Though I have used the HEART score as a way to illustrate certain concepts, I believe these concepts apply to diagnostic/prognostic models in general. As a result of these issues, this is why I always present risk estimates to a patient as “population level” evidence, rightly or wrongly.