Issues with incorporation bias in prognostic studies?

I was wondering if anyone had guidance on including candidate predictors in a prognostic model when those predictors are potentially part of the outcome definition.

For example, the diagnosis of Alzheimer’s disease is based in part on cognitive scores such as the MMSE. If we were developing a model to predict a future diagnosis of Alzheimer’s disease, is there anything wrong with including baseline MMSE in the model? PROBAST has a section on how this might lead to optimistic estimates of model performance in diagnostic studies (see below). However, it feels like we would want to include the baseline MMSE score, because it is such a powerful predictor of going on to develop Alzheimer’s disease.

Another example: suppose we wanted to develop a model to predict heart failure with reduced ejection fraction (HFrEF). HFrEF is commonly diagnosed by symptoms of heart failure plus an EF <40% on echo. Is there anything wrong with including baseline EF as a predictor? Someone with a baseline EF of 40-50% is likely at higher risk of developing HFrEF than someone with a normal baseline EF. This isn’t always the case: there are many ways to arrive at an EF of 40-50%, so a baseline EF of 40-50% doesn’t necessarily mean a higher risk of reaching an EF <40% than a normal baseline EF (for example, if the person with a normal EF has other factors that put them at extremely high risk). On the whole, though, a baseline EF of 40-50% indicates a higher-risk state.

Are there any potential issues with including these as predictors? Or should they be included anyway, since they are such powerful predictors, with an acknowledgment that this could bias the estimates of model performance?

Thanks for any guidance!

In PROBAST, section 3.3 discusses the case of including baseline troponin among the predictors when predicting non-fatal MI, since the troponin value is used to define an MI.

3.3 Were predictors excluded from the outcome definition?

  • Outcomes should ideally be determined without information about predictors (see signaling question 3.5), but in some cases it is not possible to avoid including predictors—for example, when outcomes require determination by a consensus panel using as much information as is available. If a predictor in the model forms part of the definition or assessment of the outcome that the model predicts, the association between predictor and outcome will likely be overestimated, and estimates of model performance will be optimistic; in diagnostic research, this problem is generally called incorporation bias (104, 111, 115, 117, 119, 131–134).

  • Example. Aslibekyan and colleagues (86) aimed to develop a cardiovascular risk score based on the ability of predictors (such as dietary components, physical activity, smoking status, alcohol consumption, socioeconomic status, and measures of overweight and obesity) to predict nonfatal MI. The study reported that MI was defined according to World Health Organization criteria, including cardiac biomarkers, electrocardiography, imaging, or autopsy confirmation. Because the lifestyle and socioeconomic predictors Aslibekyan and colleagues used for modeling do not form any part of this definition of MI, the study would be rated as Y for this signaling question. If the study had included a cardiac biomarker (such as troponin T at initial hospital presentation) among the predictors assessed, this signaling question would likely be rated as N. This is because the initial troponin T measurement may have formed part of the information used to determine the outcome (MI).


The only incorporation bias I am familiar with is related to diagnostic test accuracy studies—as opposed to prognostic studies; see this review article for definitions.

In studies in which disease is adjudicated by experts (including chart review), incorporation bias might affect study results. This occurs when the index test results are included in the adjudication process. Incorporation bias falsely results in elevated sensitivity and specificity.

They then give an example of using high-sensitivity troponin T as the “index test” for acute myocardial infarction, and mention that the authors of that study were blinded to the test results; i.e., they didn’t provide the test results to the expert panel during adjudication.
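If it helps build intuition, here is a toy simulation (all the prevalence and accuracy numbers are made up) of what happens when the adjudication panel is allowed to see, and sometimes simply adopt, the index test result:

```python
import random

random.seed(0)

def simulate(n=100_000, lean_on_index=0.0):
    """Toy diagnostic accuracy study.

    lean_on_index = probability that the adjudication panel simply adopts
    the index test result instead of its own (imperfect) judgment.
    0.0 -> fully blinded adjudication; > 0 -> incorporation bias.
    """
    tp = fp = tn = fn = 0
    for _ in range(n):
        disease = random.random() < 0.30                            # true status
        index_pos = random.random() < (0.80 if disease else 0.10)   # index test result
        panel = disease if random.random() < 0.90 else not disease  # panel's own judgment (90% accurate)
        reference = index_pos if random.random() < lean_on_index else panel
        if index_pos and reference:
            tp += 1
        elif index_pos:
            fp += 1
        elif reference:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)  # sensitivity, specificity vs the reference

se_blind, sp_blind = simulate(lean_on_index=0.0)
se_leaky, sp_leaky = simulate(lean_on_index=0.5)
print(f"blinded panel:    Se={se_blind:.2f}  Sp={sp_blind:.2f}")
print(f"panel sees index: Se={se_leaky:.2f}  Sp={sp_leaky:.2f}")
```

With these made-up numbers, both sensitivity and specificity come out higher under the unblinded panel even though the test itself hasn’t changed, which is exactly the "falsely elevated" pattern described in the quote.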

I can give some text from the book I mention in this forum post (the book is largely concerned with diagnostic test accuracy studies, such as computer-aided diagnosis [CAD]):

For studies measuring the effect of CAD, there are several ways to minimize the effects of incorporation bias: 1) include multiple expert readers in the expert panel and use the majority opinion as the gold standard diagnosis, 2) provide the expert readers only the images without the CAD marks for determining the gold standard diagnosis, and 3) for very large studies and/or when image interpretation is very time consuming, show the expert readers the compilation of findings found by the study readers and/or by CAD - ask them to make a determination about the presence or absence of a suspicious lesion, and do not tell the experts which lesions were identified by CAD and which were identified without CAD. A third option for avoiding imperfect gold standard bias is to use a mathematical correction.

I realize you mention diagnostic studies, and I’m not sure if this kind of bias “operates” the same way in prognostic/outcome studies. But perhaps someone else can reply with suggestions.

P.S. Your quote seems to come from this article. My first suggestion would be to follow their citations, although at a glance they don’t look especially helpful.


Thanks for your response and links to additional resources! Yeah, I looked through the citations from PROBAST, which all seem to deal with diagnostic studies. My sense is that this bias doesn’t operate in the same way for prognostic studies, as long as the baseline candidate predictor is not somehow incorporated into the definition of the future outcome. That applies whether it’s baseline MMSE when the outcome is an Alzheimer’s diagnosis made using a future MMSE score, baseline LVEF when the outcome is HFrEF defined using LVEF from a future echo, or even baseline angina scores when the outcome is a future angina score after a procedure, as in this recent study: Predicting Residual Angina After Chronic Total Occlusion Percutaneous Coronary Intervention: Insights from the OPEN‐CTO Registry | Journal of the American Heart Association.
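To convince myself, I put together a tiny simulation with made-up EF numbers (baseline EF around 55, some decline with noise at follow-up). The predictor is baseline EF in both scenarios; the only thing that changes is whether the outcome is defined from the follow-up echo alone or adjudicated as "any recorded EF <40%", which sweeps the baseline measurement itself into the outcome definition:

```python
import random

random.seed(1)

def auc(scores, labels):
    """Mann-Whitney AUC: P(a random case outscores a random control)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

n = 20_000
baseline_ef = [random.gauss(55, 10) for _ in range(n)]
# follow-up EF drifts down from baseline with noise (made-up numbers)
future_ef = [ef - random.gauss(5, 8) for ef in baseline_ef]
score = [-ef for ef in baseline_ef]  # lower baseline EF -> higher predicted risk

# (a) outcome determined ONLY from the follow-up echo: legitimate prognosis
y_future_only = [f < 40 for f in future_ef]
# (b) outcome adjudicated as "any recorded EF < 40%", which incorporates
#     the baseline measurement into the outcome definition
y_incorporated = [b < 40 or f < 40 for b, f in zip(baseline_ef, future_ef)]

auc_future = auc(score, y_future_only)
auc_incorp = auc(score, y_incorporated)
print(f"AUC, outcome from future echo only:    {auc_future:.2f}")
print(f"AUC, outcome incorporates baseline EF: {auc_incorp:.2f}")
```

In this toy setup, the apparent discrimination jumps once the baseline measurement leaks into the outcome definition, even though the predictor and the underlying process are identical; when the outcome comes only from the future measurement, the AUC reflects genuine prognostic signal rather than incorporation.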