Disease *A* is **always** preceded by a non-disease state of variable duration, defined by an elevation in a relatively cheap and widely available blood biomarker, *X*. Only a small proportion of those with an elevated *X* progress to disease *A*, and those who do may progress anywhere between one and 30 years after the initial measurement. An elevated *X* is fairly common, and it is not feasible to monitor everyone with an elevation for progression.

Several other cheap and widely available biomarkers and demographic variables, together with the absolute value of *X*, are useful for predicting progression to disease *A*. However, there is also a biomarker *Y*, expensive and/or otherwise difficult to obtain, which strongly predicts progression to disease *A*. Biomarker *Y* correlates strongly (but not perfectly) with *X* and the other easily available prognostic variables.

A clinician is consulted regarding a patient with an elevation in *X* and tasked with deciding whether the patient should be monitored for progression to disease *A*, and if so, how frequently. The results of *X* and the other easily obtainable prognostic variables are available. The decision could be seen to have multiple sub-components:

- what is the clinician's (and patient's) risk threshold, taking into account the potential costs and benefits of monitoring?
- how close is the predicted risk of progression to this risk threshold based on the results of a multivariable prognostic model using the cheap and widely available prognostic variables?
- how likely is it that the risk threshold would be crossed (in either direction) given the results of a multivariable prognostic model that also incorporates biomarker *Y*?
- does this likelihood justify the additional cost of obtaining *Y*?
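
To make the last two questions concrete, here is a minimal sketch in Python. It assumes that, for a given patient, I already have a set of plausible predicted risks from the model that includes *Y* (for example, predictions for cohort members with similar cheap variables). The function names, the 10% threshold, and the 25% cut-off for ordering *Y* are all hypothetical and only there for illustration.

```python
def crossing_probability(risk_without_y, plausible_risks_with_y, threshold):
    """Share of plausible with-Y risks that land on the other side of the threshold."""
    currently_above = risk_without_y >= threshold
    crossings = sum(
        (risk >= threshold) != currently_above for risk in plausible_risks_with_y
    )
    return crossings / len(plausible_risks_with_y)


def worth_ordering_y(risk_without_y, plausible_risks_with_y, threshold, min_probability):
    """True if a change of decision is likely enough to justify obtaining Y."""
    probability = crossing_probability(risk_without_y, plausible_risks_with_y, threshold)
    return probability >= min_probability


# Hypothetical patient: 8% predicted risk without Y, 10% monitoring threshold.
plausible = [0.05, 0.12, 0.07, 0.15]
print(crossing_probability(0.08, plausible, 0.10))
print(worth_ordering_y(0.08, plausible, 0.10, min_probability=0.25))
```

The `min_probability` argument stands in for the cost-benefit judgement about *Y*; in practice it would depend on the price of the test and the consequences of a wrong monitoring decision.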

I have been tasked with developing a multivariable prognostic model of progression to disease *A* in a large prospective cohort of patients with an elevation in biomarker *X*. All individuals in this cohort have undergone testing of biomarker *Y*, **regardless** of underlying risk. I therefore believe I have an unbiased sample of the relationship between *Y* on the one hand and *X* and the other widely available biomarkers on the other. Instead of just developing a single multivariable prognostic model that includes *X*, *Y*, and the other prognostic variables, I propose developing a set of models:

- A multivariable prognostic model of progression to disease *A*, given *X* and the other easily obtainable prognostic variables.
- A multivariable prognostic model that predicts disease *A*, given *X*, *Y* and the other easily obtainable prognostic variables.
- A multivariable model that predicts the difference in predicted prognosis of disease *A*, between the prognostic model excluding *Y* and the prognostic model including *Y*.
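
A rough sketch of the three models on simulated data may help show what I mean. It makes several simplifying assumptions that are not part of my actual setting: the outcome is treated as binary (a real analysis would need a time-to-event model), *X* is the only cheap predictor, all coefficients are invented, and the third model targets the absolute change in predicted risk, which is only one possible reading of "difference". The helper `fit_logistic` is a hypothetical hand-rolled Newton-Raphson fit, used to keep the example dependent on NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated cohort; every number here is invented for illustration.
x = rng.normal(size=n)                        # cheap biomarker X (standardised)
y = 0.7 * x + rng.normal(scale=0.7, size=n)   # expensive biomarker Y, correlated with X
true_logit = -2.5 + 0.5 * x + 1.0 * y
progressed = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)


def fit_logistic(design, outcome, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-design @ beta))
        gradient = design.T @ (outcome - p)
        hessian = design.T @ (design * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def predict_risk(design, beta):
    """Predicted probability of progression."""
    return 1 / (1 + np.exp(-design @ beta))


ones = np.ones(n)
design_base = np.column_stack([ones, x])       # model 1: cheap variables only
design_full = np.column_stack([ones, x, y])    # model 2: cheap variables plus Y

beta_base = fit_logistic(design_base, progressed)
beta_full = fit_logistic(design_full, progressed)
risk_base = predict_risk(design_base, beta_base)
risk_full = predict_risk(design_full, beta_full)

# Model 3: expected absolute shift in predicted risk from adding Y,
# regressed on the cheap variables only (so it can be used before Y is measured).
abs_shift = np.abs(risk_full - risk_base)
design_shift = np.column_stack([ones, x, x**2])
beta_shift, *_ = np.linalg.lstsq(design_shift, abs_shift, rcond=None)
expected_shift = design_shift @ beta_shift

print("mean risk without Y:", risk_base.mean())
print("mean absolute shift from adding Y:", abs_shift.mean())
```

The third model here is fitted to predictions from the same data that the first two models were trained on, so any real implementation would need to account for that when validating the set of models jointly.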

Provided the results of these models are clearly presented, it seems to me that this approach could inform all the decisions the clinician faces in this common clinical scenario. As long as the models are jointly evaluated in both internal and external validation, I don't see any downsides to this approach.

Are there any obvious flaws in my logic? Is this a common approach to prognostic models? Has it been done before, and is there any literature on the topic? I haven't been able to find anything.