First post here, as encouraged by the eminent Prof. Frank Harrell!

The setting is observational clinical research. I would like to know if there are good approaches to deciding whether to implement a predictive algorithm for data imputation/predicted IVs. As an example, let’s say we have estimated the diagnostic properties of a predictive model for a patient attribute (e.g., smoking history or ECOG). Given some estimated distribution of patient risk for a particular category/level, how can we evaluate the utility of using this predictive algorithm?

The data is retrospective, so the idea of patient utility is a little vague in my head. One thought was something like analytical utility: if we had some prior knowledge of the effect size of the imputed/predicted IV on our outcome of interest, could we use the estimated misclassification rate distributions from the predictive algorithm to work out when our estimated distribution of the IV effect size would move outside some pre-specified bounds (e.g., the mean shifting below the 5th or above the 95th percentile of the prior effect size distribution)?
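
A rough simulation sketch of this idea follows. Everything in it is an illustrative assumption rather than an established method: a binary IV, a continuous outcome fit by simple linear regression, a normal prior on the effect size, non-differential misclassification, and Beta distributions standing in for the uncertainty in the algorithm's sensitivity and specificity. The function name `simulate_estimates` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_estimates(n=2000, prevalence=0.3,
                       prior_mean=1.0, prior_sd=0.2,
                       sens_ab=(90, 10), spec_ab=(90, 10),
                       n_sims=500, seed=0):
    """Simulate effect estimates obtained when a predicted (misclassified)
    binary IV replaces the true one.

    All defaults are made-up illustrative values, not recommendations.
    Returns the simulated estimates and the 5th/95th percentile bounds
    of the assumed normal prior on the effect size.
    """
    rng = np.random.default_rng(seed)
    # Pre-specified bounds: 5th and 95th percentiles of the normal prior.
    lo = prior_mean - 1.645 * prior_sd
    hi = prior_mean + 1.645 * prior_sd

    estimates = np.empty(n_sims)
    for i in range(n_sims):
        # Draw a "true" effect from the prior and an error profile from the
        # Beta distributions describing sensitivity and specificity.
        beta = rng.normal(prior_mean, prior_sd)
        sens = rng.beta(*sens_ab)
        spec = rng.beta(*spec_ab)

        # True IV and an outcome that depends on it (unit-variance noise).
        x_true = rng.binomial(1, prevalence, size=n)
        y = beta * x_true + rng.normal(0.0, 1.0, size=n)

        # Predicted IV: positive with probability sens for true positives
        # and with probability (1 - spec) for true negatives.
        p_positive = np.where(x_true == 1, sens, 1.0 - spec)
        x_pred = rng.binomial(1, p_positive)

        # OLS slope of y on the predicted IV: cov(x, y) / var(x).
        estimates[i] = np.cov(x_pred, y)[0, 1] / np.var(x_pred, ddof=1)

    return estimates, (lo, hi)


if __name__ == "__main__":
    est, (lo, hi) = simulate_estimates()
    outside = np.mean((est < lo) | (est > hi))
    print(f"prior 5%/95% bounds: ({lo:.3f}, {hi:.3f})")
    print(f"mean estimate using predicted IV: {est.mean():.3f}")
    print(f"share of simulated estimates outside the bounds: {outside:.2f}")
```

Re-running this over a grid of sensitivity/specificity distributions would show roughly where the mean estimate leaves the pre-specified bounds. A real analysis would need the actual outcome model, covariates, and any differential misclassification, none of which are represented here.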

Would greatly appreciate any good literature or takes on this subject.