Observational data with diagnosis partially determining treatment (imperfect instrumental variable?)

Here’s the setting - we have a retrospective dataset of patients undergoing a certain surgery. The surgery has two variants, let’s call them A and B. We are interested in comparing a set of outcomes, like length of hospital stay and blood loss, between the variants. The choice of variant is based on clinical judgement, so there are inevitable selection effects biasing the comparison. If that were all, we’d have few options beyond just reporting the results descriptively.

We do, however, have an extra piece of information: we know that for a subset of the patients, their diagnosis implies that B (which should be less invasive) is not an option, and thus they would always have been assigned to variant A, with basically no clinical judgement required. We can also assume that the reasons to choose A in the cases requiring clinical judgement are a softer version of the same criteria, i.e. if a patient is not currently diagnosed with a problem that would make A necessary, but the clinician suspects the problem may develop in the future, they would be more likely to prefer A over B for that patient.

This means we actually have three groups of patients:

  1. Always A - patients whose diagnosis requires A
  2. Chosen A - patients that could have gotten B, but did in fact get A
  3. B - patients receiving variant B

Intuitively, the difference between Chosen A and Always A is driven purely by selection and is not causal. If the assumption that clinical judgement picks up on similar features as the diagnosis is correct, then the strength of selection effects between Chosen A and B should be correlated with the magnitude of difference in outcomes between Always A and Chosen A. In particular, if the outcomes are similar between Always A and Chosen A, then this is a weak assurance that the observed difference between Chosen A and B could actually be causal.
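To make the two comparisons concrete, here is a minimal Python sketch of what I have in mind. The group labels, the function name and the toy numbers are all made up for illustration (they are not from our data), and a real analysis would of course need uncertainty estimates and covariate adjustment rather than raw differences in means.

```python
from statistics import mean

def group_contrasts(records):
    """Compute the two raw contrasts discussed above.

    records: iterable of (group, outcome) pairs, where group is one of the
    hypothetical labels 'always_A', 'chosen_A', 'B'.
    """
    by_group = {"always_A": [], "chosen_A": [], "B": []}
    for group, outcome in records:
        by_group[group].append(outcome)
    means = {g: mean(values) for g, values in by_group.items()}
    return {
        # Both groups received A, so this difference reflects selection only.
        "selection_check": means["always_A"] - means["chosen_A"],
        # Naive A-vs-B comparison among patients for whom B was an option.
        "naive_effect": means["chosen_A"] - means["B"],
    }

# Made-up lengths of stay (days), purely illustrative.
toy = [
    ("always_A", 7), ("always_A", 9),
    ("chosen_A", 6), ("chosen_A", 8),
    ("B", 4), ("B", 5),
]
contrasts = group_contrasts(toy)
print(contrasts)
```

The idea would be to read `naive_effect` in light of `selection_check`: the larger the latter, the less the former can be trusted as causal.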

Formally, we can see this as an instrumental variable problem, with the diagnosis as the instrument. However, it seems quite likely that the exclusion restriction is violated for the diagnosis, as the diagnosis is plausibly correlated with worse clinical outcomes (and thus with the residual). Still, the diagnosis could be less correlated with the residual than the choice of procedure is, and could thus serve as an imperfect instrument. In that case the methods of either Conley, Hansen & Rossi 2012 - Plausibly Exogenous or Nevo & Rosen 2012 - Identification with imperfect instruments might be applicable to get at least a less biased estimate.
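To spell out what I mean, here is a rough formalization. The notation is my own (Y is an outcome, D = 1 if the patient got variant A, Z = 1 if the diagnosis requires A), and this is only my reading of how the two papers relax the standard IV assumptions, so please correct me if I misstate them.

```latex
% Linear model with a possibly invalid instrument (notation is mine):
\begin{align}
  Y &= \beta D + \gamma Z + \varepsilon, \\
  D &= \pi Z + v .
\end{align}
% Standard IV assumes the exclusion restriction \gamma = 0.
% Conley, Hansen & Rossi (as I understand it): allow \gamma \neq 0 and
% restrict it to a plausible range or prior, e.g.
%   \gamma \in [-\delta, \delta].
% Nevo & Rosen (as I understand it): Z may be correlated with the error,
% but in the same direction as D and no more strongly:
\begin{equation}
  \rho_{Z\varepsilon}\,\rho_{D\varepsilon} \ge 0,
  \qquad
  |\rho_{Z\varepsilon}| \le |\rho_{D\varepsilon}| .
\end{equation}
```

Note that in our setting Z = 1 implies D = 1 by construction, so the first stage is one-sided; I am not sure whether that causes additional problems for either approach.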

My questions are:

A) Does this line of thought look reasonable? Under which conditions could using the diagnosis information warrant moving from pure description to (weak) causal claims?
B) Are there examples of similar analyses in the medical literature that you believe have done a good job?

Thanks for any feedback.