I would not use nested CV in the inner loop. As you noted, nested CV is a method for performance assessment, not model selection, so it should not be used in the inner loop of nested CV, because the purpose of that loop is model selection. Moreover, your proposed procedure is very complicated and hard to test and debug.
Instead, to select a decision threshold in the nested CV context, I suggest the following (which, in my experience, produces reasonable results):
1. Choose a statistic of interest. For example, if your model must have > 90% sensitivity, the statistic could be "maximum specificity among models with > 90% sensitivity".

2. Using all the data, fit all candidate models (presumably via cross-validation), treating the decision threshold as a hyperparameter. For example, vary the threshold from 0 to 1 in steps of 0.01 (this is a cheap operation because you don't have to retrain the models). Then discard all models with <= 90% sensitivity and rank the remaining ones by specificity. This gives you a single best model, including its decision threshold.

3. Now run nested CV as usual: repeat the selection procedure from step 2 inside each inner fold. That gives you a single best model for that inner fold, along with its decision threshold. Apply it to the outer-fold data, which gives you sensitivity/specificity for that inner/outer pair. Repeat for each pair and, by definition, you have the nested-CV performance point estimate of the model selected in step 2.
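To make steps 2 and 3 concrete, here is a minimal sketch in Python/NumPy. Everything in it is illustrative, not from any library: `sens_spec`, `select_threshold`, and `fit_predict_proba` are hypothetical helper names, the "model" is a trivial stand-in for your classifier's `predict_proba`, and the data is synthetic. The point is only the structure: an inner CV produces out-of-fold probabilities, the threshold sweep picks the best threshold under the sensitivity constraint, and the outer fold scores that choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def sens_spec(y_true, y_pred):
    """Sensitivity and specificity from binary labels and predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

def select_threshold(y_true, y_prob, min_sens=0.90):
    """Step 2: sweep thresholds 0..1 in steps of 0.01 (cheap: no
    retraining), drop those with sensitivity <= min_sens, and return
    the one with maximum specificity among the rest."""
    best_t, best_spec = None, -1.0
    for t in np.arange(0.0, 1.01, 0.01):
        sens, spec = sens_spec(y_true, (y_prob >= t).astype(int))
        if sens > min_sens and spec > best_spec:
            best_t, best_spec = t, spec
    return best_t, best_spec

def fit_predict_proba(X_train, y_train, X_test):
    """Stand-in model: sigmoid score around the class midpoint.
    Replace with your classifier's fit + predict_proba."""
    mid = (X_train[y_train == 1].mean() + X_train[y_train == 0].mean()) / 2.0
    return 1.0 / (1.0 + np.exp(-(X_test - mid)))

# Synthetic binary data: class 1 centered at +1, class 0 at -1.
n = 400
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=2.0 * y - 1.0, scale=1.0)

# Step 3: outer loop of nested CV. In each outer fold, rerun the whole
# selection (inner CV + threshold sweep) on the training part only,
# then score the selected threshold on the held-out part.
outer_sens, outer_spec = [], []
folds = np.array_split(rng.permutation(n), 5)
for k in range(5):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    # Inner CV: out-of-fold probabilities on the training part.
    inner_prob = np.empty(len(train_idx))
    for f in np.array_split(np.arange(len(train_idx)), 5):
        fit_idx = np.setdiff1d(np.arange(len(train_idx)), f)
        inner_prob[f] = fit_predict_proba(
            X[train_idx[fit_idx]], y[train_idx[fit_idx]], X[train_idx[f]])
    t, _ = select_threshold(y[train_idx], inner_prob, min_sens=0.90)
    # Apply the selected threshold to the outer test fold.
    prob = fit_predict_proba(X[train_idx], y[train_idx], X[test_idx])
    sens, spec = sens_spec(y[test_idx], (prob >= t).astype(int))
    outer_sens.append(sens)
    outer_spec.append(spec)

print(f"nested-CV sensitivity: {np.mean(outer_sens):.2f}")
print(f"nested-CV specificity: {np.mean(outer_spec):.2f}")
```

Note that the outer folds score the selection *procedure*, not one fixed threshold: each outer fold may pick a slightly different threshold, and the averaged outer-fold statistics are the honest performance estimate for the model you select in step 2 on all the data.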
The key point is that by treating the decision threshold as a hyperparameter, nested CV gives you a performance assessment of a model together with its decision threshold. I believe this is what you asked for.