Martin Bland posted a nice review of his approach to observer variation studies:
https://wwwusers.york.ac.uk/~mb55/meas/observer.pdf
I would welcome comments on a couple of technical questions about it:

One of the strong assumptions cited in section 3 is that measurement error variance VAR_W is the same for all observers.
If there are repeated measurements by each observer (as in section 4), couldn’t each observer’s data be used (ignoring the other observers) to estimate observer-specific measurement error variances VAR_W1 to VAR_Wo (where o is the number of observers in the study)?
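To make the question concrete, here is a rough sketch of what I have in mind, not anything taken from the review itself. It assumes a balanced design stored as an array of shape (subjects, observers, repeats); the function name `per_observer_within_variance` and the simulated numbers are my own inventions for illustration.

```python
import numpy as np

def per_observer_within_variance(y):
    """Estimate a separate within-subject variance for each observer.

    y: array of shape (n_subjects, n_observers, n_repeats), balanced design
       (an assumption of this sketch).
    Returns an array of length n_observers: for each observer, the sample
    variance across repeats on each subject, averaged over subjects.
    """
    y = np.asarray(y, dtype=float)
    if y.ndim != 3 or y.shape[2] < 2:
        raise ValueError("need shape (subjects, observers, repeats) with >= 2 repeats")
    return y.var(axis=2, ddof=1).mean(axis=0)

# Simulated data (made up): three observers with different error SDs.
rng = np.random.default_rng(0)
n_subjects, n_repeats = 200, 3
true_sd = np.array([0.5, 1.0, 2.0])
subject_effect = rng.normal(50.0, 10.0, size=(n_subjects, 1, 1))
noise = rng.normal(0.0, 1.0, size=(n_subjects, 3, n_repeats)) * true_sd[None, :, None]
y = subject_effect + noise

var_w = per_observer_within_variance(y)
print("estimated VAR_W per observer:", var_w)
print("simulated VAR_W per observer:", true_sd ** 2)
```

Because each observer's variance is computed only from that observer's own repeats, any constant bias between observers drops out of these estimates.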
Further, if we assume the observers actually do have differing measurement errors, then even if there is only a single measurement per observer on each subject, couldn’t we still use the same model from section 3, with the heterogeneity term VAR_H capturing this deviation from the common measurement error VAR_W?

Figure 2 on page 8 plots measurements by several observers of a buried tumour phantom against the known true diameter of the phantom. In line with the Bland & Altman ref 1996c (BMJ 1996;313:106), would it make sense to plot the standard deviation for each ‘tumour’ versus the true value?
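As a sketch of the plot I am suggesting (my own illustration, with made-up diameters and simulated measurements, not the data behind Figure 2), the points would be computed roughly like this. The array layout (one row per phantom, one column per observer) and the name `sd_by_phantom` are assumptions.

```python
import numpy as np

def sd_by_phantom(measurements):
    """Sample SD across observers for each phantom.

    measurements: array of shape (n_phantoms, n_observers).
    """
    m = np.asarray(measurements, dtype=float)
    return m.std(axis=1, ddof=1)

# Hypothetical true diameters (mm) and simulated observer measurements
# whose error SD grows in proportion to the true size.
true_diameter = np.array([10.0, 20.0, 30.0, 40.0])
rng = np.random.default_rng(1)
error = rng.normal(0.0, 1.0, size=(4, 6)) * (0.05 * true_diameter[:, None])
measurements = true_diameter[:, None] + error

sd = sd_by_phantom(measurements)
for d, s in zip(true_diameter, sd):
    print(f"true diameter {d:5.1f} mm -> SD across observers {s:.2f} mm")
```

Plotting `sd` against `true_diameter` would then show whether the spread between observers grows with the size of the phantom.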
I am also wondering whether anyone has applied similar analyses to interobserver variation in image annotations / segmentations. In our research we have been struggling with the standard Dice / Jaccard indices, which fail to take spatial information into account.
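A toy example of what I mean by the overlap indices ignoring spatial information, using made-up binary masks: two annotations that each add five extra pixels to the reference, one right at its border and one in the far corner of the image, get identical Dice and Jaccard scores. The helper names `dice` and `jaccard` are my own.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient of two binary masks (1.0 if both are empty)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def jaccard(a, b):
    """Jaccard index of two binary masks (1.0 if both are empty)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

# Reference annotation: a 5 x 5 square (25 pixels) in a 20 x 20 image.
ref = np.zeros((20, 20), dtype=bool)
ref[5:10, 5:10] = True

# Observer A: same square plus 5 pixels touching its right edge.
near = ref.copy()
near[5:10, 10] = True

# Observer B: same square plus 5 pixels in the far corner.
far = ref.copy()
far[15:20, 19] = True

print("Dice    near / far:", dice(ref, near), dice(ref, far))
print("Jaccard near / far:", jaccard(ref, near), jaccard(ref, far))
```

Both observers score the same on either index, even though one disagreement is a slightly different boundary and the other is a spurious region far from the object.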