Thank you for this interesting discussion. We will also look into U-statistics in our future papers, but until now we have always relied on ICCs as the norm for measuring repeatability. I am often confused by the varying formulas, so I am glad for the discussion above and for the helpful article by Koo and Li referenced there, which makes McGraw & Wong's structure easier to interpret. Other authors have argued that the minimal detectable change (MDC95) corresponds better to the real world. This strikes me as a bit odd, since the MDC95 is essentially a standard error and could perhaps be viewed as an intermediate calculation on the way to an ICC. I also wonder whether, at a theoretical level, the MDC95 might be more similar to a U-statistic than to an ICC?
In our last paper, our biostatistician calculated the ICC with a linear mixed model on log-transformed data, using R's lme function from the nlme package. I'm not sure this even fits into the 10 McGraw & Wong forms, which, as noted above, seem to be based on ANOVA. However, the results are really close to those from the two-way ANOVA for the ICC(A,1) case 2A/3A. Here is a link to the raw data and the lme results (R code not included).
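For anyone who wants to reproduce the ANOVA side of that comparison, here is a minimal sketch of ICC(A,1) (two-way, absolute agreement, single measurement) computed from mean squares as I understand the McGraw & Wong formula. This is not our biostatistician's lme code; the function name icc_a1 and the toy data are only illustrative, and it assumes a complete table with no missing values and no log transform.

```python
def icc_a1(ratings):
    """ICC(A,1) from a complete table: one row per subject, one column per session/rater."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of repeated measurements per subject
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


# Toy data (hypothetical): session 2 reads exactly 1 unit higher than session 1.
# Consistency is perfect, but absolute agreement is penalised by the offset.
toy = [[1, 2], [2, 3], [3, 4]]
print(icc_a1(toy))
```

With balanced data and no transform, I would expect a mixed model with subject and session effects to give a similar variance ratio, which may be why our lme result came out so close to the ANOVA value.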
We then used the ICC to calculate the MDC95. The results surprised me: the left and right dorsal forearm sites, which had very different ICCs (0.49 and 0.85), ended up with very similar MDCs (90.4 and 84.0 N/m) because of the large difference in population standard deviation (calculation shown in the raw data link above). Going forward with longitudinal studies that track patient changes over time, I am no longer sure whether we should assume that these two measurement sites have roughly equal reproducibility, given their similar MDCs in N/m, or whether site reliability actually differs considerably, given the difference in ICCs (which admittedly have overlapping confidence intervals due to the small sample size). Any insights would be greatly appreciated!
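To make the mechanism concrete, here is a small sketch using the commonly cited formulas SEM = SD * sqrt(1 - ICC) and MDC95 = 1.96 * sqrt(2) * SEM. I am assuming these match the calculation in our spreadsheet, which may not hold exactly given the log transform. The function names are my own, and the "implied SD" values are back-calculated under that assumption rather than taken from our data.

```python
import math

Z95 = 1.96  # two-sided 95% normal quantile


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence for a difference of two measurements."""
    return Z95 * math.sqrt(2.0) * sem_from_icc(sd, icc)


def implied_sd(mdc, icc):
    """Invert mdc95(): the SD that would yield this MDC95 at this ICC (assumes the formula above)."""
    return mdc / (Z95 * math.sqrt(2.0) * math.sqrt(1.0 - icc))


# ICC and MDC95 values quoted in the post; the SDs are inferred, not measured.
for site, icc, mdc in [("L dorsal forearm", 0.49, 90.4), ("R dorsal forearm", 0.85, 84.0)]:
    sd = implied_sd(mdc, icc)
    print(f"{site}: ICC={icc}, MDC95={mdc} N/m, implied SD={sd:.1f} N/m, SEM={sem_from_icc(sd, icc):.1f} N/m")
```

The point of the sketch is that the MDC95 depends only on the SEM, which is in the measurement's own units, while the ICC expresses that same error relative to the spread between subjects. Two sites can therefore share nearly the same SEM and MDC95 yet have quite different ICCs if one sample is simply more heterogeneous.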
To me, the MDC95 seems to make the most sense, since ultimately we want to determine whether we can be confident that a change in a patient's measure over time reflects true biological change, or whether it might simply fall within measurement error.