I want to design a study to compare two types of measurement. It’s an imaging application, and the objective is to see whether an abbreviated measurement is consistent with the more troublesome but “gold-standard” one. Before applying the abbreviated measurement clinically, I was thinking of calculating a sample size for a correlation coefficient. I suppose that, for this clinical setting, I would trust a measurement that correlates at least 0.8 with the original one.

Would this be correct? Does anyone have any suggestions about the sample size calculation?

A correlation is not a very good measure of agreement. You will get a perfect correlation between cholesterol measured in US units (mg%) and SI units (mmol/L) but, of course, the individual values will be completely different, by a factor of about 39, if I can remember back to the 1980s when SI replaced mg%.

You can use the Bland and Altman approach to assess agreement, which, its authors have argued, is applicable even when one of the measures is a gold standard.
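To make that concrete, here is a minimal Python sketch of the usual Bland and Altman summary: the mean difference (bias) and the 95% limits of agreement, taken as bias ± 1.96 standard deviations of the paired differences. The readings are made-up numbers purely for illustration, and the function name bland_altman is my own, not from any library.

```python
from statistics import mean, stdev

def bland_altman(new, reference):
    """Return (bias, lower limit, upper limit) for paired readings.

    bias is the mean of (new - reference); the limits are
    bias -/+ 1.96 * sample SD of the differences.
    """
    diffs = [n - r for n, r in zip(new, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired readings, invented for this example only.
gold = [10.2, 12.5, 9.8, 14.1, 11.0, 13.3]
abbreviated = [10.6, 12.9, 10.5, 14.2, 11.8, 13.5]

bias, lower, upper = bland_altman(abbreviated, gold)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Whether limits of that width are acceptable is a clinical judgement, not a statistical one.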

However, this theory-free measure ignores the consequences of error. If you want to replace a gold standard measure with a simpler, easier method, then you need to be sure that the effects on decision making are not worrying. And to do this, you need to know what the test is used for. I’ve worked on some physiotherapy protocols where we classified errors into

no consequence – clinical management would be the same based on either test

significant disagreement – the difference between the two tests will result in a difference in patient management, and

major error – test fails to detect something that would be serious if missed

This is just one kind of approach, but it has the advantage of giving the prevalence of errors of several types.
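A rough Python sketch of how such a tally might look is given here. Everything specific in it is an assumption for illustration: the two cut-offs (TREAT_THRESHOLD and SERIOUS_THRESHOLD), the rule that a "major error" means the gold standard is above the serious cut-off while the new test falls below the treatment cut-off, and the paired readings themselves. The real rules would have to come from the clinical use of the test.

```python
from collections import Counter

# Hypothetical decision cut-offs, assumed only for this example.
TREAT_THRESHOLD = 5.0    # at or above this, the patient would be treated
SERIOUS_THRESHOLD = 7.5  # at or above this on the gold standard, a miss is serious

def classify(gold, new):
    """Classify one pair of readings by its effect on management."""
    gold_treat = gold >= TREAT_THRESHOLD
    new_treat = new >= TREAT_THRESHOLD
    if gold >= SERIOUS_THRESHOLD and not new_treat:
        return "major error"
    if gold_treat != new_treat:
        return "significant disagreement"
    return "no consequence"

# Made-up (gold standard, abbreviated) pairs.
pairs = [(3.0, 3.2), (5.5, 5.1), (4.9, 5.3), (8.2, 4.6),
         (6.0, 6.4), (2.1, 2.0), (5.2, 4.8), (7.9, 8.3)]

counts = Counter(classify(g, n) for g, n in pairs)
prevalence = {label: count / len(pairs) for label, count in counts.items()}

for label, share in prevalence.items():
    print(f"{label}: {share:.1%}")
```

The output is the proportion of patients falling into each error type, which is the quantity a clinician can actually weigh up.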

I suggest that you don’t use a correlation, which no-one in real life can interpret, and focus on the effects of method change on the decisional process in which the measurement is used. That’s what people really need to know before they switch.

A correlation of 0.8 is very far from implying good agreement. It means that the X variable can explain 0.8 * 0.8 = 0.64, or 64%, of the variation in the Y variable. This is far higher than the sort of value you would get in an epidemiological study, but in a measurement study it is distinctly low. Besides, the correlation disregards whether one reading consistently over-reads or under-reads relative to the other. If the scatter diagram consists of several points on a straight line, for example Y = X + 1 or Y = 2*X, the correlation will be exactly 1, but this is far from implying perfect agreement. If the X and Y scales claim to be directly comparable, then the Bland-Altman approach someone else suggested is optimal. If they don’t, then correlation and (closely linked) regression analyses are all you can do. In either case, r = 0.8, even though it will be statistically significant (rejecting H0: r = 0) on any reasonable sample size, is far from saying that X and Y are equivalent.
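The straight-line point is easy to check numerically. The short Python sketch below uses invented X values and a hand-written Pearson correlation (the helper pearson_r is my own, not a library function) to show that Y = X + 1 and Y = 2*X both give r = 1 even though the readings never agree.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]       # invented readings
y_offset = [v + 1 for v in x]       # Y = X + 1: constant over-reading
y_scaled = [2 * v for v in x]       # Y = 2*X: proportional over-reading

for label, y in [("Y = X + 1", y_offset), ("Y = 2*X", y_scaled)]:
    r = pearson_r(x, y)
    mean_diff = sum(b - a for a, b in zip(x, y)) / len(x)
    print(f"{label}: r = {r:.3f}, mean difference = {mean_diff:.2f}")

# Share of variation in Y explained by X when r = 0.8
print(f"r = 0.8 gives r squared = {0.8 ** 2:.2f}")
```

In both cases the correlation is perfect while the mean difference is clearly not zero, which is exactly what a correlation-based sample size calculation would fail to pick up.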