First of all, my name is Alejandro Mejia. I'm a physician currently in residency training in orthopedics and traumatology at the Pontificia Universidad Javeriana in Bogotá, Colombia.
I'm developing a reliability study of a new classification for tibial plateau fractures. I've consulted several epidemiologists, who have told me this project is very complicated, and I've been turned down many times, so I'd like to ask you for advice.
This new classification revisits the old Schatzker classification for tibial plateau fractures, which comprises six fracture patterns. Previously we didn't have CT scans, so fractures were classified with X-rays alone; newer evidence, since CT scans came into use, suggests that a 3D approach is necessary for treating these fractures.
The revised classification keeps these six patterns and subclassifies each of them, leaving 26 distinct fracture patterns. The purpose of my study is to evaluate the reliability of this new CT-based classification by comparing inter- and intraobserver reliability among the creator of the classification (Dr. Mauricio Kfuri, as the gold standard), two trauma surgeons with more than 10 years of experience, two recently graduated orthopedic surgeons, and two orthopedic residents in training.
How do you think the sample size should be determined?
Should I use a Fleiss kappa analysis or a Cohen kappa analysis?
Having just done an inter-rater reliability analysis for a possibly similar type of study (patellar height measurements on CT scans), I might be able to help. I'm not entirely sure about the sample size issue, but a quick search turned up:
There are some pieces of information missing in your question:
Are the variables continuous or categorical? This determines whether you’re looking at ICC or Kappa.
Will you be randomly selecting the sample of participants for inclusion in the IRR analysis?
You listed 7(?) potential raters. Will all 7 be rating the same participants? This determines whether you'll be using Fleiss' or Cohen's kappa (I believe Cohen's kappa is limited to 2 raters); see the R sketch after this list.
If your variables are continuous, will you be using the average of their ratings in your dataset, or just a single measurement for each participant?
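To make the last two points concrete, here is a minimal R sketch using the irr package. Everything in it is a placeholder assumption: the 60-case sample, the "P1" to "P26" pattern labels, and the fake continuous measurements are made up purely to show which function fits which design.

```r
library(irr)

set.seed(1)
n_cases <- 60  # hypothetical number of fracture cases

# Categorical ratings: each column is one rater assigning one of the
# 26 subtypes (placeholder labels "P1".."P26").
patterns <- paste0("P", 1:26)
ratings7 <- as.data.frame(replicate(7, sample(patterns, n_cases, replace = TRUE)))

# More than 2 raters scoring the same cases -> Fleiss' kappa
kappam.fleiss(ratings7)

# Exactly 2 raters (e.g., just the two trauma surgeons) -> Cohen's kappa
kappa2(ratings7[, 1:2])

# Continuous variables instead (like my patellar height measurements) -> ICC
heights <- as.data.frame(replicate(7, rnorm(n_cases, mean = 1.0, sd = 0.1)))
icc(heights, model = "twoway", type = "agreement", unit = "single")
```

With random data like this, the kappas will hover near zero; the point is only which function matches which rater design.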
Here’s a pretty intuitive how-to on Kappa using R:
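On the sample size question specifically, and with the caveat that this is a sketch rather than a recommendation: the kappaSize package (if I remember its interface correctly) gives closed-form sample sizes for kappa, but only for outcomes with a small number of categories, so with 26 subtypes you would likely have to collapse to the six main types or fall back on simulation. All the numbers below are placeholder assumptions.

```r
library(kappaSize)

# n so that, for an anticipated kappa of 0.60, the lower bound of the
# 95% CI stays above 0.40 -- assuming 2 raters and a binary trait with
# 30% prevalence (placeholder values, not study recommendations).
CIBinary(kappa0 = 0.60, kappaL = 0.40, props = 0.30,
         raters = 2, alpha = 0.05)
```

This should print the required number of cases under those assumptions; swap in values that match your own anticipated agreement and category frequencies.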