The degree of agreement among independent observers or raters when evaluating the same phenomenon. High inter-rater reliability indicates that different people can use the measurement system consistently and reach similar conclusions.
Emerged in the mid-20th century as psychology became more empirical and multiple observers were used to reduce bias. 'Inter-rater' combines Latin inter (between) with 'rater,' emphasizing agreement between different human judges or coders.
This is why Olympic gymnastics uses multiple judges—if they wildly disagree, the scoring system is broken! In psychology, low inter-rater reliability often reveals that our definitions of behaviors or symptoms are too vague or subjective.
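In practice, agreement between raters is often quantified with a chance-corrected statistic such as Cohen's kappa, which compares observed agreement with the agreement two raters would reach by chance given their label frequencies. The sketch below is a minimal illustration in plain Python; the two coders and their "aggressive"/"not aggressive" labels are invented for the example:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: two coders labeling the same 10 behaviors
# as aggressive ("A") or not ("N").
coder_1 = ["A", "A", "N", "A", "N", "N", "A", "N", "A", "N"]
coder_2 = ["A", "N", "N", "A", "N", "N", "A", "N", "A", "A"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # → 0.6
```

Here the coders agree on 8 of 10 items (80%), but because each uses the two labels equally often, 50% agreement is expected by chance alone, so kappa lands at 0.6 rather than 0.8. Values near 0 mean agreement is no better than chance, which is the vague-definition failure mode described above.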