The 4th edition of this book covers Chance-corrected Agreement Coefficients (CAC) for the analysis of categorical ratings, as well as Intraclass Correlation Coefficients (ICC) for the analysis of quantitative ratings. Both topics were discussed in parts II and III of that book, which is divided into 4 parts. The 5th edition, however, is released in 2 volumes. The present volume 1 focuses on CAC methods, whereas volume 2 is devoted to ICC methods. The decision to release 2 volumes was made at the request of numerous readers of the 4th edition, who indicated that they are often interested in either CAC techniques or ICC techniques, but rarely in both at the same time. Moreover, the large number of topics covered in this 5th edition could not be squeezed into a single book without making it voluminous.
Here is a summary of the main changes from the 4th edition that you will find in this book:
- Chapter 2 is new to the 5th edition and covers various ways of setting up your rating dataset before analysis. My decision to add this chapter stems from the large number of questions I received from researchers who wanted to know how their rating data should be organized. I have noticed that organizing your data properly often clears the path toward resolving most computational problems.
- Chapter 6, entitled "Agreement Coefficients and Statistical Inference," has also been expanded substantially. Section 6.5 on sample size calculation, in particular, covers new power calculation methods not discussed in the 4th edition. These are nonparametric methods for computing the optimal number of raters and subjects at the design stage of an inter-rater reliability experiment.
- Chapter 8 on the analysis of agreement coefficients conditionally on specific categories has been substantially rewritten, with additional detail and greater clarity.
- Chapter 9 on the analysis of nominal-scale inter-rater reliability data is new. It addresses several techniques that were not covered in any of the previous editions of this book. One of these techniques concerns the important notion of inter-annotator agreement, which plays a key role in the fields of Natural Language Processing (NLP), computational linguistics, and text analytics.
Also discussed in this chapter is the important problem of measuring the extent of agreement among 3 raters or more when no subject can be rated by more than 2 of them. I show how such a study can be designed and discuss the statistical implications of that design.
The remaining techniques described in this chapter relate to influence analysis, intra-rater reliability, and Cronbach's alpha coefficient. Influence analysis is used to detect problem raters in low-agreement studies, whereas Cronbach's alpha coefficient is commonly used in item analysis.