Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require adjudication by at third rater. However, from an assessment validation standpoint, questions remain about the impact of negotiation on the scoring inference of a validation argument (Kane, 2006, 2012). Thus, this mixed-methods study evaluates the impact of score negotiation on scoring consistency in second language writing assessment, as well as negotiation’s potential contributions to raters’ understanding of test constructs and the local curriculum. Many-faceted Rasch measurement (MFRM) was used to analyze scores (n = 524) from the writing section an EAP placement exam and to quantify how negotiation affected rater severity, self-consistency, and bias toward individual categories and test takers. Semi-structured interviews with raters (n = 3) documented their perspectives about how negotiation affects scoring and teaching. In this study, negotiation did not change rater severity, though it greatly reduced measures of rater bias. Furthermore, rater comments indicated that negotiation supports a nuanced understanding of the rubric categories and increases positive washback on teaching practices.
- Many-faceted Rasch measurement
- performance assessment
ASJC Scopus subject areas
- Language and Linguistics
- Social Sciences (miscellaneous)
- Linguistics and Language