TY - JOUR
T1 - Compositional Evaluation on Japanese Textual Entailment and Similarity
AU - Yanaka, Hitomi
AU - Mineshima, Koji
N1 - Funding Information:
We thank the anonymous reviewers and the Action Editor for helpful comments and suggestions that improved this paper. We also thank Daisuke Kawahara and Tomohide Shibata for helpful advice on the experimental settings of Japanese RoBERTa and BERT models. This work was supported by JSPS KAKENHI grant number JP20K19868, JST, PRESTO grant number JPMJPR21C8, and JST, CREST grant number JPMJCR2114, Japan.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022/11/22
Y1 - 2022/11/22
N2 - Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.
AB - Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.
UR - http://www.scopus.com/inward/record.url?scp=85142892464&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142892464&partnerID=8YFLogxK
U2 - 10.1162/tacl_a_00518
DO - 10.1162/tacl_a_00518
M3 - Article
AN - SCOPUS:85142892464
SN - 2307-387X
VL - 10
SP - 1266
EP - 1284
JO - Transactions of the Association for Computational Linguistics
JF - Transactions of the Association for Computational Linguistics
ER -