Compositional Evaluation on Japanese Textual Entailment and Similarity

Hitomi Yanaka, Koji Mineshima

Research output: Contribution to journal, Article, peer-review


Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.

Original language: English
Pages (from-to): 1266-1284
Number of pages: 19
Journal: Transactions of the Association for Computational Linguistics
Publication status: Published - 2022 Nov 22

ASJC Scopus subject areas

  • Communication
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence


