Radical-level Ideograph Encoder for RNN-based Sentiment Analysis of Chinese and Japanese

Yuanzhi Ke, Masafumi Hagiwara

Research output: Conference article (peer-reviewed)

5 Citations (Scopus)

Abstract

The character vocabulary can be very large in non-alphabetic languages such as Chinese and Japanese, which makes neural network models for processing such languages very large. We explored a model for sentiment classification that takes the embeddings of the radicals of the Chinese characters, i.e., hanzi of Chinese and kanji of Japanese. Our model is composed of a CNN word feature encoder and a bi-directional RNN document feature encoder. The results achieved are on par with the character embedding-based models, and close to the state-of-the-art word embedding-based models, with a 90% smaller vocabulary, and at least 13% and 80% fewer parameters than the character embedding-based models and word embedding-based models, respectively. The results suggest that the radical embedding-based approach is cost-effective for machine learning on Chinese and Japanese.
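
The two-stage structure named in the abstract (a CNN word feature encoder over radical embeddings, followed by a bi-directional RNN document feature encoder) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, the toy radical vocabulary, the random untrained weights, the plain tanh recurrent cell, the max-over-time pooling, and the helper names (`encode_word`, `run_rnn`, `classify`) are all hypothetical choices made for readability.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

# All sizes below are hypothetical; the abstract does not state them.
EMB_DIM = 4    # radical embedding size
WORD_DIM = 5   # number of CNN filters, i.e. word feature size
HID_DIM = 3    # RNN hidden size per direction
KERNEL = 2     # convolution width over the radical sequence

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Toy radical vocabulary (assumption); a real one would come from a radical dictionary.
RADICALS = ["<pad>", "女", "子", "日", "月", "木"]
radical_emb = {r: [random.uniform(-0.5, 0.5) for _ in range(EMB_DIM)] for r in RADICALS}

conv_w = rand_matrix(WORD_DIM, KERNEL * EMB_DIM)
rnn_fw = (rand_matrix(HID_DIM, WORD_DIM), rand_matrix(HID_DIM, HID_DIM))
rnn_bw = (rand_matrix(HID_DIM, WORD_DIM), rand_matrix(HID_DIM, HID_DIM))
out_w = [random.uniform(-0.5, 0.5) for _ in range(2 * HID_DIM)]

def encode_word(radicals):
    """CNN word encoder: 1-D convolution over radical embeddings, then max pooling."""
    seq = list(radicals) + ["<pad>"] * max(0, KERNEL - len(radicals))
    vecs = [radical_emb[r] for r in seq]
    # Each window concatenates KERNEL consecutive radical embeddings.
    windows = [sum(vecs[i:i + KERNEL], []) for i in range(len(vecs) - KERNEL + 1)]
    feats = [[math.tanh(x) for x in matvec(conv_w, w)] for w in windows]
    return [max(col) for col in zip(*feats)]  # max over window positions

def run_rnn(params, inputs):
    """Plain tanh recurrent cell (assumption); returns the final hidden state."""
    w_in, w_rec = params
    h = [0.0] * HID_DIM
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
    return h

def classify(document):
    """document: list of words, each word given as a list of radicals."""
    words = [encode_word(w) for w in document]
    # Bi-directional document encoder: forward and backward passes, concatenated.
    h = run_rnn(rnn_fw, words) + run_rnn(rnn_bw, words[::-1])
    score = sum(w * x for w, x in zip(out_w, h))
    return 1.0 / (1.0 + math.exp(-score))  # probability-like sentiment score

# Toy input: 好 = 女 + 子, 明 = 日 + 月, 木 as a single radical.
doc = [["女", "子"], ["日", "月"], ["木"]]
p = classify(doc)
```

Because the embedding table in this sketch is indexed by radicals rather than by characters or words, its size grows with the radical inventory only, which is the kind of vocabulary saving the abstract points to. The weights here are untrained, so `p` carries no meaning beyond showing that the data flows through both encoders.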

Original language: English
Pages (from-to): 561-573
Number of pages: 13
Journal: Journal of Machine Learning Research
Volume: 77
Publication status: Published - 2017
Event: 9th Asian Conference on Machine Learning, ACML 2017 - Seoul, Korea, Republic of
Duration: 15 Nov 2017 to 17 Nov 2017

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
