TY - JOUR
T1 - CNN-encoded radical-level representation for Japanese processing
AU - Ke, Yuanzhi
AU - Hagiwara, Masafumi
N1 - Funding Information:
He received his B.E., M.E. and Ph.D. degrees in electrical engineering from Keio University, Japan, in 1982, 1984 and 1987, respectively. Since 1987 he has been with Keio University, where he is now a Professor. From 1991 to 1993, he was a visiting scholar at Stanford University. He received the IEEE Consumer Electronics Society Chester Sall Award in 1990, the Author Award from the Japan Society of Fuzzy Theory and Systems in 1996, Technical and Paper Awards from the Japan Society of Kansei Engineering in 2003, 2004 and 2014, and the Best Research Award from the Japanese Neural Network Society in 2013. He is a member of IEICE, IPSJ, SOFT, IEE of Japan, the Japan Society of Kansei Engineering, JNNS and IEEE (Senior Member). His research interests include neural networks, fuzzy systems, and affective engineering. He is a former president of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT).
Publisher Copyright:
© 2018, Japanese Society for Artificial Intelligence. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Although word embeddings are powerful, their weaknesses on rare and unknown words, together with the cost of a large vocabulary, have motivated the exploration of alternative representations. While character embeddings have been successful for alphabetic languages, Japanese is difficult to process even at the character level because of its large vocabulary of kanji, the Chinese characters used in written Japanese. To achieve fewer parameters and better generalization on infrequent words and characters, we propose a model that encodes Japanese text from a radical-level representation, inspired by experimental findings in psycholinguistics. The proposed model comprises a convolutional local encoder and a recurrent global encoder. For the convolutional encoder, we propose a novel combination of two kinds of convolutional filters with different strides in a single layer to extract information at different levels. We compare the proposed radical-level model with state-of-the-art word and character embedding-based models on a sentiment classification task. The proposed model outperformed the state-of-the-art models on randomly sampled texts and on texts containing unknown characters, with 91% and 12% fewer parameters than the word embedding-based and character embedding-based models, respectively. On the test sets with unknown characters in particular, the proposed model scored 4.01% and 2.38% above the word embedding-based and character embedding-based baselines, respectively. The proposed model is powerful at a lower computational and storage cost, making it suitable for devices with limited storage and for processing texts containing rare characters.
AB - Although word embeddings are powerful, their weaknesses on rare and unknown words, together with the cost of a large vocabulary, have motivated the exploration of alternative representations. While character embeddings have been successful for alphabetic languages, Japanese is difficult to process even at the character level because of its large vocabulary of kanji, the Chinese characters used in written Japanese. To achieve fewer parameters and better generalization on infrequent words and characters, we propose a model that encodes Japanese text from a radical-level representation, inspired by experimental findings in psycholinguistics. The proposed model comprises a convolutional local encoder and a recurrent global encoder. For the convolutional encoder, we propose a novel combination of two kinds of convolutional filters with different strides in a single layer to extract information at different levels. We compare the proposed radical-level model with state-of-the-art word and character embedding-based models on a sentiment classification task. The proposed model outperformed the state-of-the-art models on randomly sampled texts and on texts containing unknown characters, with 91% and 12% fewer parameters than the word embedding-based and character embedding-based models, respectively. On the test sets with unknown characters in particular, the proposed model scored 4.01% and 2.38% above the word embedding-based and character embedding-based baselines, respectively. The proposed model is powerful at a lower computational and storage cost, making it suitable for devices with limited storage and for processing texts containing rare characters.
KW - Convolutional neural networks
KW - Deep learning
KW - Natural language processing
KW - Sub-character language modeling
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85049758629&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049758629&partnerID=8YFLogxK
U2 - 10.1527/tjsai.D-I23
DO - 10.1527/tjsai.D-I23
M3 - Article
AN - SCOPUS:85049758629
SN - 1346-0714
VL - 33
JO - Transactions of the Japanese Society for Artificial Intelligence
JF - Transactions of the Japanese Society for Artificial Intelligence
IS - 4
ER -