TY - JOUR
T1 - Language modeling using augmented echo state networks
AU - Rachez, Arnaud
AU - Hagiwara, Masafumi
PY - 2014/12/1
Y1 - 2014/12/1
N2 - Interest in natural language modeling using neural networks has been growing in the past decade. The objective of this paper is to investigate the predictive capabilities of echo state networks (ESNs) at the task of modeling English sentences. Based on the finding that ESNs exhibit a Markovian organization of their state space that makes them close to the widely used n-gram models, we describe two modifications of the conventional architecture that allow significant improvement by leveraging the kind of representation developed in the reservoir. Firstly, the addition of pre-recurrent features is shown to capture syntactic similarities between words and can be trained efficiently by using the contracting property of the reservoir to truncate the gradient descent. Secondly, the addition of multiple linear readouts using the mixture-of-experts framework is also shown to greatly improve accuracy while being trainable in parallel using Expectation-Maximization. Furthermore, it can easily be transformed into a supervised mixture-of-experts model, with several variations that reduce the training time and can take handmade features into account.
AB - Interest in natural language modeling using neural networks has been growing in the past decade. The objective of this paper is to investigate the predictive capabilities of echo state networks (ESNs) at the task of modeling English sentences. Based on the finding that ESNs exhibit a Markovian organization of their state space that makes them close to the widely used n-gram models, we describe two modifications of the conventional architecture that allow significant improvement by leveraging the kind of representation developed in the reservoir. Firstly, the addition of pre-recurrent features is shown to capture syntactic similarities between words and can be trained efficiently by using the contracting property of the reservoir to truncate the gradient descent. Secondly, the addition of multiple linear readouts using the mixture-of-experts framework is also shown to greatly improve accuracy while being trainable in parallel using Expectation-Maximization. Furthermore, it can easily be transformed into a supervised mixture-of-experts model, with several variations that reduce the training time and can take handmade features into account.
KW - Echo state
KW - Expectation-maximization
KW - Gradient descent
KW - Language model
KW - Multiple readout
KW - Recurrent neural network
UR - http://www.scopus.com/inward/record.url?scp=84923348999&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84923348999&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84923348999
VL - 10
SP - 1969
EP - 1981
JO - International Journal of Innovative Computing, Information and Control
JF - International Journal of Innovative Computing, Information and Control
SN - 1349-4198
IS - 6
ER -