Language modeling using augmented echo state networks

Arnaud Rachez, Masafumi Hagiwara

Research output: Contribution to journal › Article

Abstract

Interest in natural language modeling using neural networks has been growing in the past decade. The objective of this paper is to investigate the predictive capabilities of echo state networks (ESNs) at the task of modeling English sentences. Based on the finding that ESNs exhibit a Markovian organization of their state space, which makes them close to the widely used n-gram models, we describe two modifications of the conventional architecture that allow significant improvement by leveraging the kind of representation developed in the reservoir. Firstly, the addition of pre-recurrent features is shown to capture syntactic similarities between words and can be trained efficiently by using the contracting property of the reservoir to truncate the gradient descent. Secondly, the addition of multiple linear readouts using the mixture of experts framework is also shown to greatly improve accuracy while being trainable in parallel using Expectation-Maximization. Furthermore, the model can easily be transformed into a supervised mixture of experts with several variations that reduce the training time and can take handmade features into account.
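To make the baseline architecture the abstract refers to concrete, here is a minimal echo state network sketch in Python with NumPy. This is an illustrative sketch, not the authors' implementation: all dimensions, scaling constants, and the toy one-hot prediction task are arbitrary assumptions. It shows the two defining ingredients, a fixed reservoir whose recurrent weights are rescaled to a spectral radius below 1 (the contracting property the abstract mentions), and a linear readout, which is the only trained component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
n_in, n_res = 5, 100

# Fixed random input and reservoir weights; rescale W so its
# spectral radius is below 1, giving the contracting property.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()

def run_reservoir(inputs):
    """Collect the reservoir state after each input vector."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

# Toy next-symbol prediction task on random one-hot "words".
seq = np.eye(n_in)[rng.integers(0, n_in, 200)]
X = run_reservoir(seq[:-1])   # reservoir states, one per step
Y = seq[1:]                   # next-symbol targets

# Linear readout trained by ridge regression (the only learning).
ridge = 1e-2
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ Y).T

pred = X @ W_out.T            # readout predictions per time step
```

The paper's two extensions modify this picture: pre-recurrent features change what enters the reservoir, and the mixture-of-experts variant replaces the single `W_out` with several gated linear readouts trained via Expectation-Maximization.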

Original language: English
Pages (from-to): 1969-1981
Number of pages: 13
Journal: International Journal of Innovative Computing, Information and Control
Volume: 10
Issue number: 6
Publication status: Published - 2014 Dec 1

Keywords

  • Echo state
  • Expectation-maximisation
  • Gradient descent
  • Language model
  • Multiple readout
  • Recurrent neural network

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems
  • Software
  • Theoretical Computer Science

Cite this

Language modeling using augmented echo state networks. / Rachez, Arnaud; Hagiwara, Masafumi.

In: International Journal of Innovative Computing, Information and Control, Vol. 10, No. 6, 01.12.2014, p. 1969-1981.


@article{2e20f9bd89f149d5940ff8aef86006bd,
title = "Language modeling using augmented echo state networks",
abstract = "Interest in natural language modeling using neural networks has been growing in the past decade. The objective of this paper is to investigate the predictive capabilities of echo state networks (ESNs) at the task of modeling English sentences. Based on the finding that ESNs exhibit a Markovian organization of their state space, which makes them close to the widely used n-gram models, we describe two modifications of the conventional architecture that allow significant improvement by leveraging the kind of representation developed in the reservoir. Firstly, the addition of pre-recurrent features is shown to capture syntactic similarities between words and can be trained efficiently by using the contracting property of the reservoir to truncate the gradient descent. Secondly, the addition of multiple linear readouts using the mixture of experts framework is also shown to greatly improve accuracy while being trainable in parallel using Expectation-Maximization. Furthermore, the model can easily be transformed into a supervised mixture of experts with several variations that reduce the training time and can take handmade features into account.",
keywords = "Echo state, Expectation-maximisation, Gradient descent, Language model, Multiple readout, Recurrent neural network",
author = "Arnaud Rachez and Masafumi Hagiwara",
year = "2014",
month = "12",
day = "1",
language = "English",
volume = "10",
pages = "1969--1981",
journal = "International Journal of Innovative Computing, Information and Control",
issn = "1349-4198",
publisher = "IJICIC Editorial Office",
number = "6",

}

TY - JOUR

T1 - Language modeling using augmented echo state networks

AU - Rachez, Arnaud

AU - Hagiwara, Masafumi

PY - 2014/12/1

Y1 - 2014/12/1

N2 - Interest in natural language modeling using neural networks has been growing in the past decade. The objective of this paper is to investigate the predictive capabilities of echo state networks (ESNs) at the task of modeling English sentences. Based on the finding that ESNs exhibit a Markovian organization of their state space, which makes them close to the widely used n-gram models, we describe two modifications of the conventional architecture that allow significant improvement by leveraging the kind of representation developed in the reservoir. Firstly, the addition of pre-recurrent features is shown to capture syntactic similarities between words and can be trained efficiently by using the contracting property of the reservoir to truncate the gradient descent. Secondly, the addition of multiple linear readouts using the mixture of experts framework is also shown to greatly improve accuracy while being trainable in parallel using Expectation-Maximization. Furthermore, the model can easily be transformed into a supervised mixture of experts with several variations that reduce the training time and can take handmade features into account.

AB - Interest in natural language modeling using neural networks has been growing in the past decade. The objective of this paper is to investigate the predictive capabilities of echo state networks (ESNs) at the task of modeling English sentences. Based on the finding that ESNs exhibit a Markovian organization of their state space, which makes them close to the widely used n-gram models, we describe two modifications of the conventional architecture that allow significant improvement by leveraging the kind of representation developed in the reservoir. Firstly, the addition of pre-recurrent features is shown to capture syntactic similarities between words and can be trained efficiently by using the contracting property of the reservoir to truncate the gradient descent. Secondly, the addition of multiple linear readouts using the mixture of experts framework is also shown to greatly improve accuracy while being trainable in parallel using Expectation-Maximization. Furthermore, the model can easily be transformed into a supervised mixture of experts with several variations that reduce the training time and can take handmade features into account.

KW - Echo state

KW - Expectation-maximisation

KW - Gradient descent

KW - Language model

KW - Multiple readout

KW - Recurrent neural network

UR - http://www.scopus.com/inward/record.url?scp=84923348999&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84923348999&partnerID=8YFLogxK

M3 - Article

VL - 10

SP - 1969

EP - 1981

JO - International Journal of Innovative Computing, Information and Control

JF - International Journal of Innovative Computing, Information and Control

SN - 1349-4198

IS - 6

ER -