A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes

Koichiro Takita, Masafumi Hagiwara

Research output: Article

3 Citations (Scopus)

Abstract

This paper considers learning by a pulse neural network and proposes a new reinforcement learning algorithm that exploits the ability of pulse neuron elements to process time series. The conventional integrator neuron element models the average firing rate of a biological neuron, whereas the pulse neuron models the input-output relation of time-series pulses (spikes) together with the decay of its internal state (internal potential). Such networks have been considered in recent engineering studies. In particular, a pulse neuron with a high decay rate is known to act as a coincidence detector. The proposed model combines pulse neuron elements with different decay rates, which facilitates the processing of time-series input information and the discrimination of ambiguous states in a partially observable Markov decision process. The network is a four-layer feedforward network in which the pulse neuron elements of the two hidden layers provide a pseudo-representation of the environment's state. These elements generate a secondary reinforcement signal, yielding learning similar to conventional reinforcement schemes based on a state evaluation function. A computer experiment verifies that the proposed model works effectively in a strongly partially observable environment.
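The abstract does not give the neuron equations, so the following is only a minimal sketch, assuming a common discrete-time leaky pulse neuron: each step the internal potential loses a fixed fraction (the decay rate) of its value, adds the weighted input pulses, and emits a pulse and resets to zero when it reaches a threshold. The names `PulseNeuron`, `decay_rate`, `threshold`, `run` and `make`, and the weights of 0.6, are hypothetical choices for illustration, not the authors' model. The sketch shows why a high decay rate behaves like a coincidence detector while a low decay rate integrates pulses over time.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PulseNeuron:
    """Hypothetical discrete-time leaky pulse neuron (illustrative assumption)."""
    weights: Sequence[float]
    decay_rate: float        # fraction of the internal potential lost per step
    threshold: float = 1.0
    potential: float = 0.0   # internal state (internal potential)

    def step(self, spikes: Sequence[int]) -> int:
        # Decay the old potential, then add the weighted input pulses.
        self.potential = (1.0 - self.decay_rate) * self.potential + sum(
            w * s for w, s in zip(self.weights, spikes)
        )
        if self.potential >= self.threshold:
            self.potential = 0.0  # reset after emitting a pulse
            return 1
        return 0


def run(neuron: PulseNeuron, inputs: List[Tuple[int, ...]]) -> List[int]:
    """Feed one input pulse vector per time step; return the output pulse train."""
    return [neuron.step(x) for x in inputs]


def make(decay_rate: float) -> PulseNeuron:
    """Fresh two-input neuron with hypothetical weights of 0.6 each."""
    return PulseNeuron(weights=(0.6, 0.6), decay_rate=decay_rate)


if __name__ == "__main__":
    coincident = [(1, 1)]          # both input pulses arrive in the same step
    staggered = [(1, 0), (0, 1)]   # the pulses arrive one step apart
    for d in (0.9, 0.1):
        print(d, run(make(d), coincident), run(make(d), staggered))
    # prints:
    # 0.9 [1] [0, 0]
    # 0.1 [1] [0, 1]
```

With a high decay rate (0.9) the neuron fires only when both pulses coincide, because the first pulse has almost vanished by the next step; with a low decay rate (0.1) it also fires for the staggered sequence. Reading both neurons together gives different output patterns for the two sequences, which loosely illustrates how mixing decay rates can separate inputs that a single element would confuse.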

Original language: English
Pages (from-to): 42-52
Number of pages: 11
Journal: Systems and Computers in Japan
Volume: 36
Issue number: 3
DOI: 10.1002/scj.10645
Publication status: Published - March 2005

Fingerprint

Partially Observable Markov Decision Process
Reinforcement learning
Reinforcement Learning
Learning algorithms
Neurons
Learning Algorithm
Neuron
Neural Networks
Neural networks
Time series
Reinforcement
Decay Rate
Internal
Feedforward Networks
Function evaluation
Computer Experiments
Evaluation Function
Coincidence
Spike
Discrimination

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

@article{652989a03908452697512719fa4a9670,
title = "A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes",
abstract = "This paper considers learning by a pulse neural network and proposes a new reinforcement learning algorithm focusing on the ability of pulse neuron elements to process time series. The conventional integrator neuron element is modeled in terms of the average firing rate of the biological neuron. But the pulse neuron is a modeling of the input-output relation of the time-series pulse (spike) and the decay of the internal state (internal potential). The application of such neural networks has been considered in recent engineering studies. It is known in particular that a pulse neuron with a high decay rate acts as a coincidence detector. The proposed model combines pulse neuron elements with different decay rates, which facilitates the processing of the time-series input information and the discrimination of fuzzy states in a partially observable Markov decision process. The proposed network is a four-layered feedforward network in which the pulse neuron elements forming the two hidden layers provide a pseudo-representation of the state in the environment. The elements generate a secondary reinforcement signal which results in learning similar to the conventional reinforcement scheme based on the state evaluation function. A computer experiment verifies that the proposed model works effectively in an environment which is strongly partially observable.",
keywords = "Partially observable Markov decision process, Pulse neural network, Reinforcement learning",
author = "Koichiro Takita and Masafumi Hagiwara",
year = "2005",
month = "3",
doi = "10.1002/scj.10645",
language = "English",
volume = "36",
pages = "42--52",
journal = "Systems and Computers in Japan",
issn = "0882-1666",
publisher = "John Wiley and Sons Inc.",
number = "3",

}

TY - JOUR

T1 - A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes

AU - Takita, Koichiro

AU - Hagiwara, Masafumi

PY - 2005/3

Y1 - 2005/3

N2 - This paper considers learning by a pulse neural network and proposes a new reinforcement learning algorithm focusing on the ability of pulse neuron elements to process time series. The conventional integrator neuron element is modeled in terms of the average firing rate of the biological neuron. But the pulse neuron is a modeling of the input-output relation of the time-series pulse (spike) and the decay of the internal state (internal potential). The application of such neural networks has been considered in recent engineering studies. It is known in particular that a pulse neuron with a high decay rate acts as a coincidence detector. The proposed model combines pulse neuron elements with different decay rates, which facilitates the processing of the time-series input information and the discrimination of fuzzy states in a partially observable Markov decision process. The proposed network is a four-layered feedforward network in which the pulse neuron elements forming the two hidden layers provide a pseudo-representation of the state in the environment. The elements generate a secondary reinforcement signal which results in learning similar to the conventional reinforcement scheme based on the state evaluation function. A computer experiment verifies that the proposed model works effectively in an environment which is strongly partially observable.

AB - This paper considers learning by a pulse neural network and proposes a new reinforcement learning algorithm focusing on the ability of pulse neuron elements to process time series. The conventional integrator neuron element is modeled in terms of the average firing rate of the biological neuron. But the pulse neuron is a modeling of the input-output relation of the time-series pulse (spike) and the decay of the internal state (internal potential). The application of such neural networks has been considered in recent engineering studies. It is known in particular that a pulse neuron with a high decay rate acts as a coincidence detector. The proposed model combines pulse neuron elements with different decay rates, which facilitates the processing of the time-series input information and the discrimination of fuzzy states in a partially observable Markov decision process. The proposed network is a four-layered feedforward network in which the pulse neuron elements forming the two hidden layers provide a pseudo-representation of the state in the environment. The elements generate a secondary reinforcement signal which results in learning similar to the conventional reinforcement scheme based on the state evaluation function. A computer experiment verifies that the proposed model works effectively in an environment which is strongly partially observable.

KW - Partially observable Markov decision process

KW - Pulse neural network

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=15244342008&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=15244342008&partnerID=8YFLogxK

U2 - 10.1002/scj.10645

DO - 10.1002/scj.10645

M3 - Article

AN - SCOPUS:15244342008

VL - 36

SP - 42

EP - 52

JO - Systems and Computers in Japan

JF - Systems and Computers in Japan

SN - 0882-1666

IS - 3

ER -