TY - GEN
T1 - An Intrinsically Motivated Robot Explores Non-reward Environments with Output Arbitration
AU - Seno, Takuma
AU - Osawa, Masahiko
AU - Imai, Michita
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - In real-world settings, rewards are often sparse because the state space is huge. Reinforcement learning agents must acquire exploration skills to obtain rewards in such environments. In that case, curiosity, defined as an internally generated reward for state prediction error, can encourage agents to explore their environments. However, when a robot learns its policy by reinforcement learning, changes in the policy's outputs cause jerky motion because of inertia. Jerky motion prevents state prediction from converging, which makes policy learning unstable. In this paper, we propose Arbitrable Intrinsically Motivated Exploration (AIME), which enables robots to stably learn curiosity-based exploration. AIME uses the Accumulator Based Arbitration Model (ABAM), which we previously proposed as an ensemble learning method inspired by the prefrontal cortex. ABAM adjusts motor controls to improve the stability of reward generation and reinforcement learning. In experiments, we show that a robot can explore a non-reward simulated environment with AIME.
AB - In real-world settings, rewards are often sparse because the state space is huge. Reinforcement learning agents must acquire exploration skills to obtain rewards in such environments. In that case, curiosity, defined as an internally generated reward for state prediction error, can encourage agents to explore their environments. However, when a robot learns its policy by reinforcement learning, changes in the policy's outputs cause jerky motion because of inertia. Jerky motion prevents state prediction from converging, which makes policy learning unstable. In this paper, we propose Arbitrable Intrinsically Motivated Exploration (AIME), which enables robots to stably learn curiosity-based exploration. AIME uses the Accumulator Based Arbitration Model (ABAM), which we previously proposed as an ensemble learning method inspired by the prefrontal cortex. ABAM adjusts motor controls to improve the stability of reward generation and reinforcement learning. In experiments, we show that a robot can explore a non-reward simulated environment with AIME.
KW - Deep reinforcement learning
KW - Prefrontal cortex
KW - Robot navigation
UR - http://www.scopus.com/inward/record.url?scp=85053197831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053197831&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-99316-4_37
DO - 10.1007/978-3-319-99316-4_37
M3 - Conference contribution
AN - SCOPUS:85053197831
SN - 9783319993157
T3 - Advances in Intelligent Systems and Computing
SP - 283
EP - 289
BT - Biologically Inspired Cognitive Architectures 2018 - Proceedings of the Ninth Annual Meeting of the BICA Society
A2 - Samsonovich, Alexei V.
PB - Springer Verlag
T2 - 9th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2018
Y2 - 22 August 2018 through 24 August 2018
ER -