TY - GEN
T1 - Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning
AU - Itaya, Hidenori
AU - Hirakawa, Tsubasa
AU - Yamashita, Takayoshi
AU - Fujiyoshi, Hironobu
AU - Sugiura, Komei
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/18
Y1 - 2021/7/18
N2 - Deep reinforcement learning (DRL) has great potential for acquiring the optimal action in complex environments such as games and robot control. However, it is difficult to analyze the decision-making of the agent, i.e., the reasons it selects the action acquired by learning. In this work, we propose Mask-Attention A3C (Mask A3C), which introduces an attention mechanism into Asynchronous Advantage Actor-Critic (A3C), which is an actor-critic-based DRL method, and can analyze the decision-making of an agent in DRL. A3C consists of a feature extractor that extracts features from an image, a policy branch that outputs the policy, and a value branch that outputs the state value. In this method, we focus on the policy and value branches and introduce an attention mechanism into them. The attention mechanism applies a mask processing to the feature maps of each branch using mask-attention that expresses the judgment reason for the policy and state value with a heat map. We visualized mask-attention maps for games on the Atari 2600 and found we could easily analyze the reasons behind an agent's decision-making in various game tasks. Furthermore, experimental results showed that the agent could achieve a higher performance by introducing the attention mechanism.
AB - Deep reinforcement learning (DRL) has great potential for acquiring the optimal action in complex environments such as games and robot control. However, it is difficult to analyze the decision-making of the agent, i.e., the reasons it selects the action acquired by learning. In this work, we propose Mask-Attention A3C (Mask A3C), which introduces an attention mechanism into Asynchronous Advantage Actor-Critic (A3C), which is an actor-critic-based DRL method, and can analyze the decision-making of an agent in DRL. A3C consists of a feature extractor that extracts features from an image, a policy branch that outputs the policy, and a value branch that outputs the state value. In this method, we focus on the policy and value branches and introduce an attention mechanism into them. The attention mechanism applies a mask processing to the feature maps of each branch using mask-attention that expresses the judgment reason for the policy and state value with a heat map. We visualized mask-attention maps for games on the Atari 2600 and found we could easily analyze the reasons behind an agent's decision-making in various game tasks. Furthermore, experimental results showed that the agent could achieve a higher performance by introducing the attention mechanism.
UR - http://www.scopus.com/inward/record.url?scp=85116487528&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116487528&partnerID=8YFLogxK
U2 - 10.1109/IJCNN52387.2021.9534363
DO - 10.1109/IJCNN52387.2021.9534363
M3 - Conference contribution
AN - SCOPUS:85116487528
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Joint Conference on Neural Networks, IJCNN 2021
Y2 - 18 July 2021 through 22 July 2021
ER -