TY - CONF
T1 - Reinforcement Learning of Trajectory Distributions
T2 - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
AU - Ewerton, Marco
AU - Maeda, Guilherme
AU - Koert, Dorothea
AU - Kolev, Zlatko
AU - Takahashi, Masaki
AU - Peters, Jan
N1 - Funding Information:
The research leading to these results has received funding from the German Federal Ministry of Education and Research (BMBF) in the project 16SV7984 (KoBo34), from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 640554 (SKILLS4ROBOTS), from a project commissioned by the Japanese New Energy and Industrial Technology Development Organization (NEDO) and from the Swiss National Science Foundation through the HEAP project (Human-Guided Learning and Benchmarking of Robotic Heap Sorting, ERA-net CHIST-ERA).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Most learning from demonstration approaches do not address suboptimal demonstrations or cases in which drastic changes in the environment occur after the demonstrations were made. For example, in real teleoperation tasks, the demonstrations provided by the user are often suboptimal due to interface and hardware limitations. In tasks involving co-manipulation and manipulation planning, the environment often changes due to unexpected obstacles, rendering previous demonstrations invalid. This paper presents a reinforcement learning algorithm that exploits relevance functions to tackle such problems, introducing the Pearson correlation as a measure of the relevance of policy parameters with respect to each component of the cost function to be optimized. The method is demonstrated in a static environment where the quality of teleoperation is compromised by the visual interface (operating a robot in a three-dimensional task using a simple 2D monitor). Afterward, we test the method in a dynamic environment using a real 7-DoF robot arm, where distributions are computed online via Gaussian Process regression.
AB - Most learning from demonstration approaches do not address suboptimal demonstrations or cases in which drastic changes in the environment occur after the demonstrations were made. For example, in real teleoperation tasks, the demonstrations provided by the user are often suboptimal due to interface and hardware limitations. In tasks involving co-manipulation and manipulation planning, the environment often changes due to unexpected obstacles, rendering previous demonstrations invalid. This paper presents a reinforcement learning algorithm that exploits relevance functions to tackle such problems, introducing the Pearson correlation as a measure of the relevance of policy parameters with respect to each component of the cost function to be optimized. The method is demonstrated in a static environment where the quality of teleoperation is compromised by the visual interface (operating a robot in a three-dimensional task using a simple 2D monitor). Afterward, we test the method in a dynamic environment using a real 7-DoF robot arm, where distributions are computed online via Gaussian Process regression.
UR - http://www.scopus.com/inward/record.url?scp=85081159730&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081159730&partnerID=8YFLogxK
U2 - 10.1109/IROS40897.2019.8967856
DO - 10.1109/IROS40897.2019.8967856
M3 - Conference contribution
AN - SCOPUS:85081159730
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 4294
EP - 4300
BT - 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 November 2019 through 8 November 2019
ER -