TY - GEN
T1 - Moment-based Adversarial Training for Embodied Language Comprehension
AU - Ishikawa, Shintaro
AU - Sugiura, Komei
N1 - Funding Information:
This work was partially supported by JSPS KAKENHI Grant Number 20H04269, JST Moonshot, and NEDO.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In this paper, we focus on a vision-and-language task in which a robot is instructed to execute household tasks. Given an instruction such as "Rinse off a mug and place it in the coffee maker,"the robot is required to locate the mug, wash it, and put it in the coffee maker. This is challenging because the robot needs to break down the instruction sentences into subgoals and execute them in the correct order. On the ALFRED benchmark, the performance of state-of-the-art methods is still far lower than that of humans. This is partially because existing methods sometimes fail to infer subgoals that are not explicitly specified in the instruction sentences. We propose Moment-based Adversarial Training (MAT), which uses two types of moments for perturbation updates in adversarial training. We introduce MAT to the embedding spaces of the instruction, subgoals, and state representations to handle their varieties. We validated our method on the ALFRED benchmark, and the results demonstrated that our method outperformed the baseline method for all the metrics on the benchmark.
AB - In this paper, we focus on a vision-and-language task in which a robot is instructed to execute household tasks. Given an instruction such as "Rinse off a mug and place it in the coffee maker,"the robot is required to locate the mug, wash it, and put it in the coffee maker. This is challenging because the robot needs to break down the instruction sentences into subgoals and execute them in the correct order. On the ALFRED benchmark, the performance of state-of-the-art methods is still far lower than that of humans. This is partially because existing methods sometimes fail to infer subgoals that are not explicitly specified in the instruction sentences. We propose Moment-based Adversarial Training (MAT), which uses two types of moments for perturbation updates in adversarial training. We introduce MAT to the embedding spaces of the instruction, subgoals, and state representations to handle their varieties. We validated our method on the ALFRED benchmark, and the results demonstrated that our method outperformed the baseline method for all the metrics on the benchmark.
UR - http://www.scopus.com/inward/record.url?scp=85143591145&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143591145&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956163
DO - 10.1109/ICPR56361.2022.9956163
M3 - Conference contribution
AN - SCOPUS:85143591145
T3 - Proceedings - International Conference on Pattern Recognition
SP - 4139
EP - 4145
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
Y2 - 21 August 2022 through 25 August 2022
ER -