TY - JOUR
T1 - Multimodal attention branch network for perspective-free sentence generation
AU - Magassouba, Aly
AU - Sugiura, Komei
AU - Kawai, Hisashi
N1 - Publisher Copyright:
Copyright © 2019, The Authors. All rights reserved.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/9/9
Y1 - 2019/9/9
N2 - In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands such as "bring me the yellow toy from the upper part of the white shelf" include referring expressions, e.g., "from the upper part of the white shelf". To solve this task, we propose a multimodal attention branch network (Multi-ABN) which generates natural sentences in an end-to-end manner. Multi-ABN uses multiple images of the same fixed scene to generate sentences that are not tied to a particular viewpoint. This approach combines a linguistic attention branch mechanism with several attention branch mechanisms. We evaluated our approach, which outperforms the state-of-the-art method on standard metrics. Our method also allows us to visualize the alignment between the linguistic input and the visual features.
AB - In this paper, we address the automatic sentence generation of fetching instructions for domestic service robots. Typical fetching commands such as "bring me the yellow toy from the upper part of the white shelf" include referring expressions, e.g., "from the upper part of the white shelf". To solve this task, we propose a multimodal attention branch network (Multi-ABN) which generates natural sentences in an end-to-end manner. Multi-ABN uses multiple images of the same fixed scene to generate sentences that are not tied to a particular viewpoint. This approach combines a linguistic attention branch mechanism with several attention branch mechanisms. We evaluated our approach, which outperforms the state-of-the-art method on standard metrics. Our method also allows us to visualize the alignment between the linguistic input and the visual features.
KW - Domestic service robots
KW - Image captioning
UR - http://www.scopus.com/inward/record.url?scp=85094225963&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094225963&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85094225963
JO - Mathematical Social Sciences
JF - Mathematical Social Sciences
SN - 0165-4896
ER -