Target-Dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots

Shintaro Ishikawa, Komei Sugiura

Research output: Article, peer-reviewed

Abstract

Currently, domestic service robots have an insufficient ability to interact naturally through language. This is because understanding human instructions is complicated by various ambiguities. In existing methods, the referring expressions that specify the relationships between objects were insufficiently modeled. In this letter, we propose Target-dependent UNITER, which learns the relationship between the target object and other objects directly by focusing on the relevant regions within an image, rather than the whole image. Our method is an extension of the UNITER [1]-based Transformer that can be pretrained on general-purpose datasets. We extend the UNITER approach by introducing a new architecture for handling candidate objects. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.
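The idea of relating a target object to the other objects in the scene can be illustrated with a toy sketch. It is not the authors' architecture. It only shows, under stated assumptions, how text tokens, one target-region embedding, and other region embeddings might be concatenated into a single sequence, and how one scaled dot-product attention step with the target as the query weights the remaining elements. The function names (`build_sequence`, `attend_from_target`), the segment labels, and the 2-dimensional toy vectors are all hypothetical. A real model would use learned projections, multiple heads, and stacked layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_sequence(text_embs, target_emb, context_embs):
    """Concatenate text tokens, the target region, and the other regions.

    Returns the list of vectors and a parallel list of segment labels
    (hypothetical labels, used here only to locate the target).
    """
    vectors = list(text_embs) + [target_emb] + list(context_embs)
    segments = (["text"] * len(text_embs)
                + ["target"]
                + ["context"] * len(context_embs))
    return vectors, segments


def attend_from_target(vectors, segments):
    """One scaled dot-product attention step with the target as the query.

    No learned projections are applied: queries, keys, and values are the
    raw input vectors, which is a simplification for illustration.
    """
    query = vectors[segments.index("target")]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in vectors])
    fused = [sum(w * v[i] for w, v in zip(weights, vectors))
             for i in range(len(query))]
    return weights, fused


# Toy 2-dimensional embeddings (made-up values, not real features).
text = [[1.0, 0.0], [0.0, 1.0]]        # two instruction tokens
target = [1.0, 1.0]                    # candidate target region
context = [[0.5, 0.5], [-1.0, 0.0]]    # other detected regions

vectors, segments = build_sequence(text, target, context)
weights, fused = attend_from_target(vectors, segments)
```

Here `weights` holds one attention weight per sequence element and `fused` is the resulting target-conditioned vector, which a classifier head could consume in a fuller model.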

Original language: English
Article number: 9525205
Pages (from-to): 8401-8408
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 6
Issue number: 4
DOI
Publication status: Published - Oct 2021

ASJC Scopus subject areas

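  • Control and Systems Engineering
  • Biomedical Engineering
  • Human-Computer Interaction
  • Mechanical Engineering
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Control and Optimization
  • Artificial Intelligence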
