Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue

Shoya Matsumori, Kosuke Shingyouchi, Yuki Abe, Yosuke Fukuchi, Komei Sugiura, Michita Imai

Research output: Conference contribution

Abstract

Building an interactive artificial intelligence that can ask questions about the real world is one of the biggest challenges for vision and language problems. In particular, goal-oriented visual dialogue, where the aim of the agent is to seek information by asking questions during a turn-taking dialogue, has been gaining scholarly attention recently. While several existing models based on the GuessWhat?! dataset [10] have been proposed, the Questioner typically asks simple category-based questions or absolute spatial questions. This might be problematic for complex scenes where the objects share attributes, or in cases where descriptive questions are required to distinguish objects. In this paper, we propose a novel Questioner architecture, called Unified Questioner Transformer (UniQer), for descriptive question generation with referring expressions. In addition, we build a goal-oriented visual dialogue task called CLEVR Ask. It synthesizes complex scenes that require the Questioner to generate descriptive questions. We train our model with two variants of CLEVR Ask datasets. The results of the quantitative and qualitative evaluations show that UniQer outperforms the baseline.

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1878-1887
Number of pages: 10
ISBN (Electronic): 9781665428125
DOI
Publication status: Published - 2021
Event: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 11 Oct 2021 to 17 Oct 2021

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print): 1550-5499

Conference

Conference: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/Territory: Canada
City: Virtual, Online
Period: 11 Oct 2021 to 17 Oct 2021

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
