Improving Goal-Oriented Visual Dialogue by Asking Fewer Questions

Soma Kanazawa, Shoya Matsumori, Michita Imai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An agent that adaptively asks the user questions to seek information is a crucial element in designing a real-world artificial intelligence agent. In particular, an agent for goal-oriented visual dialogue, a task in which an object of interest is located among a group of visually presented objects by asking verbal questions, must be able to efficiently narrow down and identify objects through question generation. Several models based on GuessWhat?! and CLEVR Ask have been published, most of which leverage reinforcement learning to maximize the success rate of the task. However, existing models adopt a policy of asking questions up to a predefined limit, which results in redundant questions. Moreover, the generated questions often refer to only a limited number of objects, which prevents efficient narrowing down and the identification of a wide range of attributes. This paper proposes the Two-Stream Splitter (TSS) for redundant question reduction and efficient question generation. TSS applies a self-attention structure to the image features and location features of objects, combining the information content of both to narrow down candidate objects efficiently. Experimental results on the CLEVR Ask dataset show that the proposed method reduces redundant questions and enables more efficient interaction than previous models.

Original language: English
Title of host publication: Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
Editors: Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, Achmad Nizar Hidayanto
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 158-169
Number of pages: 12
ISBN (Print): 9783030922696
DOIs
Publication status: Published - 2021
Event: 28th International Conference on Neural Information Processing, ICONIP 2021 - Virtual, Online
Duration: 2021 Dec 8 → 2021 Dec 12

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13109 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Neural Information Processing, ICONIP 2021
City: Virtual, Online
Period: 21/12/8 → 21/12/12

Keywords

  • Attention mechanism
  • CLEVR Ask
  • Goal-oriented visual dialogue
  • GuessWhat?!
  • Visual state estimation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
