Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference

Riko Suzuki, Hitomi Yanaka, Koji Mineshima, Daisuke Bekki

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, focusing on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form (subject, predicate, object) that can be translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference between videos and semantically complex sentences, including those involving negation and quantification.
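As a rough illustration of what the triplet-to-logic translation described in the abstract might look like (the triplet, the predicate names, and the neo-Davidsonian event-semantics target below are hypothetical examples, not taken from the dataset or the paper), an action triplet could be mapped to a first-order formula along these lines:

    (person, open, door)
    $\exists e\, \exists x\, \exists y\, (\mathit{open}(e) \land \mathit{person}(x) \land \mathit{door}(y) \land \mathit{Subj}(e, x) \land \mathit{Obj}(e, y))$

Here the existentially quantified event variable $e$ would allow negation and quantification over actions to be expressed compositionally; the actual representation used by the authors may differ.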

Original language: English
Title of host publication: MMSR 2021 - Multimodal Semantic Representations, Proceedings of the 1st Workshop
Editors: Lucia Donatelli, Nikhil Krishnaswamy, Kenneth Lai, James Pustejovsky
Publisher: Association for Computational Linguistics (ACL)
Pages: 102-107
Number of pages: 6
ISBN (Electronic): 9781954085213
Publication status: Published - 2021
Event: 1st Workshop on Multimodal Semantic Representations, MMSR 2021 - Virtual, Groningen, Netherlands
Duration: 2021 Jun 16 → …

Publication series

Name: MMSR 2021 - Multimodal Semantic Representations, Proceedings of the 1st Workshop

Conference

Conference: 1st Workshop on Multimodal Semantic Representations, MMSR 2021
Country/Territory: Netherlands
City: Virtual, Groningen
Period: 21/6/16 → …

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
