Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference

Riko Suzuki, Hitomi Yanaka, Koji Mineshima, Daisuke Bekki

Research output: Conference contribution

Abstract

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action labels, and 1,942 action triplets of the form (subject, predicate, object) that can be translated into logical semantic representations. The dataset is expected to be useful for evaluating systems that perform multimodal inference between videos and semantically complex sentences, including those involving negation and quantification.
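The abstract notes that each action triplet (subject, predicate, object) can be translated into a logical semantic representation. The minimal Python sketch below illustrates one plausible such translation, rendering a triplet as a Neo-Davidsonian event-semantics formula with Agent and Theme roles; the function name, role labels, and output syntax are illustrative assumptions, not the paper's exact translation scheme.

    # A minimal sketch of translating an action triplet into a logical form.
    # The triplet schema (subject, predicate, object) is from the abstract;
    # the event-semantics rendering and role names below are illustrative
    # assumptions, not necessarily the representation used in the paper.

    def triplet_to_logic(subject: str, predicate: str, obj: str) -> str:
        """Render a (subject, predicate, object) triplet as an
        existentially quantified event-semantics formula."""
        return (f"exists e. ({predicate}(e) "
                f"& Agent(e, {subject}) & Theme(e, {obj}))")

    # Example: the triplet (person, cut, apple) becomes
    # "exists e. (cut(e) & Agent(e, person) & Theme(e, apple))"
    print(triplet_to_logic("person", "cut", "apple"))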

Original language: English
Title of host publication: MMSR 2021 - Multimodal Semantic Representations, Proceedings of the 1st Workshop
Editors: Lucia Donatelli, Nikhil Krishnaswamy, Kenneth Lai, James Pustejovsky
Publisher: Association for Computational Linguistics (ACL)
Pages: 102-107
Number of pages: 6
ISBN (Electronic): 9781954085213
Publication status: Published - 2021
Event: 1st Workshop on Multimodal Semantic Representations, MMSR 2021 - Virtual, Groningen, Netherlands
Duration: 16 Jun 2021 → …

Publication series

Name: MMSR 2021 - Multimodal Semantic Representations, Proceedings of the 1st Workshop

Conference

Conference: 1st Workshop on Multimodal Semantic Representations, MMSR 2021
Country/Territory: Netherlands
City: Virtual, Groningen
Period: 21/6/16 → …

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
