VAE-Based Adversarial Multimodal Domain Transfer for Video-Level Sentiment Analysis

Yanan Wang, Jianming Wu, Kazuaki Furumai, Shinya Wada, Satoshi Kurihara

Research output: Article, peer-reviewed

Abstract

Video-level sentiment analysis is a challenging task that requires systems to obtain discriminative multimodal representations capable of capturing differences in sentiment across modalities. However, because the modalities follow diverse distributions and the unified multimodal labels are not always suitable for unimodal learning, the distance between unimodal representations grows, preventing systems from learning discriminative multimodal representations. In this paper, to obtain more discriminative multimodal representations that further improve system performance, we propose a VAE-based adversarial multimodal domain transfer (VAE-AMDT) method and jointly train it with a multi-attention module to reduce the distance between unimodal representations. We first apply a variational autoencoder (VAE) to make the visual, linguistic, and acoustic representations follow a common distribution, and then introduce adversarial training to transfer all unimodal representations into a joint embedding space. We then fuse the modalities in this joint embedding space via the multi-attention module, which consists of self-attention, cross-attention, and triple-attention for highlighting important sentiment representations over time and across modalities. Our method improves the F1-score of the state of the art by 3.6% on the MOSI dataset and 2.9% on the MOSEI dataset, demonstrating its efficacy in obtaining discriminative multimodal representations for video-level sentiment analysis.
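The two alignment steps described above can be illustrated with a minimal PyTorch sketch: a per-modality VAE pulls each representation toward a shared Gaussian prior, and a modality discriminator is trained adversarially so the latents become modality-invariant. All layer sizes, the feature dimensions (the 35-d visual, 300-d linguistic, and 74-d acoustic inputs are stand-ins), and the confusion-based adversarial objective are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of VAE-based adversarial multimodal domain transfer.
# Assumed: layer sizes, feature dims, and loss formulation are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 64  # assumed shared latent size

class ModalityVAE(nn.Module):
    """Encodes one modality into a latent pulled toward N(0, I)."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.enc = nn.Linear(in_dim, 128)
        self.mu = nn.Linear(128, LATENT)
        self.logvar = nn.Linear(128, LATENT)

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl

class ModalityDiscriminator(nn.Module):
    """Predicts which of the three modalities a latent came from."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, z):
        return self.net(z)

# One illustrative training step on random stand-in features.
B = 8
feats = {"visual": torch.randn(B, 35),    # assumed feature sizes
         "text": torch.randn(B, 300),
         "audio": torch.randn(B, 74)}
vaes = {m: ModalityVAE(x.shape[1]) for m, x in feats.items()}
disc = ModalityDiscriminator()
opt_enc = torch.optim.Adam([p for v in vaes.values() for p in v.parameters()], lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

# 1) Discriminator step: learn to tell the modalities apart.
zs, kls, labels = [], [], []
for i, (m, x) in enumerate(feats.items()):
    z, kl = vaes[m](x)
    zs.append(z); kls.append(kl)
    labels.append(torch.full((B,), i, dtype=torch.long))
z_all, y_all = torch.cat(zs), torch.cat(labels)
d_loss = F.cross_entropy(disc(z_all.detach()), y_all)
opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

# 2) Encoder step: the KL term keeps latents near the shared prior, and the
#    adversarial term pushes the discriminator toward chance, so the three
#    modalities land in one joint embedding space.
uniform = torch.full((3 * B, 3), 1.0 / 3)
adv_loss = F.kl_div(F.log_softmax(disc(z_all), dim=-1), uniform, reduction="batchmean")
enc_loss = torch.stack(kls).mean() + adv_loss
opt_enc.zero_grad(); enc_loss.backward(); opt_enc.step()
```

In the full model, the aligned latents would then pass through the multi-attention module (self-, cross-, and triple-attention, which could be built from nn.MultiheadAttention) before sentiment prediction; that fusion stage is omitted from this sketch for brevity.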

Original language: English
Pages (from-to): 51315-51324
Number of pages: 10
Journal: IEEE Access
Volume: 10
DOI
Publication status: Published - 2022

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
  • Electrical and Electronic Engineering
