Robot-directed speech detection using multimodal semantic confidence based on speech, image, and motion

Xiang Zuo, Naoto Iwahashi, Ryo Taguchi, Shigeki Matsuda, Komei Sugiura, Kotaro Funakoshi, Mikio Nakano, Natsuki Oka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

In this paper, we propose a novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure. The MSC measure is used to decide whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression. Experimental results show that, compared with a baseline method that uses speech confidence only, MSC achieved an absolute increase of 5% for clean speech and 12% for noisy speech in terms of average maximum F-measure.
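The fusion step described in the abstract — combining per-modality confidence scores through weights fitted by logistic regression — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weight and bias values below are hypothetical placeholders (in the paper they are learned from labeled robot-directed/non-directed utterances), and the function names `msc` and `is_robot_directed` are invented for this sketch.

```python
import math

# Hypothetical fusion weights and intercept; the paper optimizes these
# by logistic regression on labeled training utterances.
WEIGHTS = {"speech": 2.0, "image": 1.5, "motion": 1.0}
BIAS = -2.5

def msc(speech_conf, image_conf, motion_conf,
        weights=WEIGHTS, bias=BIAS):
    """Multimodal Semantic Confidence: a logistic (sigmoid) combination
    of per-modality confidence scores, each assumed to lie in [0, 1]."""
    z = (bias
         + weights["speech"] * speech_conf
         + weights["image"] * image_conf
         + weights["motion"] * motion_conf)
    return 1.0 / (1.0 + math.exp(-z))

def is_robot_directed(speech_conf, image_conf, motion_conf, threshold=0.5):
    """Flag an utterance as robot-directed when its MSC clears a threshold."""
    return msc(speech_conf, image_conf, motion_conf) >= threshold
```

In practice the decision threshold would be swept to maximize F-measure, which is how the abstract reports its comparison against the speech-confidence-only baseline.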

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Pages: 2458-2461
Number of pages: 4
DOIs: 10.1109/ICASSP.2010.5494889
Publication status: Published - 2010 Nov 8
Externally published: Yes
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 2010 Mar 14 - 2010 Mar 19

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Keywords

  • Human-robot interaction
  • Multimodal semantic confidence
  • Robot-directed speech detection

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Zuo, X., Iwahashi, N., Taguchi, R., Matsuda, S., Sugiura, K., Funakoshi, K., Nakano, M., & Oka, N. (2010). Robot-directed speech detection using multimodal semantic confidence based on speech, image, and motion. In 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings (pp. 2458-2461). [5494889] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2010.5494889