TY - GEN
T1 - Detecting robot-directed speech by situated understanding in object manipulation tasks
AU - Zuo, Xiang
AU - Iwahashi, Naoto
AU - Taguchi, Ryo
AU - Funakoshi, Kotaro
AU - Nakano, Mikio
AU - Matsuda, Shigeki
AU - Sugiura, Komei
AU - Oka, Natsuki
PY - 2010/12/13
Y1 - 2010/12/13
N2 - In this paper, we propose a novel method for a robot to detect robot-directed speech, that is, to distinguish speech that users address to a robot from speech that users address to other people or to themselves. The originality of this work is the introduction of a multimodal semantic confidence (MSC) measure, which is used for domain classification of input speech based on whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, object, and motion confidence with weightings optimized by logistic regression. We then integrate this measure with gaze tracking and conduct experiments under conditions of natural human-robot interaction. Experimental results show that the proposed method achieves high performance, with average recall and precision rates of 94% and 96%, respectively, for robot-directed speech detection.
AB - In this paper, we propose a novel method for a robot to detect robot-directed speech, that is, to distinguish speech that users address to a robot from speech that users address to other people or to themselves. The originality of this work is the introduction of a multimodal semantic confidence (MSC) measure, which is used for domain classification of input speech based on whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, object, and motion confidence with weightings optimized by logistic regression. We then integrate this measure with gaze tracking and conduct experiments under conditions of natural human-robot interaction. Experimental results show that the proposed method achieves high performance, with average recall and precision rates of 94% and 96%, respectively, for robot-directed speech detection.
UR - http://www.scopus.com/inward/record.url?scp=78649875937&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649875937&partnerID=8YFLogxK
U2 - 10.1109/ROMAN.2010.5598729
DO - 10.1109/ROMAN.2010.5598729
M3 - Conference contribution
AN - SCOPUS:78649875937
SN - 9781424479917
T3 - Proceedings - IEEE International Workshop on Robot and Human Interactive Communication
SP - 608
EP - 613
BT - 19th International Symposium in Robot and Human Interactive Communication, RO-MAN 2010
T2 - 19th IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2010
Y2 - 12 September 2010 through 15 September 2010
ER -