TY - GEN
T1 - VISTURE
T2 - 10th Conference on Human-Agent Interaction, HAI 2022
AU - Shimoyama, Kaon
AU - Okuoka, Kohei
AU - Kimoto, Mitsuhiko
AU - Imai, Michita
N1 - Funding Information:
This work was supported in part by JST, CREST Grant Number JPMJCR19A1, Japan and JSPS KAKENHI Grant Number JP20K19897.
Publisher Copyright:
© 2022 ACM.
PY - 2022/12/5
Y1 - 2022/12/5
N2 - This paper proposes VISTURE, a system that generates a robot's gestures and speech from video input. VISTURE assumes a situation in which a robot describes what it saw through its camera to a person who was absent. The contribution of this paper is a case study investigating the expressions that Japanese people use to describe video scenes, whose results informed the design of VISTURE. In particular, the case study revealed a classification of expressions depicting video scenes: Foreground information, which describes the relevant event of the scene, and Background information, which is not the main point of the description but conveys the overall scene. Foreground and Background are referred to in combination. VISTURE employs this classification to generate human-like expressions. Moreover, we designed a method to determine Foreground and Background that can generate multiple combinations of expressions. To evaluate the quality of the generated gestures and speech, we investigated people's impressions of a robot performing them. The results showed that the robot was perceived as more likable and capable when it performed gestures.
AB - This paper proposes VISTURE, a system that generates a robot's gestures and speech from video input. VISTURE assumes a situation in which a robot describes what it saw through its camera to a person who was absent. The contribution of this paper is a case study investigating the expressions that Japanese people use to describe video scenes, whose results informed the design of VISTURE. In particular, the case study revealed a classification of expressions depicting video scenes: Foreground information, which describes the relevant event of the scene, and Background information, which is not the main point of the description but conveys the overall scene. Foreground and Background are referred to in combination. VISTURE employs this classification to generate human-like expressions. Moreover, we designed a method to determine Foreground and Background that can generate multiple combinations of expressions. To evaluate the quality of the generated gestures and speech, we investigated people's impressions of a robot performing them. The results showed that the robot was perceived as more likable and capable when it performed gestures.
KW - Gesture generation
KW - Human-robot interaction
KW - Speech generation
UR - http://www.scopus.com/inward/record.url?scp=85144603452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144603452&partnerID=8YFLogxK
U2 - 10.1145/3527188.3561931
DO - 10.1145/3527188.3561931
M3 - Conference contribution
AN - SCOPUS:85144603452
T3 - HAI 2022 - Proceedings of the 10th Conference on Human-Agent Interaction
SP - 185
EP - 193
BT - HAI 2022 - Proceedings of the 10th Conference on Human-Agent Interaction
PB - Association for Computing Machinery, Inc
Y2 - 5 December 2022 through 8 December 2022
ER -