Augmented classification of Japanese visemes and hierarchical weighted discrimination for visual speech recognition

Shinsuke Okita, Yasue Mitsukura, Nozomu Hamada

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Visemes have been studied for applications such as automatic speech recognition, speech animation synthesis, and speaker verification. A viseme is a visually identifiable unit of utterance, the visual-domain equivalent of the phoneme in the audio domain. How to classify and discriminate visemes remains an important topic. This paper focuses on the number of classification units and the discrimination procedure for Japanese visemes: we extend the number of visemes from 6 to 9 to expand how words can be represented as viseme sequences, and we propose hierarchical weighted discrimination using multiple discriminant analysis (MDA) to improve discriminative ability. Viseme discrimination and word recognition experiments were conducted to verify and discuss the effectiveness of these proposals, and the results confirmed the validity of the proposed methods.

Original language: English
Title of host publication: Proceedings - 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013
Publisher: IEEE Computer Society
Pages: 62-67
Number of pages: 6
ISBN (Print): 9781479922093
DOI: https://doi.org/10.1109/SPC.2013.6735104
Publication status: Published - 2013
Event: 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013 - Kuala Lumpur, Malaysia
Duration: 2013 Dec 13 to 2013 Dec 15

Other

Other: 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013
Country: Malaysia
City: Kuala Lumpur
Period: 13/12/13 to 13/12/15

Keywords

  • image processing
  • pattern recognition
  • visemes
  • visual speech recognition

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Okita, S., Mitsukura, Y., & Hamada, N. (2013). Augmented classification of Japanese visemes and hierarchical weighted discrimination for visual speech recognition. In Proceedings - 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013 (pp. 62-67). [6735104] IEEE Computer Society. https://doi.org/10.1109/SPC.2013.6735104

@inproceedings{8344e3afaef14c4eaac750faa8881dec,
title = "Augmented classification of Japanese visemes and hierarchical weighted discrimination for visual speech recognition",
keywords = "image processing, pattern recognition, visemes, visual speech recognition",
author = "Shinsuke Okita and Yasue Mitsukura and Nozomu Hamada",
year = "2013",
doi = "10.1109/SPC.2013.6735104",
language = "English",
isbn = "9781479922093",
pages = "62--67",
booktitle = "Proceedings - 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Augmented classification of Japanese visemes and hierarchical weighted discrimination for visual speech recognition

AU - Okita, Shinsuke

AU - Mitsukura, Yasue

AU - Hamada, Nozomu

PY - 2013

Y1 - 2013

KW - image processing

KW - pattern recognition

KW - visemes

KW - visual speech recognition

UR - http://www.scopus.com/inward/record.url?scp=84897786557&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897786557&partnerID=8YFLogxK

U2 - 10.1109/SPC.2013.6735104

DO - 10.1109/SPC.2013.6735104

M3 - Conference contribution

AN - SCOPUS:84897786557

SN - 9781479922093

SP - 62

EP - 67

BT - Proceedings - 2013 IEEE Conference on Systems, Process and Control, ICSPC 2013

PB - IEEE Computer Society

ER -