Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts

Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito

Research output: Contribution to journal › Conference article › peer-review

5 Citations (Scopus)

Abstract

This paper proposes a sentence selection method using a maximum entropy criterion to construct recording scripts for speech synthesis. In the conventional corpus design of speech synthesis, a greedy algorithm that maximizes phonetic coverage is often used. However, for statistical parametric speech synthesis, phonetic and prosodic contextual balance is as important as coverage. To take both phonetic and prosodic contextual balance into account in the sentence selection, we introduce and maximize the entropy of the phonetic and prosodic contexts, such as biphone, triphone, accent, and sentence length. The objective experimental results show that the proposed method achieves better coverage and balance of contexts and reduces spectral and F0 distortions compared to the random and coverage-based sentence selection methods.
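
The idea of selecting sentences by maximizing context entropy can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the entropy is computed over several context types or how the search is carried out, so the code below assumes a single context type (biphones), a plain Shannon entropy over pooled context counts, and a greedy search that adds one sentence at a time. The names `entropy`, `biphones`, and `select_sentences` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of context occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def biphones(phones):
    """Adjacent phone pairs of one sentence (assumed context extractor)."""
    return list(zip(phones, phones[1:]))


def select_sentences(candidates, n_select, extract=biphones):
    """Greedily pick sentence indices, maximizing pooled context entropy.

    candidates: list of sentences, each a list of phone symbols.
    Assumption: at each step the sentence that yields the highest entropy
    of the pooled context counts is added; ties go to the earliest index.
    """
    remaining = {i: Counter(extract(p)) for i, p in enumerate(candidates)}
    pooled = Counter()
    chosen = []
    for _ in range(min(n_select, len(remaining))):
        best = max(remaining, key=lambda i: entropy(pooled + remaining[i]))
        pooled += remaining.pop(best)
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy phone sequences, made up for illustration only.
    toy = [
        ["a", "b", "a", "b", "a"],  # repeats the same two biphones
        ["a", "b", "c", "d"],
        ["c", "d", "e", "f"],
    ]
    print(select_sentences(toy, 2))
```

In this toy run the repetitive first sentence is passed over because it would skew the biphone distribution, which is the balance effect the abstract contrasts with pure coverage maximization. Extending the sketch to triphones, accent, or sentence length would require a choice about how to combine the per-context entropies, which the abstract leaves open.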

Original language: English
Pages (from-to): 3491-3495
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015 Jan 1
Externally published: Yes
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 2015 Sep 6 → 2015 Sep 10

Keywords

  • Corpus design
  • Entropy
  • Sentence selection
  • Speech synthesis

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
