To synthesize natural-sounding speech with voice quality variations, we propose a concatenative synthesis method based on stored formant/antiformant templates of vowel-consonant-vowel (VCV) segments and on sophisticated control of voice source parameters. Using the parametric Rosenberg-Klatt (RK) model to generate the voiced source waveform and an autoregressive exogenous (ARX) model to represent the voiced speech production process, we have devised a new adaptive pitch-synchronous analysis method that estimates the model parameters from which the templates are semiautomatically created. A Kalman filter algorithm handles the ARX model identification, and a simulated annealing method performs the nonlinear optimization that estimates the voice source parameters. The method has been tested on synthetic speech sounds and compared with several other approaches in terms of the accuracy of the estimated parameter values. Preliminary synthesis experiments have shown that natural-sounding speech with various voice qualities can be generated with the proposed method by manipulating the voice source parameters.
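As a rough illustration of what a parametric voice source looks like, the sketch below generates one period of a Rosenberg-Klatt-style glottal flow pulse, a cubic polynomial `a*t^2 - b*t^3` during the open phase and zero during the closed phase. The function name, the parameter names, and the choice to scale the pulse so that its peak equals `amplitude` are assumptions made for this example; the abstract does not give the exact parameterization used in the paper.

```python
def rk_glottal_pulse(f0_hz, open_quotient, amplitude, sample_rate_hz):
    """Return one pitch period of a Rosenberg-Klatt-style glottal flow pulse.

    Hypothetical helper for illustration only. The flow is
    a*t^2 - b*t^3 while the glottis is open and 0 while it is closed.
    """
    period = 1.0 / f0_hz                 # pitch period in seconds
    t_open = open_quotient * period      # duration of the open phase

    # Assumed scaling: b = a / t_open makes the flow return to zero at
    # t = t_open, and this value of a makes the peak (at 2/3 * t_open)
    # equal to `amplitude`.
    a = 27.0 * amplitude / (4.0 * t_open ** 2)
    b = a / t_open

    n_samples = int(round(period * sample_rate_hz))
    flow = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        if t < t_open:
            flow.append(a * t * t - b * t ** 3)
        else:
            flow.append(0.0)             # closed phase: no airflow
    return flow


# Two pulses at the same pitch but with different open quotients,
# a crude stand-in for manipulating voice quality via source parameters.
short_open = rk_glottal_pulse(100.0, 0.4, 1.0, 16000)
long_open = rk_glottal_pulse(100.0, 0.8, 1.0, 16000)
```

Changing only `open_quotient` alters the pulse shape while leaving pitch and peak amplitude fixed, which is the kind of source-parameter manipulation the abstract refers to, though the paper's actual control scheme may differ.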
|Publication status||Published - 1994|
|Event||3rd International Conference on Spoken Language Processing, ICSLP 1994 - Yokohama, Japan|
Duration: 18 Sep 1994 → 22 Sep 1994