Simultaneous estimation of vocal tract and voice source parameters based on an ARX model

Wen Ding, Hideki Kasuya, Shuichi Adachi

Research output: Contribution to journal › Article

29 Citations (Scopus)

Abstract

A novel adaptive pitch-synchronous analysis method is proposed for simultaneously estimating vocal tract (formant/antiformant) and voice source parameters from speech waveforms. The parametric Rosenberg-Klatt (RK) model generates the glottal waveform, and an autoregressive-exogenous (ARX) model represents the voiced speech production process. The Kalman filter algorithm estimates the formant/antiformant parameters from the coefficients of the ARX model, and simulated annealing serves as the nonlinear optimization method for estimating the voice source parameters. The two approaches work together in a system identification procedure to find the best parameter set for both models. In comparisons on synthetic speech, the new method estimated parameter values more accurately than several other approaches. The method is also shown to estimate the parameters accurately from natural speech sounds. A major application of the analysis method is a concatenative formant synthesizer that allows flexible control of the voice quality of synthetic speech.
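The filter half of the procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the RK formula nor the state-space formulation, so the pulse normalization, the single-resonance ARX model with one input coefficient, the constant-parameter Kalman filter (equivalent to recursive least squares) and all function names below are assumptions made for illustration. The voice source is held fixed here; the simulated annealing search over source parameters is not shown.

```python
import math

def rk_pulse(n_period, open_quotient=0.6, amplitude=1.0):
    """One pitch period of a polynomial (a*t^2 - b*t^3) glottal flow pulse.

    Assumed normalization: the flow returns to zero at the end of the open
    phase and its peak equals `amplitude`.
    """
    n_open = int(round(open_quotient * n_period))
    a = 27.0 * amplitude / (4.0 * n_open ** 2)
    b = 27.0 * amplitude / (4.0 * n_open ** 3)
    return [a * t * t - b * t ** 3 if t < n_open else 0.0
            for t in range(n_period)]

def arx_synthesize(u, a, b0=1.0):
    """Noise-free ARX output: y[n] = -sum_i a[i-1]*y[n-i] + b0*u[n]."""
    y = []
    for n, u_n in enumerate(u):
        acc = b0 * u_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc -= a_i * y[n - i]
        y.append(acc)
    return y

def kalman_arx(y, u, order=2, r=1.0, p0=1e6):
    """Kalman filter for a constant parameter vector [a_1..a_order, b0].

    State model: theta[n] = theta[n-1] (no process noise, an assumption).
    Observation: y[n] = phi[n]^T theta + e[n], with noise variance r.
    """
    n_par = order + 1
    theta = [0.0] * n_par
    P = [[p0 if i == j else 0.0 for j in range(n_par)] for i in range(n_par)]
    for n in range(order, len(y)):
        phi = [-y[n - i] for i in range(1, order + 1)] + [u[n]]
        P_phi = [sum(P[i][j] * phi[j] for j in range(n_par))
                 for i in range(n_par)]
        denom = r + sum(phi[i] * P_phi[i] for i in range(n_par))
        gain = [v / denom for v in P_phi]
        err = y[n] - sum(phi[i] * theta[i] for i in range(n_par))
        theta = [theta[i] + gain[i] * err for i in range(n_par)]
        # P is symmetric, so phi^T P equals (P phi)^T.
        P = [[P[i][j] - gain[i] * P_phi[j] for j in range(n_par)]
             for i in range(n_par)]
    return theta

def formant_from_ar2(a1, a2, fs):
    """Formant frequency and bandwidth (Hz) of a complex-conjugate pole pair."""
    radius = math.sqrt(a2)
    freq = math.acos(-a1 / (2.0 * radius)) * fs / (2.0 * math.pi)
    bandwidth = -math.log(radius) * fs / math.pi
    return freq, bandwidth

if __name__ == "__main__":
    fs = 8000                      # hypothetical sampling rate in Hz
    radius, f_true = 0.95, 500.0   # hypothetical single formant
    a_true = [-2.0 * radius * math.cos(2.0 * math.pi * f_true / fs),
              radius ** 2]
    u = rk_pulse(80) * 10          # ten pitch periods at 100 Hz
    y = arx_synthesize(u, a_true)
    a1, a2, b0 = kalman_arx(y, u)
    print("estimated coefficients:", a1, a2, b0)
    print("estimated formant (Hz, bandwidth Hz):", formant_from_ar2(a1, a2, fs))
```

In a joint scheme of the kind the abstract describes, an outer optimizer would vary source parameters such as the open quotient and keep the setting that gives the smallest prediction error for the filter fitted in the inner step.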

Original language: English
Pages (from-to): 738-743
Number of pages: 6
Journal: IEICE Transactions on Information and Systems
Volume: E78-D
Issue number: 6
Publication status: Published - Jun 1995
Externally published: Yes

Fingerprint

Simulated annealing
Kalman filters
Identification (control systems)
Acoustic waves

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

Cite this

Simultaneous estimation of vocal tract and voice source parameters based on an ARX model. / Ding, Wen; Kasuya, Hideki; Adachi, Shuichi.

In: IEICE Transactions on Information and Systems, Vol. E78-D, No. 6, 06.1995, p. 738-743.

@article{a09bb92d70e64f829c461f0e3a4d01c8,
title = "Simultaneous estimation of vocal tract and voice source parameters based on an ARX model",
author = "Wen Ding and Hideki Kasuya and Shuichi Adachi",
year = "1995",
month = "6",
language = "English",
volume = "E78-D",
pages = "738--743",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "6",

}

TY - JOUR

T1 - Simultaneous estimation of vocal tract and voice source parameters based on an ARX model

AU - Ding, Wen

AU - Kasuya, Hideki

AU - Adachi, Shuichi

PY - 1995/6

Y1 - 1995/6

UR - http://www.scopus.com/inward/record.url?scp=0029323429&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029323429&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0029323429

VL - E78-D

SP - 738

EP - 743

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 6

ER -