Parametric model for vocal effort interpolation with Harmonics Plus Noise Models

Àngel Calzada Defez, Joan Claudi Socoró Carrié, Robert A.J. Clark

Research output: Contribution to conference > Contribution > peer-review

2 Citations (Scopus)

Abstract

Voice quality is known to play an important role in expressive speech. In this paper, we present a methodology for modifying vocal effort level, which text-to-speech (TTS) systems can apply to gain the flexibility needed to improve the naturalness of synthesized speech. This extends previous work using low-order Linear Prediction Coefficients (LPC), where flexibility was constrained by the number of vocal effort levels available in the corpora. The proposed methodology overcomes this limitation by replacing the low-order LPC with ninth-order polynomials, which allows vocal effort to be modified towards the available templates and also allows the generation of intermediate vocal effort levels between those available in the training data. This flexibility comes from combining Harmonics plus Noise Models with a parametric model of the spectral envelope. The perceptual tests conducted demonstrate the effectiveness of the proposed technique in performing vocal effort interpolations while maintaining signal quality in the final synthesis. The proposed technique can be used in unit-selection TTS systems to reduce corpus size while increasing flexibility, and it could potentially be employed by HMM-based speech synthesis systems if appropriate acoustic features are used.
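The idea of generating intermediate vocal effort levels from a parametric envelope can be illustrated with a minimal sketch. The abstract does not say how the ninth-order polynomials are parameterized or how interpolation is carried out, so everything below is an assumption: the envelope is taken to be a log-amplitude polynomial over normalized frequency, interpolation is taken to be a linear blend in coefficient space, and the function names and coefficient values are hypothetical.

```python
from typing import List, Sequence

ORDER = 9  # ninth-order polynomial, the only detail taken from the abstract


def eval_envelope(coeffs: Sequence[float], freq_norm: float) -> float:
    """Evaluate a polynomial envelope at a normalized frequency in [0, 1].

    Assumption: coeffs are ordered from the constant term upward and the
    result is a log-amplitude. Horner's rule is used for the evaluation.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * freq_norm + c
    return result


def interpolate_effort(
    low: Sequence[float], high: Sequence[float], alpha: float
) -> List[float]:
    """Linearly blend two coefficient sets (assumed interpolation scheme).

    alpha = 0 returns the `low` template, alpha = 1 returns the `high` one.
    """
    if len(low) != ORDER + 1 or len(high) != ORDER + 1:
        raise ValueError("expected ORDER + 1 coefficients per envelope")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(low, high)]


# Hypothetical templates for a soft and a loud vocal effort level.
# The values are made up for illustration, not taken from the paper.
soft = [-20.0, -5.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
loud = [-10.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# An intermediate level halfway between the two templates.
medium = interpolate_effort(soft, loud, 0.5)
```

Because a polynomial is linear in its coefficients, blending coefficients in this way is equivalent to blending the two envelopes pointwise. Whether the paper interpolates in this domain is not stated in the abstract.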

Original language: English
Pages: 25-30
Number of pages: 6
Status: Published - 2013
Event: 8th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2013 - Barcelona, Spain
Duration: 31 Aug 2013 to 2 Sept 2013

Conference

Conference: 8th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2013
Country/Territory: Spain
City: Barcelona
Period: 31/08/13 to 2/09/13
