TY - CONF
T1 - Parametric model for vocal effort interpolation with Harmonics Plus Noise Models
AU - Calzada Defez, Àngel
AU - Socoró Carrié, Joan Claudi
AU - Clark, Robert A.J.
N1 - Funding Information:
We would like to thank Dr. Mark Schröder and DFKI for allowing us to use the NECA corpus, which was specially designed for vocal effort research, and the Comissionat per a Universitats i Recerca (CUR) of the DIUE of the Generalitat de Catalunya and the European Social Funds (2011 BE-DGR 01084) for funding a visit to CSTR.
Publisher Copyright:
© SSW 2013. All rights reserved.
PY - 2013
Y1 - 2013
AB - It is known that voice quality plays an important role in expressive speech. In this paper, we present a methodology for modifying vocal effort level, which can be applied by text-to-speech (TTS) systems to provide the flexibility needed to improve the naturalness of synthesized speech. This extends previous work using low-order Linear Prediction Coefficients (LPC), where the flexibility was constrained by the number of vocal effort levels available in the corpora. The proposed methodology overcomes these limitations by replacing the low-order LPC with ninth-order polynomials, which allows vocal effort not only to be modified towards the available templates, but also to be generated at intermediate levels between those available in the training data. This flexibility comes from combining Harmonics plus Noise Models with a parametric model of the spectral envelope. The perceptual tests conducted demonstrate the effectiveness of the proposed technique in performing vocal effort interpolation while maintaining signal quality in the final synthesis. The proposed technique can be used in unit-selection TTS systems to reduce corpus size while increasing flexibility, and it could potentially be employed by HMM-based speech synthesis systems if appropriate acoustic features are used.
KW - Vocal effort interpolation
KW - expressive speech synthesis
KW - harmonics plus noise model
UR - http://www.scopus.com/inward/record.url?scp=85039165020&partnerID=8YFLogxK
M3 - Contribution
AN - SCOPUS:85039165020
SP - 25
EP - 30
T2 - 8th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2013
Y2 - 31 August 2013 through 2 September 2013
ER -