Speech Synthesis of Valencian Using a Conditional Variational Autoencoder with Adversarial Learning

Joan Aragó, Marc Freixes*

*Corresponding author for this work

Research output: Book chapter; Conference contribution; Peer-reviewed

Abstract

The growing demand for high-quality speech synthesis systems in minority languages presents a notable challenge for researchers. In response, this study focuses on synthesizing Valencian speech to develop an effective text-to-speech system for this linguistic variety. A carefully recorded corpus comprising 7 hours of speech data was used to train a model based on a conditional variational autoencoder with adversarial learning, specifically Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). Additionally, a pretrained multispeaker model was fine-tuned on two amounts of data: 30 minutes of speech and the entire corpus. Perceptual testing was conducted to evaluate the quality of the synthesised speech, with promising results. Notably, the proposed model was competitive with the recently released Valencian model from the Aina project, indicating its efficacy in generating natural and fluent Valencian speech. These findings contribute to advancing Valencian text-to-speech synthesis and carry implications for the development of speech synthesis systems in other minority languages.

Original language: English
Title of host publication: Artificial Intelligence Research and Development - Proceedings of the 26th International Conference of the Catalan Association for Artificial Intelligence
Editors: Teresa Alsinet, Xavier Vilasis-Cardona, Daniel Garcia-Costa, Elena Alvarez-Garcia
Publisher: IOS Press BV
Pages: 149-152
Number of pages: 4
ISBN (electronic): 9781643685434
Publication status: Published - 25 Sept 2024
Externally published
Event: 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024 - Barcelona, Spain
Duration: 2 Oct 2024 to 4 Oct 2024

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 390
ISSN (print): 0922-6389
ISSN (electronic): 1879-8314

Conference

Conference: 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024
Country/Territory: Spain
City: Barcelona
Period: 2/10/24 to 4/10/24
