
Speech Synthesis of Valencian Using a Conditional Variational Autoencoder with Adversarial Learning

  • Joan Aragó
  • Marc Freixes*
  *Corresponding author for this work

Research output: Book chapter › Conference contribution › Peer-reviewed

Abstract

The growing demand for high-quality speech synthesis systems in minority languages presents a notable challenge for researchers. In response, this study focuses on synthesising Valencian speech in order to develop an effective text-to-speech system for this linguistic variety. A carefully recorded corpus comprising 7 hours of speech data was used to train a model based on a conditional variational autoencoder with adversarial learning, namely Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). In addition, a pretrained multispeaker model was fine-tuned using 30 minutes of speech and, separately, the entire corpus. Perceptual tests were conducted to evaluate the quality of the synthesised speech, and the results were promising. Notably, the proposed model proved competitive with the Valencian model recently released by the Aina project, indicating its efficacy in generating natural and fluent Valencian speech. These findings contribute to advancing Valencian text-to-speech synthesis and have implications for the development of speech synthesis systems in other minority languages.

Original language: English
Host publication title: Artificial Intelligence Research and Development - Proceedings of the 26th International Conference of the Catalan Association for Artificial Intelligence
Editors: Teresa Alsinet, Xavier Vilasis-Cardona, Daniel Garcia-Costa, Elena Alvarez-Garcia
Publisher: IOS Press BV
Pages: 149-152
Number of pages: 4
ISBN (electronic): 9781643685434
DOI
State: Published - 25 Sept 2024
Event: 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024 - Barcelona, Spain
Duration: 2 Oct 2024 to 4 Oct 2024

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 390
ISSN (print): 0922-6389
ISSN (electronic): 1879-8314

Conference

Conference: 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024
Country/Territory: Spain
City: Barcelona
Period: 2/10/24 to 4/10/24
