
Speech Synthesis of Valencian Using a Conditional Variational Autoencoder with Adversarial Learning

  • Joan Aragó
  • Marc Freixes*

*Corresponding author for this work

Research output: Book chapter, conference contribution, peer-reviewed

Abstract

The growing demand for high-quality speech synthesis systems in minority languages presents a notable challenge for researchers. In response, this study focuses on synthesizing Valencian speech to develop an effective text-to-speech system for this linguistic variety. A meticulously recorded corpus, comprising 7 hours of speech data, was utilised to train a model based on a conditional variational autoencoder with adversarial learning, specifically Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). Additionally, a pretrained multispeaker model was fine-tuned in two settings: with 30 minutes of the corpus, and with the entire corpus. Perceptual testing was conducted to evaluate the quality of the synthesised speech, revealing promising results. Notably, the proposed model proved competitive with the recently released Valencian model by the Aina project, indicating its efficacy in generating natural and fluent Valencian speech. These findings contribute to advancing the field of Valencian text-to-speech synthesis and carry implications for the development of speech synthesis systems in other minority languages.

Original language: English
Title of host publication: Artificial Intelligence Research and Development - Proceedings of the 26th International Conference of the Catalan Association for Artificial Intelligence
Editors: Teresa Alsinet, Xavier Vilasis-Cardona, Daniel Garcia-Costa, Elena Alvarez-Garcia
Publisher: IOS Press BV
Pages: 149-152
Number of pages: 4
ISBN (Electronic): 9781643685434
Publication status: Published - 25 Sept 2024
Event: 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024 - Barcelona, Spain
Duration: 2 Oct 2024 to 4 Oct 2024

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 390
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024
Country/Territory: Spain
City: Barcelona
Period: 2/10/24 to 4/10/24

Keywords

  • AI applications
  • Human-Machine Communication
  • speech synthesis
