TY - GEN
T1 - Speech Synthesis of Valencian Using a Conditional Variational Autoencoder with Adversarial Learning
AU - Aragó, Joan
AU - Freixes, Marc
N1 - Publisher Copyright: © 2024 The Authors.
PY - 2024/9/25
Y1 - 2024/9/25
N2 - The growing demand for high-quality speech synthesis systems in minority languages presents a notable challenge for researchers. In response, this study focuses on synthesising Valencian speech to develop an effective text-to-speech system for this linguistic variety. A meticulously recorded corpus comprising 7 hours of speech data was used to train a model based on a conditional variational autoencoder with adversarial learning, specifically Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS). Additionally, a pretrained multispeaker model was fine-tuned using 30 minutes of speech and using the entire corpus. Perceptual testing was conducted to evaluate the quality of the synthesised speech, with promising results. Notably, the proposed model proved competitive with the recently released Valencian model by the Aina project, indicating its efficacy in generating natural and fluent Valencian speech. These findings contribute to advancing Valencian text-to-speech synthesis and carry implications for the development of speech synthesis systems in other minority languages.
KW - AI applications
KW - Human-Machine Communication
KW - speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=85217057584&partnerID=8YFLogxK
U2 - 10.3233/FAIA240427
DO - 10.3233/FAIA240427
M3 - Conference contribution
AN - SCOPUS:85217057584
T3 - Frontiers in Artificial Intelligence and Applications
SP - 149
EP - 152
BT - Artificial Intelligence Research and Development - Proceedings of the 26th International Conference of the Catalan Association for Artificial Intelligence
A2 - Alsinet, Teresa
A2 - Vilasis-Cardona, Xavier
A2 - Garcia-Costa, Daniel
A2 - Alvarez-Garcia, Elena
PB - IOS Press BV
T2 - 26th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2024
Y2 - 2 October 2024 through 4 October 2024
ER -