Mixing HMM-based Spanish speech synthesis with a CBR for prosody estimation

Xavi Gonzalvo, Ignasi Iriondo, Joan Claudi Socoró, Francesc Alias, Carlos Monzo

Research output: Book chapter › Conference contribution › peer-review

Abstract

Hidden Markov Model-based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled together. This work describes a Spanish HMM-TTS system that uses an external machine learning technique, case-based reasoning (CBR), to help improve expressiveness. System performance is analysed objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus whose units were clustered using contextual factors based on the Spanish language. The results show that CBR-based F0 estimation improves on the HMM-based baseline when synthesizing short non-declarative sentences, while duration accuracy is similar for the CBR and the HMM system.

Original language: English
Title of host publication: Advances in Nonlinear Speech Processing - International Conference on Nonlinear Speech Processing, NOLISP 2007, Revised Selected Papers
Publisher: Springer Verlag
Pages: 78-85
Number of pages: 8
ISBN (Print): 3540773460, 9783540773467
Publication status: Published - 2007
Event: International Conference on Nonlinear Speech Processing, NOLISP 2007 - Paris, France
Duration: 22 May 2007 to 25 May 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4885 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Nonlinear Speech Processing, NOLISP 2007
Country/Territory: France
City: Paris
Period: 22/05/07 to 25/05/07
