Evolutionary weight tuning based on diphone pairs for unit selection speech synthesis

Francesc Alías; Xavier Llorà

Facultat Internacional de Comerç i Economia Digital La Salle

Research output: Contribution to conference › Contribution › Peer-reviewed

12 Citations (Scopus)

Abstract

Unit selection text-to-speech (TTS) conversion is an ongoing research topic in the speech synthesis community. This paper focuses on tuning the weights involved in the target and concatenation cost metrics. We propose a method for adjusting these weights automatically and simultaneously by means of diphone and triphone pairs. The method is based on techniques from the evolutionary computation community and takes advantage of their robustness in noisy domains. The experiments and their analysis show that it performs well on this problem, overcoming some constraints assumed by previous work and providing a new framework for further investigation.
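As an illustration of the general idea only, the following minimal sketch tunes the weights of a weighted cost function with a small genetic algorithm. It is not the authors' algorithm: the toy data, the squared-error fitness, the operators (tournament selection, blend crossover, Gaussian mutation, elitism) and all function names are assumptions made for this example. Each hypothetical unit pair carries a vector of sub-costs and a noisy reference distance that the weighted cost should approximate.

```python
import random


def weighted_cost(weights, sub_costs):
    """Weighted sum of sub-costs for one unit pair."""
    return sum(w * c for w, c in zip(weights, sub_costs))


def normalize(weights):
    """Clip weights to be non-negative and rescale them to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in clipped]


def fitness(weights, pairs):
    """Negative mean squared error against the reference distances (assumed objective)."""
    err = sum((weighted_cost(weights, sc) - d) ** 2 for sc, d in pairs)
    return -err / len(pairs)


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def evolve_weights(pairs, n_weights, rng, pop_size=40, generations=60, sigma=0.05):
    """Evolve all weights simultaneously; the best individual always survives."""
    population = [normalize([rng.random() for _ in range(n_weights)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind, pairs) for ind in population]
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            mix = rng.random()
            # Blend crossover followed by Gaussian mutation on every weight.
            child = [mix * x + (1 - mix) * y + rng.gauss(0.0, sigma)
                     for x, y in zip(a, b)]
            children.append(normalize(child))
        population = children
    return max(population, key=lambda ind: fitness(ind, pairs))


def make_toy_pairs(true_weights, n_pairs, rng, noise=0.01):
    """Synthetic pairs: random sub-costs plus a noisy reference distance."""
    pairs = []
    for _ in range(n_pairs):
        sub_costs = [rng.random() for _ in true_weights]
        distance = weighted_cost(true_weights, sub_costs) + rng.gauss(0.0, noise)
        pairs.append((sub_costs, distance))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = make_toy_pairs([0.6, 0.3, 0.1], 200, rng)
    best = evolve_weights(pairs, 3, rng)
    print("tuned weights:", [round(w, 3) for w in best])
```

Because the fitness is evaluated on noisy reference distances, the search never needs gradients or an exact target, which is the kind of robustness the abstract attributes to evolutionary methods.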

Original language	English
Pages	1333-1336
Number of pages	4
Publication status	Published - 2003
Event	8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland. Duration: 1 Sep 2003 → 4 Sep 2003

Conference

Conference	8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
Country/Territory	Switzerland
City	Geneva
Period	1/09/03 → 4/09/03

Other files and links

Link to publication in Scopus

Cite this

@conference{8efbc9d5bf5947acba31c5d71470b17d,

title = "Evolutionary weight tuning based on diphone pairs for unit selection speech synthesis",

abstract = "Unit selection text-to-speech (TTS) conversion is an ongoing research topic in the speech synthesis community. This paper focuses on tuning the weights involved in the target and concatenation cost metrics. We propose a method for adjusting these weights automatically and simultaneously by means of diphone and triphone pairs. The method is based on techniques from the evolutionary computation community and takes advantage of their robustness in noisy domains. The experiments and their analysis show that it performs well on this problem, overcoming some constraints assumed by previous work and providing a new framework for further investigation.",

author = "Francesc Al{\'i}as and Xavier Llor{\`a}",

note = "Funding Information: We would like to thank Ignasi Iriondo for many useful discussions during the preparation of this manuscript. We would also like to thank the Generalitat de Catalunya (D.U.R.S.I.) for their support under grant number 2000FI-00679. This work was also supported by the Technology Research, Education and Commercialization Center (TRECC), a program of the University of Illinois at Urbana-Champaign, administered by the National Center for Supercomputing Applications (NCSA) and funded by the Office of Naval Research under grant N00014-01-1-0175. We would also like to thank the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, for support under grant F49620-00-0163. Research funding for this work was also provided by the National Science Foundation under grant DMI-9908252.; 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 ; Conference date: 01-09-2003 Through 04-09-2003",

year = "2003",

language = "English",

pages = "1333--1336",

}

TY - CONF

T1 - Evolutionary weight tuning based on diphone pairs for unit selection speech synthesis

AU - Alías, Francesc

AU - Llorà, Xavier

N1 - Funding Information: We would like to thank Ignasi Iriondo for many useful discussions during the preparation of this manuscript. We would also like to thank the Generalitat de Catalunya (D.U.R.S.I.) for their support under grant number 2000FI-00679. This work was also supported by the Technology Research, Education and Commercialization Center (TRECC), a program of the University of Illinois at Urbana-Champaign, administered by the National Center for Supercomputing Applications (NCSA) and funded by the Office of Naval Research under grant N00014-01-1-0175. We would also like to thank the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, for support under grant F49620-00-0163. Research funding for this work was also provided by the National Science Foundation under grant DMI-9908252.

PY - 2003

Y1 - 2003

AB - Unit selection text-to-speech (TTS) conversion is an ongoing research topic in the speech synthesis community. This paper focuses on tuning the weights involved in the target and concatenation cost metrics. We propose a method for adjusting these weights automatically and simultaneously by means of diphone and triphone pairs. The method is based on techniques from the evolutionary computation community and takes advantage of their robustness in noisy domains. The experiments and their analysis show that it performs well on this problem, overcoming some constraints assumed by previous work and providing a new framework for further investigation.

UR - http://www.scopus.com/inward/record.url?scp=85009210634&partnerID=8YFLogxK

M3 - Contribution

AN - SCOPUS:85009210634

SP - 1333

EP - 1336

T2 - 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003

Y2 - 1 September 2003 through 4 September 2003

ER -
