Unit selection text-to-speech (TTS) conversion is an ongoing research topic in the speech synthesis community. This paper focuses on tuning the weights involved in the target and concatenation cost metrics. We propose a method for adjusting these weights automatically and simultaneously by means of diphone and triphone pairs. The method is based on techniques from the evolutionary computation community and takes advantage of their robustness in noisy domains. The experiments and their analysis demonstrate its good performance on this problem, overcoming some constraints assumed by previous work and leading to a promising new framework for further investigation.
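As a rough illustration of what evolutionary weight tuning can look like, the following minimal sketch evolves a weight vector for a weighted sum of sub-costs. It is not the method of the paper: the abstract does not specify the fitness function, the genetic operators, or the data. Everything here is an assumption made for illustration, including the function names (`tune_weights`, `make_synthetic_pairs`), the squared-error fitness against a reference distance, and the synthetic data standing in for diphone and triphone pairs.

```python
import random


def normalize(weights):
    """Clip weights to be non-negative and rescale them to sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)
    return [w / total for w in clipped]


def weighted_cost(weights, subcosts):
    """Weighted sum of sub-costs (assumed form of the cost metric)."""
    return sum(w * c for w, c in zip(weights, subcosts))


def error(weights, pairs):
    """Mean squared error between weighted cost and a reference distance."""
    return sum((weighted_cost(weights, s) - d) ** 2 for s, d in pairs) / len(pairs)


def make_synthetic_pairs(true_weights, n_pairs=200, noise=0.05, seed=1):
    """Hypothetical stand-in for unit pairs: random sub-costs, noisy target."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        subcosts = [rng.random() for _ in true_weights]
        distance = weighted_cost(true_weights, subcosts) + rng.gauss(0.0, noise)
        pairs.append((subcosts, distance))
    return pairs


def tune_weights(pairs, n_weights, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Simple genetic algorithm: tournament selection, blend crossover,
    Gaussian mutation, and elitism (the two best individuals survive)."""
    rng = random.Random(seed)

    def fitness_key(w):
        return error(w, pairs)

    # Seed the population with uniform weights plus random candidates.
    population = [normalize([1.0] * n_weights)]
    population += [
        normalize([rng.random() for _ in range(n_weights)])
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness_key)
        next_gen = population[:2]  # elitism
        while len(next_gen) < pop_size:
            parent_a = min(rng.sample(population, 3), key=fitness_key)
            parent_b = min(rng.sample(population, 3), key=fitness_key)
            alpha = rng.random()
            child = [
                alpha * a + (1.0 - alpha) * b + rng.gauss(0.0, sigma)
                for a, b in zip(parent_a, parent_b)
            ]
            next_gen.append(normalize(child))
        population = next_gen
    return min(population, key=fitness_key)


if __name__ == "__main__":
    data = make_synthetic_pairs([0.6, 0.3, 0.1])
    best = tune_weights(data, n_weights=3)
    print("tuned weights:", [round(w, 3) for w in best])
    print("error:", error(best, data))
```

All weights are evolved together as one individual, which mirrors the idea of adjusting them simultaneously. A population-based search is a plausible fit when the fitness signal is noisy, since selection depends on relative ranking across many candidates rather than on a single gradient estimate.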
Published - 2003
8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
Duration: 1 Sept 2003 → 4 Sept 2003