Adding singing capabilities to unit selection TTS through HNM-based conversion

Research output: Book chapter, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

Adding singing capabilities to a corpus-based concatenative text-to-speech (TTS) system can be addressed by explicitly collecting singing samples from the previously recorded speaker. However, this approach is only feasible if that speaker is also a singing talent. As an alternative, we consider appending a Harmonic plus Noise Model (HNM) speech-to-singing conversion module to a Unit Selection TTS (US-TTS) system. Two possible text-to-speech-to-singing synthesis approaches are studied: applying the speech-to-singing conversion to the US-TTS synthetic output, or implementing a hybrid US+HNM synthesis framework. The perceptual tests show that the speech-to-singing conversion yields a singing resemblance similar to that of the natural version, but with lower naturalness. Moreover, no statistically significant differences are found between the two strategies in terms of either naturalness or singing resemblance. Finally, the hybrid approach reduces the overall computational cost by a factor of more than two.

Original language: English
Title of host publication: Advances in Speech and Language Technologies for Iberian Languages - 3rd International Conference, IberSPEECH 2016, Proceedings
Editors: Carmen Garcia Mateo, Alfonso Ortega, Alberto Abad, Nuno Mamede, Carlos D. Martínez Hinarejos, Antonio Teixeira, Fernando Batista, Fernando Perdigao
Publisher: Springer Verlag
Pages: 33-43
Number of pages: 11
ISBN (Print): 9783319491684
Publication status: Published - 2016
Event: 3rd International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2016 - Lisbon, Portugal
Duration: 23 Nov 2016 to 25 Nov 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10077 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Conference on Advances in Speech and Language Technologies for Iberian Languages, IberSPEECH 2016
Country/Territory: Portugal
City: Lisbon
Period: 23/11/16 to 25/11/16

Keywords

  • Harmonic plus noise model
  • Prosody modification
  • Speech-to-singing
  • Text-to-singing
  • Unit-selection TTS
