Prosodic analysis of storytelling discourse modes and narrative situations oriented to Text-to-Speech synthesis

Raúl Montaño, Francesc Alías, Josep Ferrer

Research output: Conference paper › Contribution › peer-review

19 Citations (Scopus)

Abstract

The generation of synthetic speech with a certain degree of expressiveness has been successful for some particular applications or speaking styles (e.g., emotions). In this context, there is a particular speaking style with subtle speech nuances that may be of great interest for delivering expressive speech: the storytelling style. The purpose of this paper is to take a first step towards developing a storytelling Text-to-Speech (TTS) synthesis system by modelling the specific prosodic patterns (pitch, intensity and tempo) of this speaking style. We base our analysis of a tale in Spanish on the discourse modes present in storytelling: narrative, descriptive and dialogue. Moreover, we introduce narrative situations (neutral narrative, post-character, suspense and affective situations) within the narrative mode, which are analysed at the sentence level. After grouping the sentences into modes and narrative situations, we analyse their corresponding prosodic patterns both objectively (via statistical tests) and subjectively (via a perceptual test using resynthesized sentences). The results show that the statistically validated prosodic rules perform as well as (or even better than) the original prosody in most sentences.

Original language: English
Pages: 171-176
Number of pages: 6
Publication status: Published - 2013
Event: 8th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2013 - Barcelona, Spain
Duration: 31 Aug 2013 to 2 Sept 2013

Conference

Conference: 8th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2013
Country/Territory: Spain
City: Barcelona
Period: 31/08/13 to 2/09/13

Keywords

  • Harmonic plus Noise Model
  • Storytelling
  • TTS
  • narrative situations
  • prosodic analysis
