Contribution of Vocal Tract and Glottal Source Spectral Cues in the Generation of Acted Happy and Aggressive Spanish Vowels

Marc Freixes; Joan Claudi Socoró; Francesc Alías

doi:10.3390/app12042055

Research output: Article in indexed journal › Article › Peer-reviewed

1 Citation (Scopus)

Abstract

The source-filter model is one of the main techniques applied to speech analysis and synthesis. Recent advances in voice production by means of three-dimensional (3D) source-filter models have overcome several limitations of classic one-dimensional techniques. Despite the development of preliminary attempts to improve the expressiveness of 3D-generated voices, they are still far from achieving realistic results. Towards this goal, this work analyses the contribution of both the vocal tract (VT) and the glottal source spectral (GSS) cues in the generation of happy and aggressive speech through a GlottDNN-based analysis-by-synthesis methodology. Paired neutral expressive utterances are parameterised to generate different combinations of expressive vowels, applying the target expressive GSS and/or VT cues on the neutral vowels after transplanting the expressive prosody on these utterances. The conducted objective tests focused on Spanish [a], [i] and [u] vowels show that both GSS and VT cues significantly reduce the spectral distance to the expressive target. The results from the perceptual test show that VT cues make a statistically significant contribution in the expression of happy and aggressive emotions for [a] vowels, while the GSS contribution is significant in [i] and [u] vowels.

Original language	English
Article number	2055
Journal	Applied Sciences (Switzerland)
Volume	12
Issue number	4
DOIs	https://doi.org/10.3390/app12042055
Publication status	Published - 1 Feb 2022

Access to document

10.3390/app12042055

Other files and links

Link to publication in Scopus

Cite this

@article{5f9fbb2ce0754099b913ed6087ebcba3,

title = "Contribution of Vocal Tract and Glottal Source Spectral Cues in the Generation of Acted Happy and Aggressive Spanish Vowels",

abstract = "The source-filter model is one of the main techniques applied to speech analysis and synthesis. Recent advances in voice production by means of three-dimensional (3D) source-filter models have overcome several limitations of classic one-dimensional techniques. Despite the development of preliminary attempts to improve the expressiveness of 3D-generated voices, they are still far from achieving realistic results. Towards this goal, this work analyses the contribution of both the vocal tract (VT) and the glottal source spectral (GSS) cues in the generation of happy and aggressive speech through a GlottDNN-based analysis-by-synthesis methodology. Paired neutral expressive utterances are parameterised to generate different combinations of expressive vowels, applying the target expressive GSS and/or VT cues on the neutral vowels after transplanting the expressive prosody on these utterances. The conducted objective tests focused on Spanish [a], [i] and [u] vowels show that both GSS and VT cues significantly reduce the spectral distance to the expressive target. The results from the perceptual test show that VT cues make a statistically significant contribution in the expression of happy and aggressive emotions for [a] vowels, while the GSS contribution is significant in [i] and [u] vowels.",

keywords = "Emotional database, Expressive speech synthesis, GlottDNN, Glottal source, Inverse filtering, Numerical voice production, Speech analysis, Vocal tract",

author = "Marc Freixes and Socor{\'o}, {Joan Claudi} and Francesc Al{\'i}as",

note = "Funding Information: Funding: This research has been partially funded by the Agencia Estatal de Investigaci{\'o}n (AEI) through the FEMVoQ project (PID2020-120441GB-I00/AEI/10.13039/501100011033). Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = feb,

day = "1",

doi = "10.3390/app12042055",

language = "English",

volume = "12",

journal = "Applied Sciences (Switzerland)",

issn = "2076-3417",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "4",

}

TY - JOUR

T1 - Contribution of Vocal Tract and Glottal Source Spectral Cues in the Generation of Acted Happy and Aggressive Spanish Vowels

AU - Freixes, Marc

AU - Socoró, Joan Claudi

AU - Alías, Francesc

N1 - Funding Information: Funding: This research has been partially funded by the Agencia Estatal de Investigación (AEI) through the FEMVoQ project (PID2020-120441GB-I00/AEI/10.13039/501100011033). Publisher Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland.

PY - 2022/2/1

Y1 - 2022/2/1

N2 - The source-filter model is one of the main techniques applied to speech analysis and synthesis. Recent advances in voice production by means of three-dimensional (3D) source-filter models have overcome several limitations of classic one-dimensional techniques. Despite the development of preliminary attempts to improve the expressiveness of 3D-generated voices, they are still far from achieving realistic results. Towards this goal, this work analyses the contribution of both the vocal tract (VT) and the glottal source spectral (GSS) cues in the generation of happy and aggressive speech through a GlottDNN-based analysis-by-synthesis methodology. Paired neutral expressive utterances are parameterised to generate different combinations of expressive vowels, applying the target expressive GSS and/or VT cues on the neutral vowels after transplanting the expressive prosody on these utterances. The conducted objective tests focused on Spanish [a], [i] and [u] vowels show that both GSS and VT cues significantly reduce the spectral distance to the expressive target. The results from the perceptual test show that VT cues make a statistically significant contribution in the expression of happy and aggressive emotions for [a] vowels, while the GSS contribution is significant in [i] and [u] vowels.

AB - The source-filter model is one of the main techniques applied to speech analysis and synthesis. Recent advances in voice production by means of three-dimensional (3D) source-filter models have overcome several limitations of classic one-dimensional techniques. Despite the development of preliminary attempts to improve the expressiveness of 3D-generated voices, they are still far from achieving realistic results. Towards this goal, this work analyses the contribution of both the vocal tract (VT) and the glottal source spectral (GSS) cues in the generation of happy and aggressive speech through a GlottDNN-based analysis-by-synthesis methodology. Paired neutral expressive utterances are parameterised to generate different combinations of expressive vowels, applying the target expressive GSS and/or VT cues on the neutral vowels after transplanting the expressive prosody on these utterances. The conducted objective tests focused on Spanish [a], [i] and [u] vowels show that both GSS and VT cues significantly reduce the spectral distance to the expressive target. The results from the perceptual test show that VT cues make a statistically significant contribution in the expression of happy and aggressive emotions for [a] vowels, while the GSS contribution is significant in [i] and [u] vowels.

KW - Emotional database

KW - Expressive speech synthesis

KW - GlottDNN

KW - Glottal source

KW - Inverse filtering

KW - Numerical voice production

KW - Speech analysis

KW - Vocal tract

UR - http://www.scopus.com/inward/record.url?scp=85124723928&partnerID=8YFLogxK

U2 - 10.3390/app12042055

DO - 10.3390/app12042055

M3 - Article

AN - SCOPUS:85124723928

SN - 2076-3417

VL - 12

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

IS - 4

M1 - 2055

ER -
