Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts

Marc Arnela; Oriol Guasch

doi:10.1109/TASLP.2017.2735179

Facultat Internacional de Comerç i Economia Digital La Salle

Research output: Article in indexed journal › Article › Peer-reviewed

14 citations (Scopus)

Abstract

Three-dimensional (3-D) vocal tract acoustic modeling has the potential to generate high-quality, natural voice sounds, but at the price of a large computational cost. Alternatively, 2-D models based on tuned vocal tracts have been shown to provide results similar to the 3-D ones but with lower computational demands. However, they are currently limited to the synthesis of static vowel sounds. In this paper, the tuned 2-D approach is extended by considering moving vocal tracts to generate dynamic vowel sounds, like diphthongs. Four tuning steps are followed to build a dynamic 2-D vocal tract model that can recover, to a large extent, the formant locations, bandwidths, and energies of a 3-D vocal tract with circular cross section, set in a spherical baffle representing the human head. Acoustic waves propagating through the time-evolving vocal tract and radiating to free field are simulated using the finite element method in the time domain. As examples, the diphthongs [ɑi] and [ɑu] have been generated using the tuning approach and compared, by means of objective and subjective evaluations, to those resulting from 3-D and conventional 2-D simulations.

Original language	English
Pages (from-to)	2013-2023
Number of pages	11
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	25
Issue number	10
DOIs	https://doi.org/10.1109/TASLP.2017.2735179
Publication status	Published - Oct 2017

Access to document

10.1109/TASLP.2017.2735179

Other files and links

Link to publication in Scopus

Cite this

@article{933466e3c72549309c84b99f973788de,

title = "Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts",

abstract = "Three-dimensional (3-D) vocal tract acoustic modeling has the potential to generate high-quality, natural voice sounds, but at the price of a large computational cost. Alternatively, 2-D models based on tuned vocal tracts have been shown to provide results similar to the 3-D ones but with lower computational demands. However, they are currently limited to the synthesis of static vowel sounds. In this paper, the tuned 2-D approach is extended by considering moving vocal tracts to generate dynamic vowel sounds, like diphthongs. Four tuning steps are followed to build a dynamic 2-D vocal tract model that can recover, to a large extent, the formant locations, bandwidths, and energies of a 3-D vocal tract with circular cross section, set in a spherical baffle representing the human head. Acoustic waves propagating through the time-evolving vocal tract and radiating to free field are simulated using the finite element method in the time domain. As examples, the diphthongs [ɑi] and [ɑu] have been generated using the tuning approach and compared, by means of objective and subjective evaluations, to those resulting from 3-D and conventional 2-D simulations.",

keywords = "Diphthong synthesis, finite element method, speech synthesis, tuned two-dimensional vocal tracts, vocal tract acoustics",

author = "Marc Arnela and Oriol Guasch",

note = "Funding Information: Manuscript received January 16, 2017; revised June 8, 2017; accepted July 14, 2017. Date of publication August 2, 2017; date of current version September 8, 2017. This work was supported in part by the Agencia Estatal de Investigaci{\'o}n and FEDER, EU, through Project GENIOVOX TEC2016-81107-P, in part by the Generalitat de Catalunya and Universitat Ramon Llull, references 2016-URL-IR-010 and 2016-URL-IR-013, and in part by the Secretaria d{\textquoteright}Universitats i Recerca del Departament d{\textquoteright}Economia i Coneixement (Generalitat de Catalunya) under Grant 2014-SGR-0590. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Hsin-min Wang. (Corresponding author: Marc Arnela.) The authors are with the Grup de Recerca en Tecnologies M{\`e}dia, La Salle, Universitat Ramon Llull, Barcelona 08022, Spain (e-mail: marnela@salleurl.edu; oguasch@salleurl.edu). Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2017",

month = oct,

doi = "10.1109/TASLP.2017.2735179",

language = "English",

volume = "25",

pages = "2013--2023",

journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",

issn = "2329-9290",

publisher = "IEEE Advancing Technology for Humanity",

number = "10",

}

TY - JOUR

T1 - Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts

AU - Arnela, Marc

AU - Guasch, Oriol

N1 - Funding Information: Manuscript received January 16, 2017; revised June 8, 2017; accepted July 14, 2017. Date of publication August 2, 2017; date of current version September 8, 2017. This work was supported in part by the Agencia Estatal de Investigación and FEDER, EU, through Project GENIOVOX TEC2016-81107-P, in part by the Generalitat de Catalunya and Universitat Ramon Llull, references 2016-URL-IR-010 and 2016-URL-IR-013, and in part by the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement (Generalitat de Catalunya) under Grant 2014-SGR-0590. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Hsin-min Wang. (Corresponding author: Marc Arnela.) The authors are with the Grup de Recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona 08022, Spain (e-mail: marnela@salleurl.edu; oguasch@salleurl.edu). Publisher Copyright: © 2014 IEEE.

PY - 2017/10

Y1 - 2017/10

N2 - Three-dimensional (3-D) vocal tract acoustic modeling has the potential to generate high-quality, natural voice sounds, but at the price of a large computational cost. Alternatively, 2-D models based on tuned vocal tracts have been shown to provide results similar to the 3-D ones but with lower computational demands. However, they are currently limited to the synthesis of static vowel sounds. In this paper, the tuned 2-D approach is extended by considering moving vocal tracts to generate dynamic vowel sounds, like diphthongs. Four tuning steps are followed to build a dynamic 2-D vocal tract model that can recover, to a large extent, the formant locations, bandwidths, and energies of a 3-D vocal tract with circular cross section, set in a spherical baffle representing the human head. Acoustic waves propagating through the time-evolving vocal tract and radiating to free field are simulated using the finite element method in the time domain. As examples, the diphthongs [ɑi] and [ɑu] have been generated using the tuning approach and compared, by means of objective and subjective evaluations, to those resulting from 3-D and conventional 2-D simulations.

KW - Diphthong synthesis

KW - finite element method

KW - speech synthesis

KW - tuned two-dimensional vocal tracts

KW - vocal tract acoustics

UR - http://www.scopus.com/inward/record.url?scp=85028955585&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2017.2735179

DO - 10.1109/TASLP.2017.2735179

M3 - Article

AN - SCOPUS:85028955585

SN - 2329-9290

VL - 25

SP - 2013

EP - 2023

JO - IEEE/ACM Transactions on Audio Speech and Language Processing

JF - IEEE/ACM Transactions on Audio Speech and Language Processing

IS - 10

ER -
