TY - JOUR
T1 - Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts
AU - Arnela, Marc
AU - Guasch, Oriol
N1 - Funding Information:
Manuscript received January 16, 2017; revised June 8, 2017; accepted July 14, 2017. Date of publication August 2, 2017; date of current version September 8, 2017. This work was supported in part by the Agencia Estatal de Investigación and FEDER, EU, through Project GENIOVOX TEC2016-81107-P, in part by the Generalitat de Catalunya and Universitat Ramon Llull, references 2016-URL-IR-010 and 2016-URL-IR-013, and in part by the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement (Generalitat de Catalunya) under Grant 2014-SGR-0590. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Hsin-min Wang. (Corresponding author: Marc Arnela.) The authors are with the Grup de Recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, Barcelona 08022, Spain (e-mail: marnela@salleurl.edu; [email protected]).
Publisher Copyright:
© 2014 IEEE.
PY - 2017/10
Y1 - 2017/10
N2 - Three-dimensional (3-D) vocal tract acoustic modeling has the potential to generate high-quality and natural voice sounds, but at the price of a large computational cost. Alternatively, 2-D models based on tuned vocal tracts have been shown to provide results similar to those of 3-D models at a lower computational cost. However, they are currently limited to the synthesis of static vowel sounds. In this paper, the tuned 2-D approach is extended to moving vocal tracts in order to generate dynamic vowel sounds such as diphthongs. Four tuning steps are followed to build a dynamic 2-D vocal tract model that can recover, to a large extent, the formant locations, bandwidths, and energies of a 3-D vocal tract with circular cross section, set in a spherical baffle representing the human head. Acoustic waves propagating through the time-evolving vocal tract and radiating into free field are simulated using the finite element method in the time domain. As examples, the diphthongs [ɑi] and [ɑu] have been generated using the tuning approach and compared, by means of objective and subjective evaluations, with those resulting from 3-D and conventional 2-D simulations.
KW - Diphthong synthesis
KW - finite element method
KW - speech synthesis
KW - tuned two-dimensional vocal tracts
KW - vocal tract acoustics
UR - http://www.scopus.com/inward/record.url?scp=85028955585&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2017.2735179
DO - 10.1109/TASLP.2017.2735179
M3 - Article
AN - SCOPUS:85028955585
SN - 2329-9290
VL - 25
SP - 2013
EP - 2023
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 10
ER -