Three-dimensional (3-D) vocal tract acoustic modeling has the potential to generate high-quality, natural voice sounds, but at a large computational cost. Alternatively, 2-D models based on tuned vocal tracts have been shown to provide results similar to those of 3-D models with lower computational demands. However, they are currently limited to the synthesis of static vowel sounds. In this paper, the tuned 2-D approach is extended to moving vocal tracts in order to generate dynamic vowel sounds, such as diphthongs. Four tuning steps are followed to build a dynamic 2-D vocal tract model that can recover, to a large extent, the formant locations, bandwidths, and energies of a 3-D vocal tract with circular cross section, set in a spherical baffle representing the human head. Acoustic waves propagating through the time-evolving vocal tract and radiating into the free field are simulated using the finite element method in the time domain. As examples, the diphthongs [αi] and [αu] have been generated using the tuning approach and compared, by means of objective and subjective evaluations, to those resulting from 3-D and conventional 2-D simulations.
|Number of pages||11|
|Journal||IEEE/ACM Transactions on Audio Speech and Language Processing|
|Publication status||Published - Oct. 2017|