TY - JOUR
T1 - Formant Frequency Tuning of Three-Dimensional MRI-Based Vocal Tracts for the Finite Element Synthesis of Vowels
AU - Arnela, Marc
AU - Guasch, Oriol
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2024
Y1 - 2024
N2 - Nowadays it is possible to acquire three-dimensional (3D) geometries of the vocal tract by means of Magnetic Resonance Imaging (MRI) and introduce them into a 3D acoustic model to obtain their frequency response. However, it is not yet feasible to consider small variations in these geometries to tune the formant frequencies and produce, for instance, specific voice or singing effects. This work presents a methodology for doing so. First, the 3D MRI-based vocal tract geometry is discretized into a set of cross-sections from which its 1D area function is calculated. A 1D tuning algorithm based on sensitivity functions is then used to iteratively perturb the area function until the desired target frequencies are obtained. The algorithm also allows vocal tract length perturbations between cross-sections to be considered. Finally, a 3D vocal tract is reconstructed using the shape of the original cross-sections with the new areas, and the new spacing between cross-sections according to the computed length variations. In this way it is possible to produce voice and singing effects by tuning the first formants of vowels while keeping the high energy content of the spectrum of 3D models at a low computational cost. Several examples are presented, ranging from the shift to lower and higher frequencies of the first formant (F1) to the generation of formant clusters. The latter is the case of the singing formant, in which F3, F4 and F5 are grouped together, or the case of biphonic singing, where F2 and F3 form a single peak to generate an overtone above the fundamental frequency.
AB - Nowadays it is possible to acquire three-dimensional (3D) geometries of the vocal tract by means of Magnetic Resonance Imaging (MRI) and introduce them into a 3D acoustic model to obtain their frequency response. However, it is not yet feasible to consider small variations in these geometries to tune the formant frequencies and produce, for instance, specific voice or singing effects. This work presents a methodology for doing so. First, the 3D MRI-based vocal tract geometry is discretized into a set of cross-sections from which its 1D area function is calculated. A 1D tuning algorithm based on sensitivity functions is then used to iteratively perturb the area function until the desired target frequencies are obtained. The algorithm also allows vocal tract length perturbations between cross-sections to be considered. Finally, a 3D vocal tract is reconstructed using the shape of the original cross-sections with the new areas, and the new spacing between cross-sections according to the computed length variations. In this way it is possible to produce voice and singing effects by tuning the first formants of vowels while keeping the high energy content of the spectrum of 3D models at a low computational cost. Several examples are presented, ranging from the shift to lower and higher frequencies of the first formant (F1) to the generation of formant clusters. The latter is the case of the singing formant, in which F3, F4 and F5 are grouped together, or the case of biphonic singing, where F2 and F3 form a single peak to generate an overtone above the fundamental frequency.
KW - 3D vocal tract
KW - Finite Element Method
KW - formant tuning
KW - MRI
KW - vocal tract acoustics
KW - vowels
UR - http://www.scopus.com/inward/record.url?scp=85194057652&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2024.3402068
DO - 10.1109/TASLP.2024.3402068
M3 - Article
AN - SCOPUS:85194057652
SN - 2329-9290
VL - 32
SP - 2790
EP - 2799
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -