TY - JOUR
T1 - Emphatic visual speech synthesis
AU - Melenchón, Javier
AU - Martínez, Elisa
AU - Torre, Fernando De La
AU - Montero, José A.
N1 - Funding Information:
Manuscript received January 15, 2008; revised October 07, 2008. Current version published February 11, 2009. This work was supported in part by the Spanish Ministry of Education and Science under Grant TEC2006-08043/TCM. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Helen Meng.
PY - 2009/3
Y1 - 2009/3
N2 - The synthesis of talking heads has been a flourishing research area over the last few years. Since human beings have an uncanny ability to read people's faces, most related applications (e.g., advertising, video-teleconferencing) require absolutely realistic photometric and behavioral synthesis of faces. This paper proposes a person-specific facial synthesis framework that allows high realism and includes a novel way to control visual emphasis (e.g., level of exaggeration of visible articulatory movements of the vocal tract). There are three main contributions: a geodesic interpolation with visual unit selection, a parameterization of visual emphasis, and the design of minimum size corpora. Perceptual tests with human subjects reveal high realism properties, achieving similar perceptual scores as real samples. Furthermore, the visual emphasis level and two communication styles show a statistical interaction relationship.
AB - The synthesis of talking heads has been a flourishing research area over the last few years. Since human beings have an uncanny ability to read people's faces, most related applications (e.g., advertising, video-teleconferencing) require absolutely realistic photometric and behavioral synthesis of faces. This paper proposes a person-specific facial synthesis framework that allows high realism and includes a novel way to control visual emphasis (e.g., level of exaggeration of visible articulatory movements of the vocal tract). There are three main contributions: a geodesic interpolation with visual unit selection, a parameterization of visual emphasis, and the design of minimum size corpora. Perceptual tests with human subjects reveal high realism properties, achieving similar perceptual scores as real samples. Furthermore, the visual emphasis level and two communication styles show a statistical interaction relationship.
KW - Audiovisual speech synthesis
KW - Emphatic visual-speech
KW - Talking head
UR - http://www.scopus.com/inward/record.url?scp=70350442426&partnerID=8YFLogxK
U2 - 10.1109/TASL.2008.2010213
DO - 10.1109/TASL.2008.2010213
M3 - Article
AN - SCOPUS:70350442426
SN - 1558-7916
VL - 17
SP - 459
EP - 468
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 3
ER -