TY - JOUR
T1 - Gesture synthesis adapted to speech emphasis
AU - Fernández-Baena, Adso
AU - Montaño, Raúl
AU - Antonijoan, Marc
AU - Roversi, Arturo
AU - Miralles, David
AU - Alías, Francesc
N1 - Funding Information:
We thank Eduard Ruesga and Meritxell Aragonès for their work on the acquisition and processing of motion capture data. We also thank Dani Arguedas for his work as an actor, and Anna Fusté for her assistance with programming. This work was supported by the CENIT program, project number CEN-20101019, granted by the Ministry of Science and Innovation of Spain.
PY - 2014
Y1 - 2014
AB - Avatars communicate through speech and gestures to appear realistic and to enhance interaction with humans. In this context, several works have analyzed the relationship between speech and gestures, while others have focused on their synthesis, following different approaches. In this work, we address both goals by linking speech to gestures in terms of time and intensity, and then use this knowledge to drive a gesture synthesizer from a manually annotated speech signal. To that end, we define strength indicators for speech and motion. After validating them through perceptual tests, we obtain an intensity rule from their correlation. Moreover, we derive a synchrony rule to determine temporal correspondences between speech and gestures. These analyses were conducted on aggressive and neutral performances, whose speech signal and motion were manually annotated, in order to cover a broad range of emphatic levels. Next, the intensity and synchrony rules are used to drive a gesture synthesizer called gesture motion graph (GMG). These rules are validated by users through perceptual tests on GMG output animations. Results show that animations using both the intensity and synchrony rules perform better than those using only the synchrony rule (which in turn enhance realism with respect to random animations). Finally, we conclude that the extracted rules allow GMG to properly synthesize gestures adapted to speech emphasis from annotated speech.
KW - Body language
KW - Character animation
KW - Human computer interaction
KW - Motion capture
KW - Motion graphs
KW - Speech analysis
KW - Speech emphasis
UR - http://www.scopus.com/inward/record.url?scp=84888308215&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2013.06.005
DO - 10.1016/j.specom.2013.06.005
M3 - Article
AN - SCOPUS:84888308215
SN - 0167-6393
VL - 57
SP - 331
EP - 350
JO - Speech Communication
JF - Speech Communication
ER -