Objective Viseme Extraction and Audiovisual Uncertainty: Estimation Limits between Auditory and Visual Modes

  • Javier Melenchón
  • Jordi Simó
  • Germán Cobo
  • Elisa Martínez

    Research output: Conference paper, Contribution, peer-review

    5 Citations (Scopus)

    Abstract

    An objective method for obtaining consonant visemes for any given Spanish speaker is proposed. The speaker's face is recorded while a balanced set of sentences is spoken, and the recording is stored as an audiovisual sequence. The visual and auditory modes are segmented by allophone, and a distance matrix is built to find allophones that are perceived as visually similar. The results correlate highly with earlier, laborious subjective evaluations, even though those evaluations were carried out for English. Estimation between modes is also studied, revealing a tradeoff between the performances in the two modes: given a set of auditory groups and a set of visual groups for each grouping criterion, increasing the estimation performance of one mode decreases that of the other. Moreover, the tradeoff is very similar (< 7% between maximum and minimum values) in all observed examples.
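
    The distance-matrix step described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two-dimensional feature vectors, the Euclidean distance between per-allophone mean vectors, the threshold value, and the single-linkage grouping rule are all hypothetical stand-ins for whatever features and grouping criteria the authors actually used.

```python
from math import dist
from statistics import fmean


def mean_vector(vectors):
    """Average a list of equal-length feature vectors component-wise."""
    return [fmean(component) for component in zip(*vectors)]


def distance_matrix(segments):
    """Build a pairwise distance matrix between allophones.

    segments maps an allophone label to the visual feature vectors of its
    segments (hypothetical features, assumed for this sketch).
    """
    labels = sorted(segments)
    means = {label: mean_vector(segments[label]) for label in labels}
    matrix = [[dist(means[a], means[b]) for b in labels] for a in labels]
    return labels, matrix


def group_by_threshold(labels, matrix, threshold):
    """Group allophones linked by chains of distances below the threshold
    (single linkage; an assumed criterion, not the paper's)."""
    groups = []
    unvisited = set(range(len(labels)))
    while unvisited:
        start = min(unvisited)
        unvisited.discard(start)
        stack, component = [start], [start]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if matrix[i][j] < threshold]
            for j in near:
                unvisited.discard(j)
                stack.append(j)
                component.append(j)
        groups.append(sorted(labels[k] for k in component))
    return groups


# Toy data, invented for illustration only.
segments = {
    "p": [[0.0, 0.1], [0.1, 0.0]],
    "b": [[0.1, 0.1], [0.0, 0.2]],
    "m": [[0.2, 0.1], [0.1, 0.2]],
    "f": [[1.0, 1.0], [1.1, 0.9]],
    "s": [[2.0, 0.0], [2.1, 0.1]],
}
labels, matrix = distance_matrix(segments)
groups = group_by_threshold(labels, matrix, threshold=0.3)
print(groups)  # [['b', 'm', 'p'], ['f'], ['s']]
```

    In this toy example the bilabials end up in one candidate viseme group because their mean vectors lie close together, while the other two allophones stay separate. With real data the choice of threshold and linkage rule would change the grouping, which is why they are flagged as assumptions here.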

    Original language: English
    Publication status: Published - 2007
    Event: 2007 International Conference on Auditory-Visual Speech Processing, AVSP 2007 - Hilvarenbeek, Netherlands
    Duration: 31 Aug 2007 to 3 Sept 2007

    Conference

    Conference: 2007 International Conference on Auditory-Visual Speech Processing, AVSP 2007
    Country/Territory: Netherlands
    City: Hilvarenbeek
    Period: 31/08/07 to 3/09/07

    Keywords

    • Audiovisual processing
    • Auditory visual uncertainty
    • Viseme extraction
