TY - JOUR
T1 - Look, listen and find
T2 - A purely audiovisual approach to online videos geotagging
AU - Sevillano, Xavier
AU - Valero, Xavier
AU - Alías, Francesc
N1 - Publisher Copyright:
© 2014 Elsevier Inc.
PY - 2015/2/20
Y1 - 2015/2/20
N2 - Tagging videos with the geo-coordinates of the place where they were filmed (i.e. geotagging) enables indexing online multimedia repositories using geographical criteria. However, millions of non-geotagged videos available online are invisible to geo-oriented applications, which calls for the development of automatic techniques for estimating the location where a video was filmed. The most successful approaches to this problem largely rely on exploiting the textual metadata associated with the video, but it is quite common to encounter videos with no title, description or tags. This work focuses on this latter adverse scenario and proposes a purely audiovisual approach to geotagging based on audiovisual similarity retrieval, modality fusion and cluster density. Using a subset of the MediaEval 2011 Placing task data set, we evaluate the ability of several visual and acoustic features to estimate the videos' location both separately and jointly (via fusion at feature and at cluster level). The optimally configured version of the proposed system is capable of geotagging videos within 1 km of their real location at least 4 times more precisely than any of the audiovisual and visual content-based participants in the MediaEval 2011 Placing task.
AB - Tagging videos with the geo-coordinates of the place where they were filmed (i.e. geotagging) enables indexing online multimedia repositories using geographical criteria. However, millions of non-geotagged videos available online are invisible to geo-oriented applications, which calls for the development of automatic techniques for estimating the location where a video was filmed. The most successful approaches to this problem largely rely on exploiting the textual metadata associated with the video, but it is quite common to encounter videos with no title, description or tags. This work focuses on this latter adverse scenario and proposes a purely audiovisual approach to geotagging based on audiovisual similarity retrieval, modality fusion and cluster density. Using a subset of the MediaEval 2011 Placing task data set, we evaluate the ability of several visual and acoustic features to estimate the videos' location both separately and jointly (via fusion at feature and at cluster level). The optimally configured version of the proposed system is capable of geotagging videos within 1 km of their real location at least 4 times more precisely than any of the audiovisual and visual content-based participants in the MediaEval 2011 Placing task.
KW - Audiovisual indexing
KW - Cluster level fusion
KW - Clustering
KW - Feature fusion
KW - Geotagging
KW - Social media tagging
UR - http://www.scopus.com/inward/record.url?scp=84922176789&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2014.10.021
DO - 10.1016/j.ins.2014.10.021
M3 - Article
AN - SCOPUS:84922176789
SN - 0020-0255
VL - 295
SP - 558
EP - 572
JO - Information Sciences
JF - Information Sciences
ER -