TY - JOUR
T1 - Tweet-SCAN
T2 - An event discovery technique for geo-located tweets
AU - Capdevila, Joan
AU - Cerquides, Jesús
AU - Nin, J.
AU - Torres, Jordi
N1 - Funding Information:
This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract TIN2015-65316, by the BSC-CNS Severo Ochoa programs (SEV2015-0493 and SEV-2011-00067), by the SGR program (2014-SGR-1051) of the Catalan Government and by the COR project (TIN2012-38876-C02-01). We would also like to thank the reviewers for their constructive feedback.
Publisher Copyright:
© 2016 Elsevier B.V.
PY - 2017/7/1
Y1 - 2017/7/1
AB - Twitter has become one of the most popular Location-based Social Networks (LBSNs) that bridge the physical and virtual worlds. Tweets, 140-character-long messages, are meant to answer the question "What's happening?". Real-life occurrences and events (such as political protests, music concerts, natural disasters or terrorist acts) are usually reported through geo-located tweets by users on site. Separating event-related tweets from the rest is a challenging problem that requires exploiting several tweet features. With that in mind, we propose Tweet-SCAN, a novel event discovery technique based on the popular density-based clustering algorithm DBSCAN. Tweet-SCAN takes into account four main features of a tweet, namely content, time, location and user, to group together event-related tweets. The proposed technique models textual content through a probabilistic topic model, the Hierarchical Dirichlet Process, and introduces the Jensen–Shannon distance for neighborhood identification in the textual dimension. We evaluate Tweet-SCAN on two real data sets of geo-located tweets posted during Barcelona local festivities in 2014 and 2015, for which some of the events were identified by domain experts beforehand. Through these tagged data sets, we assess Tweet-SCAN's ability to discover events, justify the use of a textual component and highlight the effects of several parameters.
KW - DBSCAN
KW - Event discovery
KW - Hierarchical Dirichlet Process (HDP)
KW - Probabilistic topic models
KW - Twitter
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=84995427101&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2016.08.010
DO - 10.1016/j.patrec.2016.08.010
M3 - Article
AN - SCOPUS:84995427101
SN - 0167-8655
VL - 93
SP - 58
EP - 68
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -