TY - GEN
T1 - Tweet-SCAN
T2 - 18th International Conference of the Catalan Association for Artificial Intelligence, CCIA 2015
AU - Capdevila, Joan
AU - Cerquides, Jesús
AU - Nin, J.
AU - Torres, Jordi
N1 - Publisher Copyright:
© 2015 The authors and IOS Press. All rights reserved.
PY - 2015
Y1 - 2015
AB - Twitter has become one of the most popular Location-Based Social Networks (LBSNs), bridging the physical and virtual worlds. Tweets, the 140-character messages published on Twitter, are meant to answer the question "What's happening?". Real-life occurrences and events are often reported through geo-located tweets posted by users on site. Separating event-related tweets from the rest is a challenging problem that requires exploiting several tweet features. With that in mind, we propose Tweet-SCAN, a novel event discovery technique based on the density-based clustering algorithm DBSCAN. Tweet-SCAN takes into account four main features of a tweet, namely content, time, location and user, to group event-related tweets into homogeneous clusters. The technique models textual content with a probabilistic topic model, the Hierarchical Dirichlet Process, and introduces the Jensen-Shannon distance for neighborhood identification in the textual dimension. We evaluate Tweet-SCAN on a real data set of geo-located tweets posted during the 2014 Barcelona local festivities, for which some of the events were known beforehand. Using this data set, we assess Tweet-SCAN's ability to discover events, justify the use of a textual component and highlight the effects of several parameters.
KW - DBSCAN
KW - Hierarchical Dirichlet Process (HDP)
KW - Jensen-Shannon Distance (JSD)
KW - Twitter
KW - Unsupervised learning
KW - event discovery
KW - probabilistic topic models
UR - http://www.scopus.com/inward/record.url?scp=84948769596&partnerID=8YFLogxK
U2 - 10.3233/978-1-61499-578-4-110
DO - 10.3233/978-1-61499-578-4-110
M3 - Conference contribution
AN - SCOPUS:84948769596
T3 - Frontiers in Artificial Intelligence and Applications
SP - 110
EP - 119
BT - Artificial Intelligence Research and Development - Proceedings of the 18th International Conference of the Catalan Association for Artificial Intelligence
A2 - Boixader, Dionis
A2 - Grimaldo, Francisco
A2 - Armengol, Eva
PB - IOS Press
Y2 - 21 October 2015 through 23 October 2015
ER -