TY - GEN
T1 - Partial symbol ordering distance
AU - Herranz, Javier
AU - Nin, J.
PY - 2009
Y1 - 2009
AB - Sequences of symbols are becoming increasingly important, as they are the standard format for representing information in a wide variety of domains, such as ontologies, sequential patterns, and non-numerical attributes in databases. The development of new distances for this kind of data is therefore a crucial need. Many similarity functions have recently been proposed for handling sequences of symbols; however, such functions do not always satisfy the triangle inequality. This property is required by many data mining algorithms, such as clustering and k-nearest neighbors, which assume a metric space. In this paper, we propose a new distance for sequences of (non-repeated) symbols based on the partial distances between the positions of the common symbols. We prove that this Partial Symbol Ordering distance satisfies the triangle inequality, and we describe a set of experiments supporting the claim that the new distance outperforms the Edit distance in scenarios where sequence similarity is related to the positions occupied by the symbols.
KW - Distances
KW - Sequences of symbols
KW - Triangular inequality
UR - http://www.scopus.com/inward/record.url?scp=84886536872&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04820-3_27
DO - 10.1007/978-3-642-04820-3_27
M3 - Conference contribution
AN - SCOPUS:84886536872
SN - 3642048196
SN - 9783642048197
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 293
EP - 302
BT - Modeling Decisions for Artificial Intelligence - 6th International Conference, MDAI 2009, Proceedings
T2 - 6th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2009
Y2 - 30 November 2009 through 2 December 2009
ER -