TY - GEN

T1 - Partial symbol ordering distance

AU - Herranz, Javier

AU - Nin, Jordi

PY - 2009

Y1 - 2009

AB - Nowadays, sequences of symbols are becoming increasingly important, as they are the standard format for representing information in a wide variety of domains, such as ontologies, sequential patterns, or non-numerical attributes in databases. Therefore, the development of new distances for this kind of data is a crucial need. Many similarity functions have recently been proposed for managing sequences of symbols; however, such functions do not always satisfy the triangular inequality. This property is a mandatory requirement in many data mining algorithms, such as clustering or k-nearest neighbors, which require a metric space. In this paper, we propose a new distance for sequences of (non-repeated) symbols based on the partial distances between the positions of the common symbols. We prove that this Partial Symbol Ordering distance satisfies the triangular inequality, and we finally describe a set of experiments indicating that the new distance outperforms the Edit distance in scenarios where sequence similarity is related to the positions occupied by the symbols.

KW - Distances

KW - Sequences of symbols

KW - Triangular inequality

UR - http://www.scopus.com/inward/record.url?scp=84886536872&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-04820-3_27

DO - 10.1007/978-3-642-04820-3_27

M3 - Conference contribution

AN - SCOPUS:84886536872

SN - 3642048196

SN - 9783642048197

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 293

EP - 302

BT - Modeling Decisions for Artificial Intelligence - 6th International Conference, MDAI 2009, Proceedings

T2 - 6th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2009

Y2 - 30 November 2009 through 2 December 2009

ER -