TY - JOUR
T1 - A modification of the k-means method for quasi-unsupervised learning
AU - Rebollo-Monedero, David
AU - Solé, Marc
AU - Nin, Jordi
AU - Forné, Jordi
N1 - Funding Information:
This work was partly supported by the Spanish Government through projects Consolider Ingenio 2010 CSD2007-00004 “ARES”, TEC2010-20572-C02-02 “Consequence” and by the Government of Catalonia under Grant 2009 SGR 1362 . D. Rebollo-Monedero is the recipient of a Juan de la Cierva postdoctoral fellowship, JCI-2009-05259, from the Spanish Ministry of Science and Innovation.
PY - 2013/1
Y1 - 2013/1
N2 - Since the advent of data clustering, the original formulation of the clustering problem has been enriched to incorporate a number of twists to widen its range of application. In particular, recent heuristic approaches have proposed to incorporate restrictions on the size of the clusters, while striving to minimize a measure of dissimilarity within them. Such size constraints effectively constitute a way to exploit prior knowledge, readily available in many scenarios, which can lead to an improved performance in the clustering obtained. In this paper, we build upon a modification of the celebrated k-means method resorting to a similar alternating optimization procedure, endowed with additive partition weights controlling the size of the partitions formed, adjusted by means of the Levenberg-Marquardt algorithm. We propose several further variations on this modification, in which different kinds of additional information are present. We report experimental results on various standardized datasets, demonstrating that our approaches outperform existing heuristics for size-constrained clustering. The running-time complexity of our proposal is assessed experimentally by means of a power-law regression analysis.
AB - Since the advent of data clustering, the original formulation of the clustering problem has been enriched to incorporate a number of twists to widen its range of application. In particular, recent heuristic approaches have proposed to incorporate restrictions on the size of the clusters, while striving to minimize a measure of dissimilarity within them. Such size constraints effectively constitute a way to exploit prior knowledge, readily available in many scenarios, which can lead to an improved performance in the clustering obtained. In this paper, we build upon a modification of the celebrated k-means method resorting to a similar alternating optimization procedure, endowed with additive partition weights controlling the size of the partitions formed, adjusted by means of the Levenberg-Marquardt algorithm. We propose several further variations on this modification, in which different kinds of additional information are present. We report experimental results on various standardized datasets, demonstrating that our approaches outperform existing heuristics for size-constrained clustering. The running-time complexity of our proposal is assessed experimentally by means of a power-law regression analysis.
KW - Clustering
KW - Constrained clustering
KW - Loyd algorithm
KW - Quasi-unsupervised learning
KW - Size constraints
KW - k-Means method
UR - http://www.scopus.com/inward/record.url?scp=84870054852&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2012.07.024
DO - 10.1016/j.knosys.2012.07.024
M3 - Article
AN - SCOPUS:84870054852
SN - 0950-7051
VL - 37
SP - 176
EP - 185
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -