TY - JOUR
T1 - Parallel hierarchical architectures for efficient consensus clustering on big multimedia cluster ensembles
AU - Sevillano, Xavier
AU - Socoró, Joan Claudi
AU - Alías, Francesc
N1 - Publisher Copyright: © 2019 Elsevier Inc.
PY - 2020/2
Y1 - 2020/2
AB - Consensus clustering is a useful tool for robust or distributed clustering applications. However, because the time complexity of consensus functions scales linearly or quadratically with the number of combined clusterings, execution can be slow or even infeasible on big cluster ensembles, a situation that arises in robust multimedia data clustering. This work introduces hierarchical consensus architectures, an inherently parallel approach based on the divide-and-conquer strategy, to make consensus clustering faster and more effective in big multimedia cluster ensemble scenarios. We also define a specific implementation of hierarchical architectures, including a theoretical analysis of the computational complexity of its fully parallel implementation. In experiments conducted on unimodal and multimedia data sets involving small and big cluster ensembles, parallel hierarchical consensus architecture variants run faster than traditional flat consensus in 75% of the experiments on small cluster ensembles, a percentage that rises to 100% on unimodal and multimedia big cluster ensembles, achieving an average speedup ratio of 30.5. In addition, depending on the consensus function employed, the quality of the obtained consensus partitions ensures robust clustering results.
KW - Big cluster ensembles
KW - Consensus clustering
KW - Divide-and-conquer
KW - Multimedia clustering
KW - Parallelization
UR - http://www.scopus.com/inward/record.url?scp=85072829850&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2019.09.064
DO - 10.1016/j.ins.2019.09.064
M3 - Article
AN - SCOPUS:85072829850
SN - 0020-0255
VL - 511
SP - 212
EP - 228
JO - Information Sciences
JF - Information Sciences
ER -