Text clustering on latent thematic spaces: Variants, strengths and weaknesses

Research output: Book chapter; Conference contribution; Peer-reviewed

1 Citation (Scopus)

Abstract

Deriving a thematically meaningful partition of an unlabeled text corpus is a challenging task. In comparison to classic term-based document indexing, the use of document representations based on latent thematic generative models can lead to improved clustering. However, determining a priori the optimal indexing technique is not straightforward, as it depends on the clustering problem faced and the partitioning strategy adopted. So as to overcome this indeterminacy, we propose deriving a consensus labeling upon the results of clustering processes executed on several document representations. Experiments conducted on subsets of two standard text corpora evaluate distinct clustering strategies based on latent thematic spaces and highlight the usefulness of consensus clustering to overcome the optimal document indexing indeterminacy.
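The consensus idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which consensus function the authors use, so the code below swaps in one common scheme, co-association (evidence accumulation) voting with a single-link cut, purely as an assumption. The function names `co_association` and `consensus_labels`, the `threshold` parameter, and the toy labelings are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of partitions in which each pair of documents shares a cluster."""
    n = len(labelings[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Link documents co-clustered in more than `threshold` of the partitions,
    then give each connected component its own label (a single-link cut)."""
    n = len(labelings[0])
    co = co_association(labelings)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal over the thresholded co-association graph.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical labelings of six documents, one per document representation
# (for example term-based, LSA, ICA). Cluster ids need not match across runs.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
]
print(consensus_labels(partitions))  # prints [0, 0, 0, 1, 1, 1]
```

In this toy case the second representation misplaces the third document, but the other two outvote it, so the consensus recovers the two groups without anyone having to pick the best representation in advance.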

Original language: English
Title of host publication: Independent Component Analysis and Signal Separation - 7th International Conference, ICA 2007, Proceedings
Publisher: Springer Verlag
Pages: 794-801
Number of pages: 8
ISBN (print): 9783540744931
Publication status: Published - 2007
Event: 7th International Conference on Independent Component Analysis (ICA) and Source Separation, ICA 2007 - London, United Kingdom
Duration: 9 Sep 2007 to 12 Sep 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4666 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 7th International Conference on Independent Component Analysis (ICA) and Source Separation, ICA 2007
Country/Territory: United Kingdom
City: London
Period: 9/09/07 to 12/09/07
