Author profiling using corpus statistics, lexicons and stylistic features: Notebook for PAN at CLEF-2013

Maria De-Arteaga, Sergio Jimenez, George Dueñas, Sergio Mancera, Julia Baquero

Research output: Indexed journal article › Conference article › Peer-reviewed

4 Citations (Scopus)

Abstract

This paper describes our participation in the author profiling task of the 9th PAN evaluation lab. The proposed approach extracts stylistic, lexicon-based and corpus-based features and combines them with a logistic classifier. The three feature sets overlap pairwise, and some features belong to all three categories. We present a comprehensive comparison of the contribution of several feature subsets; in particular, a set of features based on Bayesian inference contributed the most. We developed the system on the Spanish training corpus and then applied it, with minor changes, to the English documents. The system ranked 6th among 17 submitted systems in the official ranking for Spanish documents. This result indicates that the approach is competitive for predicting demographics from text.
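
As a rough illustration of the three feature families named in the abstract, the following minimal sketch builds one feature dictionary per document. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `EMOTICONS` lexicon, the two stylistic measures, the function names, and the use of Laplace-smoothed per-class log-likelihoods as a stand-in for the "features based on Bayesian inference".

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the lexicons actually used in the paper are not listed here.
EMOTICONS = {":)", ":(", ";)", ":D"}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, chosen for simplicity)."""
    return text.lower().split()


def stylistic_features(text):
    """Two illustrative style measures: mean token length and uppercase ratio."""
    tokens = text.split()
    n = max(len(tokens), 1)
    return {
        "avg_token_len": sum(len(t) for t in tokens) / n,
        "upper_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
    }


def lexicon_features(text):
    """Share of tokens that appear in the toy emoticon lexicon."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    return {"emoticon_rate": sum(t in EMOTICONS for t in tokens) / n}


def train_word_counts(docs, labels):
    """Corpus statistics: per-class word counts from a labelled training set."""
    counts = {}
    for doc, label in zip(docs, labels):
        counts.setdefault(label, Counter()).update(tokenize(doc))
    return counts


def bayes_features(text, counts):
    """One feature per class: Laplace-smoothed log-likelihood of the document."""
    vocab_size = len(set().union(*counts.values()))
    feats = {}
    for label, class_counts in counts.items():
        total = sum(class_counts.values())
        feats[f"loglik_{label}"] = sum(
            math.log((class_counts[t] + 1) / (total + vocab_size))
            for t in tokenize(text)
        )
    return feats


def extract(text, counts):
    """Merge the three feature families into a single feature dictionary."""
    feats = {}
    feats.update(stylistic_features(text))
    feats.update(lexicon_features(text))
    feats.update(bayes_features(text, counts))
    return feats


# Tiny made-up training set with hypothetical age labels.
docs = ["hola amigo :)", "informe de trabajo"]
labels = ["young", "adult"]
counts = train_word_counts(docs, labels)
features = extract("hola :)", counts)
```

In a fuller pipeline, dictionaries like `features` would be vectorized and passed to a logistic classifier (for example scikit-learn's `LogisticRegression`), which is the combination step the abstract mentions.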

Original language: English
Number of pages: 9
Journal: CEUR Workshop Proceedings
Volume: 1179
Publication status: Published - Sep 2013
Externally published
Event: 2013 Cross Language Evaluation Forum Conference, CLEF 2013 - Valencia, Spain
Duration: 23 Sep 2013 to 26 Sep 2013
