TY - JOUR
T1 - Author profiling using corpus statistics, lexicons and stylistic features
T2 - 2013 Cross Language Evaluation Forum Conference, CLEF 2013
AU - De-Arteaga, Maria
AU - Jimenez, Sergio
AU - Dueñas, George
AU - Mancera, Sergio
AU - Baquero, Julia
N1 - Copyright © 2013
PY - 2013/9
Y1 - 2013/9
N2 - This paper describes our participation in the author profiling task of the 9th PAN evaluation lab. The proposed approach relies on the extraction of stylistic, lexicon-based and corpus-based features, which were combined with a logistic classifier. These three sets of features overlap pairwise, and some features belong to all three categories. A comprehensive comparison of the contribution of several feature subsets is presented. In particular, a set of features based on Bayesian inference provided the most important contribution. We developed our system on the Spanish training corpus; once developed, it was used, with minor changes, for the English documents as well. The proposed system was ranked 6th in the official ranking for Spanish documents among 17 submitted systems. This result shows that our approach is meaningful and competitive for predicting demographics from text.
AB - This paper describes our participation in the author profiling task of the 9th PAN evaluation lab. The proposed approach relies on the extraction of stylistic, lexicon-based and corpus-based features, which were combined with a logistic classifier. These three sets of features overlap pairwise, and some features belong to all three categories. A comprehensive comparison of the contribution of several feature subsets is presented. In particular, a set of features based on Bayesian inference provided the most important contribution. We developed our system on the Spanish training corpus; once developed, it was used, with minor changes, for the English documents as well. The proposed system was ranked 6th in the official ranking for Spanish documents among 17 submitted systems. This result shows that our approach is meaningful and competitive for predicting demographics from text.
KW - Age prediction
KW - Author profiling
KW - Gender prediction
UR - https://www.scopus.com/pages/publications/84922041546
M3 - Conference article
AN - SCOPUS:84922041546
SN - 1613-0073
VL - 1179
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 23 September 2013 through 26 September 2013
ER -