Quantile Encoder: Tackling High Cardinality Categorical Features in Regression Problems

Carlos Mougan, David Masip, Jordi Nin, Oriol Pujol

Research output: Book chapter, Conference contribution, Peer-reviewed

5 Citations (Scopus)

Abstract

Regression problems have been widely studied in the machine learning literature, resulting in a plethora of regression models and performance measures. However, few techniques are specifically dedicated to incorporating categorical features into regression problems. Usually, categorical feature encoders are general enough to cover both classification and regression problems. This lack of specificity results in underperforming regression models. In this paper, we provide an in-depth analysis of how to tackle high cardinality categorical features with the quantile. Our proposal outperforms state-of-the-art encoders, including the traditional statistical mean target encoder, in terms of Mean Absolute Error, especially in the presence of long-tailed or skewed distributions. In addition, to deal with possible overfitting when there are categories with small support, our encoder benefits from additive smoothing. Finally, we describe how to expand the encoded values by creating a set of features with different quantiles. This expanded encoder provides a more informative output about the categorical feature in question, further boosting the performance of the regression model.
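
The abstract does not spell out the encoding formula, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the smoothed value for a category is a weighted blend of the category's target quantile and the global target quantile, with weights n (category sample count) and m (smoothing strength). The function names, the parameters `p` and `m`, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def quantile(values, p):
    """p-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def fit_quantile_encoding(categories, targets, p=0.5, m=1.0):
    """Return {category: smoothed p-quantile of that category's targets}."""
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)
    global_q = quantile(targets, p)
    # Additive smoothing (assumed form): blend the category quantile with the
    # global quantile, so categories with few samples lean on the global value.
    return {
        cat: (len(ys) * quantile(ys, p) + m * global_q) / (len(ys) + m)
        for cat, ys in groups.items()
    }


def fit_multi_quantile_encoding(categories, targets, ps=(0.25, 0.5, 0.75), m=1.0):
    """Return {category: [encoded value for each quantile in ps]}."""
    per_p = [fit_quantile_encoding(categories, targets, p, m) for p in ps]
    return {cat: [enc[cat] for enc in per_p] for cat in per_p[0]}


# Toy data: category "A" has one extreme target value, "B" has a single sample.
cats = ["A", "A", "A", "B"]
ys = [10.0, 12.0, 200.0, 50.0]
encoding = fit_quantile_encoding(cats, ys, p=0.5, m=1.0)
features = fit_multi_quantile_encoding(cats, ys)
```

With the median (`p=0.5`), the outlier 200.0 in category "A" barely moves the encoded value, whereas a mean target encoder would be pulled toward it. The single-sample category "B" is shrunk toward the global median rather than taking its lone target value verbatim. The multi-quantile variant simply repeats the encoding for several values of `p` and yields one feature per quantile.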

Original language: English
Title of host publication: Modeling Decisions for Artificial Intelligence - 18th International Conference, MDAI 2021, Proceedings
Editors: Vicenç Torra, Yasuo Narukawa
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 168-180
Number of pages: 13
ISBN (Print): 9783030855284
DOIs
Publication status: Published - 2021
Externally published
Event: 18th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2021 - Virtual, Online
Duration: 27 Sep 2021 to 30 Sep 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12898 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2021
City: Virtual, Online
Period: 27/09/21 to 30/09/21
