TY - GEN
T1 - How XCS deals with rarities in domains with continuous attributes
AU - Orriols-Puig, Albert
AU - Llorà, Xavier
AU - Goldberg, David E.
PY - 2010
Y1 - 2010
N2 - Michigan-style learning classifier systems solve problems by evolving distributed subsolutions online. Extracting accurate models for subsolutions which are represented by a low number of examples in the training data set has been identified as a key challenge in LCS, and facetwise analysis has been applied to identify the critical elements for success in unbalanced domains. While models for these elements have been developed for XCS with ternary representation, no study has been performed for XCS with interval-based representation, which is most often used in data mining tasks. This paper therefore takes the original design decomposition and adapts it to the interval-based representation. Theory and experimental evidence indicate that XCS with interval-based representation may fail to approximate concepts scarcely represented in the training data set. To overcome this problem, an online covering operator that introduces new specific genetic material in regions where we suspect that there are rarities is designed. The benefits of the online covering operator are empirically analyzed on a collection of artificial and real-world problems.
KW - Class imbalance problem
KW - Genetic-based machine learning
KW - Learning classifier systems
KW - Small disjuncts
UR - http://www.scopus.com/inward/record.url?scp=77955917176&partnerID=8YFLogxK
U2 - 10.1145/1830483.1830670
DO - 10.1145/1830483.1830670
M3 - Conference contribution
AN - SCOPUS:77955917176
SN - 9781450300728
T3 - Proceedings of the 12th Annual Genetic and Evolutionary Computation Conference, GECCO '10
SP - 1
EP - 8
BT - Proceedings of the 12th Annual Genetic and Evolutionary Computation Conference, GECCO '10
T2 - 12th Annual Genetic and Evolutionary Computation Conference, GECCO-2010
Y2 - 7 July 2010 through 11 July 2010
ER -