TY - GEN
T1 - More Data Can Lead Us Astray
T2 - 10th AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2022
AU - Li, Yunyi
AU - De-Arteaga, Maria
AU - Saar-Tsechansky, Maytal
N1 - Publisher Copyright:
© 2022, Association for the Advancement of Artificial Intelligence.
PY - 2022/10/14
Y1 - 2022/10/14
N2 - An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and that imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.
AB - An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and that imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.
KW - Active Data Acquisition Using Crowdsourcing
KW - Algorithmic Fairness
KW - Label Bias
KW - Human Labeling Bias
UR - https://www.scopus.com/pages/publications/85175693180
U2 - 10.1609/hcomp.v10i1.21994
DO - 10.1609/hcomp.v10i1.21994
M3 - Conference contribution
AN - SCOPUS:85175693180
SN - 9781577358787
VL - 10
T3 - Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
SP - 133
EP - 146
BT - HCOMP 2022 - Proceedings of the 10th AAAI Conference on Human Computation and Crowdsourcing
A2 - Hsu, Jane
A2 - Yin, Ming
PB - Association for the Advancement of Artificial Intelligence (AAAI)
CY - Palo Alto, California
Y2 - 6 November 2022 through 10 November 2022
ER -