TY - JOUR
T1 - Finding Pareto trade-offs in fair and accurate detection of toxic speech
AU - Gupta, Soumyajit
AU - Kovatchev, Venelin
AU - Das, Anubrata
AU - De-Arteaga, Maria
AU - Lease, Matthew
N1 - Publisher Copyright:
© 2025, University of Borås. All rights reserved.
PY - 2025/3
Y1 - 2025/3
N2 - Introduction. Optimizing NLP models for fairness poses many challenges. The lack of differentiable fairness measures prevents gradient-based loss training or requires surrogate losses that diverge from the true metric of interest. In addition, competing objectives (e.g., accuracy vs. fairness) often require making trade-offs based on stakeholder preferences, but stakeholders may not know their preferences before seeing system performance under different trade-off settings. Method. We formulate the GAP loss, a differentiable version of the fairness measure Accuracy Parity, to provide balanced accuracy across binary demographic groups. Analysis. We show how model-agnostic HyperNetwork optimization can efficiently train arbitrary NLP model architectures to learn Pareto-optimal trade-offs between competing metrics such as predictive performance vs. group fairness. Results. Focusing on the task of toxic language detection, we show the generality and efficacy of our proposed GAP loss function across two datasets, three neural architectures, and three fairness loss functions. Conclusion. Our GAP loss for toxic language detection demonstrates promising results: improved fairness and computational efficiency. Our work can be extended to other tasks, datasets, and neural models in any practical situation where ensuring equal accuracy across demographic groups is a desired objective.
KW - Accuracy Difference
KW - Fairness
KW - Group Balanced Accuracy
KW - Pareto Trade-off
UR - https://www.scopus.com/pages/publications/105000144169
U2 - 10.47989/ir30iConf47572
DO - 10.47989/ir30iConf47572
M3 - Article
AN - SCOPUS:105000144169
SN - 1368-1613
VL - 30
SP - 123
EP - 141
JO - Information Research
JF - Information Research
IS - iConf 2025
ER -