Paper Title

Controlling Bias Exposure for Fair Interpretable Predictions

Authors

Zexue He, Yu Wang, Julian McAuley, Bodhisattwa Prasad Majumder

Abstract

Recent work on reducing bias in NLP models usually focuses on protecting or isolating information related to a sensitive attribute (like gender or race). However, when sensitive information is semantically entangled with the task information of the input, e.g., gender information is predictive for a profession, a fair trade-off between task performance and bias mitigation is difficult to achieve. Existing approaches perform this trade-off by eliminating bias information from the latent space, lacking control over how much bias is necessarily required to be removed. We argue that a favorable debiasing method should use sensitive information 'fairly', rather than blindly eliminating it (Caliskan et al., 2017; Sun et al., 2019; Bogen et al., 2020). In this work, we provide a novel debiasing algorithm by adjusting the predictive model's belief to (1) ignore the sensitive information if it is not useful for the task; (2) use sensitive information minimally as necessary for the prediction (while also incurring a penalty). Experimental results on two text classification tasks (influenced by gender) and an open-ended generation task (influenced by race) indicate that our model achieves a desirable trade-off between debiasing and task performance along with producing debiased rationales as evidence.
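To make the abstract's second point concrete, the sketch below shows one way a "use sensitive information minimally, with a penalty" objective could look in PyTorch: a standard task loss plus a penalty proportional to how much the prediction relies on sensitive tokens. This is only an illustrative sketch of the general idea, not the paper's actual algorithm; the names PenalizedDebiasLoss, lambda_penalty, and bias_exposure are assumptions introduced here.

```python
# Illustrative sketch (not the paper's method): task loss plus a penalty on
# how strongly the model's rationale relies on sensitive tokens.
import torch
import torch.nn as nn


class PenalizedDebiasLoss(nn.Module):
    def __init__(self, lambda_penalty: float = 0.1):
        super().__init__()
        self.task_loss = nn.CrossEntropyLoss()
        self.lambda_penalty = lambda_penalty  # weight of the bias-exposure penalty

    def forward(self, task_logits, labels, bias_exposure):
        # bias_exposure: scalar estimating how much of the model's rationale
        # mass falls on sensitive tokens (e.g., gendered or racial words).
        loss = self.task_loss(task_logits, labels)
        # Sensitive information may still be used when it helps the task,
        # but every unit of exposure costs lambda_penalty.
        return loss + self.lambda_penalty * bias_exposure


# Hypothetical usage: rationale_weights is a per-token attribution vector and
# sensitive_mask marks sensitive tokens; their combined mass is the exposure.
# exposure = (rationale_weights * sensitive_mask).sum()
# loss = PenalizedDebiasLoss(0.1)(task_logits, labels, exposure)
```

Under this kind of objective, exposure stays near zero when sensitive tokens do not help the task (case 1 in the abstract) and is used only as much as the task gain outweighs the penalty (case 2).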
