Paper Title
Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation
Paper Authors
Paper Abstract
Recent research demonstrates that word embeddings trained on human-generated corpora carry strong gender biases in the embedding space, and these biases can lead to discriminatory outcomes in various downstream tasks. Whereas previous methods project word embeddings onto a linear subspace for debiasing, we introduce a \textit{Latent Disentanglement} method based on a siamese auto-encoder structure with an adapted gradient reversal layer. Our structure separates the semantic latent information and the gender latent information of a given word into disjoint latent dimensions. Afterwards, we introduce \textit{Counterfactual Generation} to convert the gender information of words, so that the original and the modified embeddings can produce a gender-neutralized word embedding after geometric alignment regularization, without loss of semantic information. In various quantitative and qualitative debiasing experiments, our method outperforms existing methods for debiasing word embeddings. In addition, our method preserves semantic information during debiasing, minimizing semantic information loss on extrinsic downstream NLP tasks.
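The abstract describes the core architectural idea but includes no code. The following is a minimal PyTorch sketch, not the authors' implementation, of how an auto-encoder with a gradient reversal layer can split a word embedding into disjoint semantic and gender latent blocks; the layer sizes, the 270/30 dimension split, and all names are illustrative assumptions.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class DisentangledAutoEncoder(nn.Module):
    """Encodes a word embedding into disjoint semantic and gender latent blocks,
    then reconstructs the embedding from their concatenation (illustrative sketch)."""
    def __init__(self, emb_dim=300, sem_dim=270, gen_dim=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(emb_dim, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, sem_dim + gen_dim))
        self.decoder = nn.Sequential(
            nn.Linear(sem_dim + gen_dim, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, emb_dim))
        self.sem_dim = sem_dim
        # Adversarial head that tries to predict gender from the semantic block;
        # the gradient reversal layer pushes the encoder to strip gender cues from it.
        self.gender_clf = nn.Linear(sem_dim, 1)

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        z_sem, z_gen = z[:, :self.sem_dim], z[:, self.sem_dim:]
        recon = self.decoder(torch.cat([z_sem, z_gen], dim=1))
        gender_logit = self.gender_clf(grad_reverse(z_sem, lambd))
        return recon, z_sem, z_gen, gender_logit

In this sketch, a reconstruction loss on recon keeps the latent code informative, while a gender-classification loss on gender_logit, reversed through grad_reverse, discourages gender information from leaking into the semantic block; the counterfactual and geometric alignment steps described in the abstract would operate on z_gen and the reconstructed embeddings and are not shown here.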