Paper Title
Identifying and Measuring Token-Level Sentiment Bias in Pre-trained Language Models with Prompts
Paper Authors
Paper Abstract
Owing to their superior performance, large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society. However, we still lack effective tools to understand the potential bias embedded in these black-box models. Recent advances in prompt tuning show the possibility of exploring the internal mechanisms of PLMs. In this work, we propose two token-level sentiment tests: the Sentiment Association Test (SAT) and the Sentiment Shift Test (SST), which utilize prompts as probes to detect latent bias in PLMs. Our experiments on a collection of sentiment datasets show that both SAT and SST can identify sentiment bias in PLMs, and that SST is able to quantify the bias. The results also suggest that fine-tuning may amplify the existing bias in PLMs.
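To make the prompt-as-probe idea concrete, below is a minimal sketch of token-level sentiment probing with a fill-mask PLM. The template, the positive/negative word lists, the choice of `bert-base-uncased`, and the `sentiment_association` helper are all illustrative assumptions for exposition; they are not the actual SAT or SST prompts from the paper.

```python
# Minimal sketch of prompt-based sentiment probing, in the spirit of the
# paper's prompt-as-probe idea. The template and sentiment lexicons below
# are illustrative assumptions, NOT the paper's actual SAT/SST prompts.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical probe template: insert a target token, mask the sentiment slot.
TEMPLATE = "The {target} made me feel [MASK]."

# Hypothetical sentiment lexicons used to score the mask distribution.
POSITIVE = {"good", "happy", "great", "wonderful"}
NEGATIVE = {"bad", "sad", "terrible", "awful"}

def sentiment_association(target: str, top_k: int = 50) -> float:
    """Return a rough positive-minus-negative score for `target`.

    Sums the model's probability mass on positive vs. negative filler
    words at the masked position; a large gap suggests the PLM associates
    the target token with one sentiment polarity.
    """
    preds = fill_mask(TEMPLATE.format(target=target), top_k=top_k)
    pos = sum(p["score"] for p in preds if p["token_str"].strip() in POSITIVE)
    neg = sum(p["score"] for p in preds if p["token_str"].strip() in NEGATIVE)
    return pos - neg

if __name__ == "__main__":
    # Compare the model's sentiment association for two target tokens.
    for token in ["holiday", "funeral"]:
        print(token, round(sentiment_association(token), 4))
```

Comparing such scores before and after fine-tuning would, under these assumptions, give a rough view of whether fine-tuning amplifies the association, which is the kind of effect the SST is designed to quantify.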