Paper Title
Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering
Paper Authors
Paper Abstract
Fighting online hate speech is a challenge that is usually addressed using Natural Language Processing via automatic detection and removal of hateful content. Besides this approach, counter narratives have emerged as an effective tool employed by NGOs to respond to online hate on social media platforms. For this reason, Natural Language Generation (NLG) is currently being studied as a way to automate counter narrative writing. However, the existing resources necessary to train NLG models are limited to 2-turn interactions (a hate speech message and a counter narrative in response), while in real life, interactions can consist of multiple turns. In this paper, we present a hybrid approach for dialogical data collection, which combines the intervention of human expert annotators with machine-generated dialogues obtained using 19 different configurations. The result of this work is DIALOCONAN, the first dataset comprising over 3000 fictitious multi-turn dialogues between a hater and an NGO operator, covering 6 targets of hate.