A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts

Paper Title

A Unified System for Aggression Identification in English Code-Mixed and Uni-Lingual Texts

Authors

Anant Khandelwal, Niraj Kumar

Abstract

The wide use of social media platforms has increased the risk of aggression, which causes mental stress and negatively affects people's lives through psychological agony, fighting behavior, and disrespect for others. The majority of such conversations contain code-mixed language [28]. Additionally, the style of expression and communication changes from one social media platform to another (e.g., communication styles differ between Twitter and Facebook). Together, these factors increase the complexity of the problem. To address them, we introduce a unified and robust multi-modal deep learning architecture that works for both English code-mixed and uni-lingual English datasets. The system uses psycho-linguistic features and very basic linguistic features. Our multi-modal deep learning architecture contains Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN (with both GloVe and FastText embeddings). Finally, the system makes its decision based on model averaging. We evaluated our system on the English code-mixed TRAC 2018 dataset and a uni-lingual English dataset obtained from Kaggle. Experimental results show that our proposed system outperforms all previous approaches on both the English code-mixed dataset and the uni-lingual English dataset.
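
The abstract states that the final decision is based on model averaging but gives no further detail. The following is a minimal sketch of one common form of model averaging (an unweighted mean of per-class probabilities), not necessarily the authors' exact procedure. The label names, the function names, and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed label set for illustration only; the abstract does not list the classes.
LABELS = ["OAG", "CAG", "NAG"]


def average_predictions(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Return the unweighted mean of the class-probability vectors of several models."""
    if not per_model_probs:
        raise ValueError("at least one model prediction is required")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must predict the same number of classes")
    n_models = len(per_model_probs)
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict_label(per_model_probs: Sequence[Sequence[float]],
                  labels: Sequence[str] = LABELS) -> str:
    """Pick the label whose averaged probability is highest."""
    avg = average_predictions(per_model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical softmax outputs for one text from three models
# (standing in for Deep Pyramid CNN, Pooled BiLSTM, and Disconnected RNN).
dpcnn_probs = [0.6, 0.3, 0.1]
bilstm_probs = [0.2, 0.5, 0.3]
drnn_probs = [0.3, 0.5, 0.2]

all_probs = [dpcnn_probs, bilstm_probs, drnn_probs]
averaged = average_predictions(all_probs)
decision = predict_label(all_probs)
```

In this example the first model alone would choose the first label, but the averaged distribution favors the second, which shows how averaging can smooth out the error of a single model. A weighted average would be a natural variant if some models are known to be more reliable than others.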
