Paper Title
Misogynistic Tweet Detection: Modelling CNN with Small Datasets
Paper Authors
Abstract
Online abuse directed towards women on the social media platform Twitter has attracted considerable attention in recent years. An automated method to effectively identify misogynistic abuse could improve our understanding of the patterns, driving factors, and effectiveness of responses associated with abusive tweets over a sustained time period. However, training a neural network (NN) model with a small set of labelled data to detect misogynistic tweets is difficult. This is partly due to the complex nature of tweets which contain misogynistic content, and the vast number of parameters that need to be learned in a NN model. We have conducted a series of experiments to investigate how to train a NN model to detect misogynistic tweets effectively. In particular, we have customised and regularised a Convolutional Neural Network (CNN) architecture and shown that word vectors pre-trained on a task-specific domain can be used to train a CNN model effectively when only a small set of labelled data is available. A CNN model trained in this way yields improved accuracy over the state-of-the-art models.
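The general idea of feeding frozen, pre-trained word vectors into a convolutional text classifier can be sketched in a few lines of plain Python. This is only an illustrative forward pass under stated assumptions, not the authors' architecture: the `PRETRAINED` table, its three-dimensional vectors, and the helper names `embed`, `conv1d_max`, and `predict` are all hypothetical, and training and regularisation (for example dropout) are omitted.

```python
import math

# Hypothetical toy "pre-trained" word vectors (dimension 3). In practice these
# would be learned beforehand on a large unlabelled, task-related corpus.
PRETRAINED = {
    "<unk>": [0.0, 0.0, 0.0],
    "you": [0.1, -0.2, 0.3],
    "are": [0.0, 0.1, -0.1],
    "great": [0.5, 0.4, -0.3],
    "awful": [-0.6, 0.2, 0.7],
}


def embed(tokens):
    """Look up frozen pre-trained vectors; unknown words map to <unk>."""
    return [PRETRAINED.get(t, PRETRAINED["<unk>"]) for t in tokens]


def conv1d_max(embedded, filt, bias):
    """Slide one filter (width x dim) over the sequence, apply ReLU,
    then keep the maximum activation (max-over-time pooling)."""
    width = len(filt)
    best = 0.0  # ReLU output is never negative, so 0.0 is a safe floor
    for start in range(len(embedded) - width + 1):
        total = bias
        for k in range(width):
            total += sum(w * x for w, x in zip(filt[k], embedded[start + k]))
        best = max(best, max(0.0, total))
    return best


def predict(tokens, filters, biases, out_weights, out_bias):
    """Forward pass: embeddings -> conv filters -> pooled features -> sigmoid."""
    embedded = embed(tokens)
    features = [conv1d_max(embedded, f, b) for f, b in zip(filters, biases)]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))
```

Because the embedding table is fixed, only the filter and output weights would need to be learned from the labelled tweets, which is one plausible reason pre-trained vectors help when labelled data is scarce.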