Unsupervised Cyberbullying Detection via Time-Informed Gaussian Mixture Model

Paper Title

Unsupervised Cyberbullying Detection via Time-Informed Gaussian Mixture Model

Authors

Lu Cheng, Kai Shu, Siqi Wu, Yasin N. Silva, Deborah L. Hall, Huan Liu

Abstract

Social media is a vital means for information-sharing due to its easy access, low cost, and fast dissemination characteristics. However, increases in social media usage have corresponded with a rise in the prevalence of cyberbullying. Most existing cyberbullying detection methods are supervised and, thus, have two key drawbacks: (1) The data labeling process is often time-consuming and labor-intensive; (2) Current labeling guidelines may not be generalized to future instances because of different language usage and evolving social networks. To address these limitations, this work introduces a principled approach for unsupervised cyberbullying detection. The proposed model consists of two main components: (1) A representation learning network that encodes the social media session by exploiting multi-modal features, e.g., text, network, and time. (2) A multi-task learning network that simultaneously fits the comment inter-arrival times and estimates the bullying likelihood based on a Gaussian Mixture Model. The proposed model jointly optimizes the parameters of both components to overcome the shortcomings of decoupled training. Our core contribution is an unsupervised cyberbullying detection model that not only experimentally outperforms the state-of-the-art unsupervised models, but also achieves competitive performance compared to supervised models.
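
As a rough illustration of the Gaussian-mixture idea mentioned in the abstract, the sketch below fits a two-component, one-dimensional mixture with plain EM to a single hand-made session feature: the mean log inter-arrival time between comments. It is not the authors' model, which the abstract describes as learning multi-modal session representations jointly with the mixture. The feature, the function names (`session_feature`, `fit_gmm_1d`, `responsibilities`), and the synthetic sessions are all assumptions made for this example, and nothing here implies that fast-paced sessions are the bullying ones.

```python
import math
import random


def session_feature(timestamps):
    """Mean log inter-arrival time (seconds) of the comments in one session."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(math.log1p(g) for g in gaps) / len(gaps)


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, pi, mu, var):
    """Posterior probability of each mixture component given the feature x."""
    w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(len(pi))]
    total = sum(w) or 1e-300  # avoid division by zero if both densities underflow
    return [wk / total for wk in w]


def fit_gmm_1d(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM; components sorted by mean."""
    n = len(xs)
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n or 1.0
    pi, mu, var = [0.5, 0.5], [min(xs), max(xs)], [var0, var0]
    for _ in range(n_iter):
        # E-step: soft assignment of every session to each component.
        resp = [responsibilities(x, pi, mu, var) for x in xs]
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
            pi[k] = nk / n
    order = sorted(range(2), key=lambda k: mu[k])
    return [pi[k] for k in order], [mu[k] for k in order], [var[k] for k in order]


def synthetic_session(rng, mean_gap, n_comments=20):
    """Hypothetical session: comment times with exponential gaps (toy data only)."""
    t, ts = 0.0, []
    for _ in range(n_comments):
        t += rng.expovariate(1.0 / mean_gap)
        ts.append(t)
    return ts


rng = random.Random(0)
# 50 fast-paced sessions (about 5 s between comments) and 50 slow ones (about 600 s).
sessions = [synthetic_session(rng, 5.0) for _ in range(50)]
sessions += [synthetic_session(rng, 600.0) for _ in range(50)]
features = [session_feature(s) for s in sessions]

pi, mu, var = fit_gmm_1d(features)
p_fast = responsibilities(features[0], pi, mu, var)[0]    # first fast-paced session
p_slow = responsibilities(features[-1], pi, mu, var)[1]   # last slow session
```

The posterior returned by `responsibilities` plays the role of a per-session likelihood score. In a real setting the mapping from a mixture component to "bullying" would need external justification, since the fit itself is label-free.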
