Comparison Between Traditional Machine Learning Models and Neural Network Models for Vietnamese Hate Speech Detection

Paper Title

Comparison Between Traditional Machine Learning Models And Neural Network Models For Vietnamese Hate Speech Detection

Authors

Luu, Son T.; Nguyen, Hung P.; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy

Abstract

Hate speech detection in social network language has recently become one of the main research fields, owing to the spread of social networks such as Facebook and Twitter. In Vietnam, offensive language and harassment have harmful effects on online users. The VLSP shared task on Hate Speech Detection on social networks showcased many approaches for detecting whether a comment is clean or not. However, this problem still needs further research. We therefore compare traditional machine learning models and deep learning models on a large dataset of user comments from Vietnamese social networks. We identify the advantages and disadvantages of each model by comparing their F1-scores. We then pick the two models with the highest scores among the traditional machine learning models and the deep neural models, respectively. Next, we compare how well these two models predict the correct labels by examining their confusion matrices and weighing the advantages and disadvantages of each. Finally, based on the comparison results, we propose an ensemble method that combines the strengths of traditional methods and deep learning methods.
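
The evaluation flow described in the abstract can be sketched in plain Python. That flow scores each model by F1, inspects confusion matrices, then combines the best traditional and best neural model. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. The three-way label set (CLEAN, OFFENSIVE, HATE), the toy probabilities, and the helper names `confusion_matrix`, `macro_f1`, `argmax_labels` and `soft_vote` are all hypothetical. Macro-averaged F1 is assumed as the comparison metric. Soft voting (a weighted average of the two models' class probabilities) stands in for the paper's ensemble, whose exact form the abstract does not specify.

```python
from typing import List, Sequence

# Hypothetical label set, assumed for illustration only.
LABELS = ["CLEAN", "OFFENSIVE", "HATE"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """Count predictions: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_f1(matrix: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, truly another class
        fn = sum(matrix[k]) - tp                        # truly k, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


def argmax_labels(probs: List[List[float]],
                  labels: Sequence[str] = LABELS) -> List[str]:
    """Pick the most probable label for each comment."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in probs]


def soft_vote(prob_a: List[List[float]], prob_b: List[List[float]],
              weight_a: float = 0.5,
              labels: Sequence[str] = LABELS) -> List[str]:
    """Soft-voting ensemble: weighted average of two models' probabilities, then argmax."""
    mixed = [[weight_a * a + (1 - weight_a) * b for a, b in zip(pa, pb)]
             for pa, pb in zip(prob_a, prob_b)]
    return argmax_labels(mixed, labels)


if __name__ == "__main__":
    # Toy data (hypothetical, not results from the paper).
    y_true = ["CLEAN", "CLEAN", "OFFENSIVE", "HATE", "HATE"]
    probs_traditional = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
                         [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]]
    probs_neural = [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1], [0.2, 0.7, 0.1],
                    [0.1, 0.3, 0.6], [0.1, 0.2, 0.7]]
    runs = {
        "traditional": argmax_labels(probs_traditional),
        "neural": argmax_labels(probs_neural),
        "ensemble": soft_vote(probs_traditional, probs_neural),
    }
    for name, preds in runs.items():
        cm = confusion_matrix(y_true, preds)
        print(f"{name}: macro-F1 = {macro_f1(cm):.3f}, confusion matrix = {cm}")
    # traditional ≈ 0.489, neural ≈ 0.778, ensemble = 1.000 on this toy data
```

In this toy example the two models make different mistakes, and the confusion matrices make that visible. Averaging their probabilities lets the more confident model decide each disputed comment. That is one simple way to exploit complementary error patterns. A real comparison would tune `weight_a` on held-out data rather than fixing it at 0.5.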
