Paper title
LEBANONUPRISING: a thorough study of Lebanese tweets
Paper authors
Paper abstract
Recent studies have shown great interest in sentiment analysis of social networks. Twitter, a microblogging service, can be a rich source of information on how users feel about a given topic, or on their opinions regarding social, economic, and even political matters. On October 17, 2019, Lebanon witnessed the start of a revolution, and the LebanonUprising hashtag went viral on Twitter. A dataset of 100,0000 tweets was collected between October 18 and 21. In this paper, we conduct a sentiment analysis study of tweets in spoken Lebanese Arabic related to the LebanonUprising hashtag, using several machine learning algorithms. The dataset was manually annotated so that precision and recall could be measured and the algorithms compared. The paper makes two further contributions. The first is a Lebanese to Modern Standard Arabic mapping dictionary, which was used to preprocess the tweets. The second is an attempt to move from sentiment analysis to emotion detection using emojis; the two emotions we tried to predict were "sarcastic" and "funny". We built a training set from the tweets collected in October 2019 and used it to predict the sentiments and emotions of tweets collected between May and August 2020. Our analysis shows how sentiments, emotions, and users vary between the two datasets. The results seem satisfactory, especially considering that, to our knowledge, no previous or similar work has involved Lebanese Arabic tweets.
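As a rough illustration of how manually annotated labels can be used to measure precision and recall and to compare classifiers, the sketch below computes both metrics for one class from parallel lists of gold and predicted labels. The label names (`positive`, `negative`, `neutral`), the toy data, and the names `model_a` and `model_b` are assumptions made for this example; they are not taken from the paper.

```python
def precision_recall(gold, predicted, label):
    """Return (precision, recall) for one label, given parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: manual annotations and the outputs of two classifiers.
gold = ["positive", "negative", "negative", "positive", "neutral", "negative"]
model_a = ["positive", "negative", "positive", "positive", "neutral", "negative"]
model_b = ["negative", "negative", "negative", "positive", "negative", "negative"]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    p, r = precision_recall(gold, preds, "negative")
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, the first classifier never mislabels a tweet as negative but misses one negative tweet, while the second finds every negative tweet at the cost of two false alarms. This is the kind of trade-off that comparing precision and recall across algorithms is meant to expose.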