Paper Title

Towards A Sentiment Analyzer for Low-Resource Languages

Authors

Dian Indriani, Arbi Haza Nasution, Winda Monika, Salhazan Nasution

Abstract

Twitter is one of the most influential social media platforms, with millions of active users. It is commonly used for microblogging, which lets users share messages, ideas, thoughts, and more. As a result, millions of interactions, such as short messages or tweets, flow among Twitter users discussing topics from around the world. This research aims to analyze users' sentiment toward a particular trending topic that was being actively and widely discussed at the time. We chose the hashtag #kpujangancurang, which was a trending topic during the 2019 Indonesian presidential election. We use the hashtag to obtain a set of data from Twitter and investigate whether users express positive or negative sentiment in their tweets. The study uses the RapidMiner tool to obtain the Twitter data and compares the Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for classifying the sentiment of that data. The experiment uses 200 labeled samples in total. Overall, Naive Bayes and Multi-Layer Perceptron outperformed the other two methods across 11 experiments with different training-testing split sizes. These two classifiers have the potential to be used in building a sentiment analyzer for low-resource languages with a small corpus.
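To make the evaluation idea in the abstract concrete, here is a minimal sketch of one of the four classifiers (Naive Bayes) scored at several training set sizes. It is not the authors' pipeline: the study used RapidMiner, and the abstract does not list the 11 split sizes. The toy tweets, the train sizes, and every function name below are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data standing in for labeled tweets (not from the study).
TOY_DATA = [
    (["good", "fair"], "positive"),
    (["bad", "fraud"], "negative"),
    (["fair", "honest"], "positive"),
    (["fraud", "cheat"], "negative"),
    (["good", "honest"], "positive"),
    (["bad", "cheat"], "negative"),
    (["good", "fair"], "positive"),
    (["bad", "fraud"], "negative"),
    (["honest", "fair"], "positive"),
    (["cheat", "fraud"], "negative"),
]


def train_naive_bayes(samples):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            if token in vocabulary:  # skip words never seen during training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_for_split(data, train_size):
    """Train on the first `train_size` samples and test on the remainder."""
    model = train_naive_bayes(data[:train_size])
    test = data[train_size:]
    correct = sum(predict(model, tokens) == label for tokens, label in test)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical train sizes; the study's actual 11 splits are not given here.
    for train_size in (4, 6, 8):
        print(train_size, accuracy_for_split(TOY_DATA, train_size))
```

Repeating the same loop for each classifier and each split size would give the kind of comparison the abstract describes. With real data, the samples should be shuffled (or stratified) before splitting so that both sentiment classes appear in every training portion.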
