Few-shot Learning for Multi-modal Social Media Event Filtering

Paper Title

Few-shot Learning for Multi-modal Social Media Event Filtering

Authors

José Nascimento, João Phillipe Cardenuto, Jing Yang, Anderson Rocha

Abstract

Social media has become an important data source for event analysis. When this type of data is collected, most of it contains no useful information about the target event. Thus, it is essential to filter out this noisy data at the earliest opportunity so that a human expert can perform further inspection. Most existing solutions for event filtering rely on fully supervised training. However, in many real-world scenarios, access to a large number of labeled samples is not possible. To address the problem of training with only a few labeled samples for event filtering, we propose a graph-based few-shot learning pipeline. We also release the Brazilian Protest Dataset to test our method. To the best of our knowledge, this dataset is the first of its kind in event filtering that focuses on protests in multi-modal social media data, with most of the text in Portuguese. Our experimental results show that, with only a few labeled samples (60), our proposed pipeline achieves performance comparable to that obtained with a fully labeled dataset (3100). To support the research community, we make our dataset and code available at https://github.com/jdnascim/7Set-AL.
