Paper Title
On Identifying Hashtags in Disaster Twitter Data
Paper Authors
Paper Abstract
Tweet hashtags have the potential to improve the search for information during disaster events. However, a large number of disaster-related tweets do not have any user-provided hashtags. Moreover, only a small number of the tweets that do contain hashtags carry actionable ones that are useful for disaster response. To facilitate progress on the automatic identification (or extraction) of disaster hashtags in Twitter data, we construct a unique dataset of disaster-related tweets annotated with hashtags useful for filtering actionable information. Using this dataset, we further investigate Long Short-Term Memory (LSTM)-based models within a Multi-Task Learning framework. The best-performing model achieves an F1-score as high as 92.22%. The dataset, code, and other resources are available on GitHub.