Paper Title

GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

Authors

Umair Qazi, Muhammad Imran, Ferda Ofli

Abstract

The past several years have witnessed a huge surge in the use of social media platforms during mass convergence events such as health emergencies and natural or human-induced disasters. These non-traditional data sources are becoming vital for disease forecasting and surveillance when preparing for epidemic and pandemic outbreaks. In this paper, we present GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days starting February 1, 2020. Moreover, we employ a gazetteer-based approach to infer the geolocation of tweets. We postulate that this large-scale, multilingual, geolocated social media data can empower research communities to evaluate how societies are collectively coping with this unprecedented global crisis, and to develop computational methods for challenges such as identifying fake news, understanding communities' knowledge gaps, and building disease forecasting and surveillance models.
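
The abstract states that tweet locations are inferred with a gazetteer-based approach but gives no further detail. As a rough illustration only, the following Python sketch matches a free-text location string (for example, a user-profile location) against a tiny gazetteer. The `GAZETTEER` entries, `normalize`, and `infer_location` are hypothetical names chosen for this example, not the authors' pipeline, and which tweet fields GeoCoV19 actually processes is not described here.

```python
import re
from typing import Optional, Tuple

# HYPOTHETICAL toy gazetteer: lowercase place name -> (country code, latitude, longitude).
# A real gazetteer (e.g. built from OpenStreetMap or GeoNames) holds millions of entries.
GAZETTEER = {
    "doha": ("QA", 25.2854, 51.5310),
    "new york": ("US", 40.7128, -74.0060),
    "paris": ("FR", 48.8566, 2.3522),
    "qatar": ("QA", 25.3548, 51.1839),
}


def normalize(text: str) -> str:
    """Lowercase and turn punctuation into spaces, e.g. 'Doha, Qatar!' -> 'doha  qatar '."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def infer_location(
    location_field: str, max_ngram: int = 3
) -> Optional[Tuple[str, Tuple[str, float, float]]]:
    """Return (matched name, gazetteer entry) for the first place mentioned, or None."""
    tokens = normalize(location_field).split()
    for i in range(len(tokens)):
        # At each position try the longest n-gram first, so "new york" is one match.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                return candidate, GAZETTEER[candidate]
    return None  # no known place name found


if __name__ == "__main__":
    for field in ["Doha, Qatar", "Living in New York City", "somewhere on earth"]:
        print(repr(field), "->", infer_location(field))
```

Scanning left to right and trying the longest n-gram first means a multi-word name such as "new york" is matched as a single place, and the first place mentioned wins, so "Doha, Qatar" resolves to the city rather than the country. A real system would also have to resolve ambiguous names (many places are called Paris) and non-English spellings, which this toy version ignores.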
