Paper Title

Comparing and Integrating US COVID-19 Data from Multiple Sources with Anomaly Detection and Repairing

Authors

Wang, Guannan; Gu, Zhiling; Li, Xinyi; Yu, Shan; Kim, Myungjin; Wang, Yueying; Gao, Lei; Wang, Li

Abstract

Over the past few months, the outbreak of coronavirus disease (COVID-19) has been spreading across the world. A reliable and accurate dataset of the cases is vital for scientists to conduct related research and for policy-makers to make better decisions. We collect the United States COVID-19 daily reported data from four open sources: the New York Times, the COVID-19 Data Repository by Johns Hopkins University, the COVID Tracking Project at the Atlantic, and USAFacts, then compare the similarities and differences among them. To obtain reliable data for further analysis, we first examine the cyclical pattern and the following anomalies, which frequently occur in the reported cases: (1) order dependency violations, (2) point or period anomalies, and (3) reporting delays. To address these detected issues, we propose corresponding repairing methods and procedures where corrections are necessary. In addition, we integrate the COVID-19 reported cases with county-level auxiliary information on local features from official sources, such as health infrastructure, demographic, socioeconomic, and environmental information, which is also essential for understanding the spread of the virus.
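
To make anomaly type (1) concrete, here is a minimal sketch, not the authors' method. It assumes that an order dependency violation means a cumulative count that decreases from one day to the next, and it uses a simple backward running minimum as a stand-in repair. The function names and the sample series are hypothetical.

```python
from typing import List


def find_order_violations(cumulative: List[int]) -> List[int]:
    """Return every index t where the cumulative count drops below day t-1."""
    return [
        t for t in range(1, len(cumulative))
        if cumulative[t] < cumulative[t - 1]
    ]


def repair_backward_min(cumulative: List[int]) -> List[int]:
    """Illustrative repair (an assumption, not the paper's procedure):
    cap each day's value at the minimum of all later values, so that
    the series becomes non-decreasing."""
    repaired = list(cumulative)
    for t in range(len(repaired) - 2, -1, -1):
        repaired[t] = min(repaired[t], repaired[t + 1])
    return repaired


# Hypothetical cumulative case counts for one county over six days.
counts = [10, 12, 15, 14, 18, 21]
violations = find_order_violations(counts)
repaired = repair_backward_min(counts)
```

The backward minimum treats the later, lower figure as the correction and revises earlier days downward. A forward running maximum would instead keep a spurious spike, so the choice depends on which report is trusted.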
