Paper Title

Hot-spots Detection in Count Data by Poisson Assisted Smooth Sparse Tensor Decomposition

Authors

Yujie Zhao, Xiaoming Huo, Yajun Mei

Abstract

Count data occur widely in bio-surveillance and healthcare applications, e.g., the numbers of new patients with different types of infectious diseases, recorded repeatedly over time (daily, weekly, or monthly) across different cities/counties/states. For this type of count data, one important task is the quick detection and localization of hot-spots, meaning unusual infection rates, so that we can respond appropriately. In this paper, we develop a method called Poisson assisted Smooth Sparse Tensor Decomposition (PoSSTenD), which not only detects when hot-spots occur but also localizes where they occur. The main idea of the proposed PoSSTenD method is as follows. First, we represent the observed count data as a three-dimensional tensor with (1) a spatial dimension for location patterns, e.g., different cities/counties/states; (2) a temporal dimension for time patterns, e.g., daily/weekly/monthly; and (3) a categorical dimension for different types of data sources, e.g., different types of diseases. Second, we fit a Poisson regression model to this tensor and decompose the infection rate into two components: a smooth global trend and local hot-spots. Third, we detect when hot-spots occur by building a cumulative sum (CUSUM) control chart, and we localize where they occur through LASSO-type sparse estimation. The usefulness of the proposed methodology is validated through numerical simulation studies and a real-world dataset that records the annual number of cases of 10 different infectious diseases from 1993 to 2018 for 49 mainland states in the United States.
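To make the detection and localization steps concrete, here is a minimal toy sketch. It is not the PoSSTenD procedure: instead of the tensor decomposition it assumes a known baseline rate, runs a standard one-sided Poisson likelihood-ratio CUSUM on a single count series, and uses soft-thresholding (the closed-form LASSO solution when the design is the identity) to pick out locations with a large excess. All function names, rates, thresholds, and counts are hypothetical and chosen only for illustration.

```python
import math


def poisson_cusum(counts, mu0, mu1, threshold):
    """One-sided Poisson CUSUM for a shift from rate mu0 to rate mu1 (> mu0).

    Returns (index of first alarm or None, list of CUSUM statistics).
    """
    # Log-likelihood ratio of Poisson(mu1) against Poisson(mu0) for a count y:
    #   y * log(mu1 / mu0) - (mu1 - mu0)
    log_ratio = math.log(mu1 / mu0)
    stat, path, alarm = 0.0, [], None
    for t, y in enumerate(counts):
        stat = max(0.0, stat + y * log_ratio - (mu1 - mu0))
        path.append(stat)
        if alarm is None and stat > threshold:
            alarm = t
    return alarm, path


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def localize(counts, baseline, lam):
    """Indices of locations whose shrunken excess over baseline is non-zero."""
    return [
        i
        for i, (y, b) in enumerate(zip(counts, baseline))
        if soft_threshold(y - b, lam) != 0.0
    ]


# Hypothetical counts for one location and one disease over ten periods;
# the rate appears to rise from about 5 to about 10 starting at index 5.
series = [5, 4, 6, 5, 5, 9, 10, 11, 9, 12]
alarm, path = poisson_cusum(series, mu0=5.0, mu1=8.0, threshold=3.0)

# Hypothetical counts across four locations at the alarm time,
# compared against an assumed common baseline of 5.
hot = localize([5, 11, 4, 6], [5, 5, 5, 5], lam=2.0)
print(alarm, hot)
```

In this toy setting the CUSUM statistic stays at zero while the counts hover near the baseline and only accumulates once the counts rise, which is the "when" part; the shrinkage step then zeroes out small deviations and keeps only the location with a large excess, which is the "where" part.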
