Title
Generalized k-Means in GLMs with Applications to the Outbreak of COVID-19 in the United States
Authors
Abstract
Generalized $k$-means can incorporate any similarity or dissimilarity measure for clustering. By choosing the well-known likelihood ratio or $F$-statistic as the dissimilarity measure, this work proposes a method based on generalized $k$-means to group statistical models. Given the number of clusters $k$, the method is established under hypothesis tests between statistical models. If $k$ is unknown, the method can be combined with the generalized information criterion (GIC) to select the best $k$ automatically. The article investigates AIC and BIC as special cases. Theoretical and simulation results show that the number of clusters can be identified by BIC but not by AIC. The resulting method for GLMs is used to group the state-level time series patterns of the COVID-19 outbreak in the United States. A further study shows that the statistical models in different clusters differ significantly from each other, which confirms the result given by the proposed method based on generalized $k$-means.
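The idea of clustering models rather than points can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Gaussian linear models with a common error variance (a simplification of the GLM setting), so that the residual sum of squares of a unit under a cluster's pooled fit plays the role of the likelihood-ratio dissimilarity. The names `fit_ols`, `rss`, `model_kmeans`, and `bic`, the restart scheme, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for one (possibly pooled) design matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rss(X, y, beta):
    """Residual sum of squares of data (X, y) under coefficients beta."""
    r = y - X @ beta
    return float(r @ r)

def model_kmeans(Xs, ys, k, n_init=10, n_iter=50, seed=0):
    """Group units by regression model; returns (labels, betas, total_rss).

    Assumption: Gaussian errors with common variance, so RSS stands in
    for the likelihood-ratio dissimilarity between a unit and a cluster.
    """
    rng = np.random.default_rng(seed)
    m = len(Xs)
    best = None
    for _restart in range(n_init):
        # Initialise each cluster model from one randomly chosen unit.
        start = rng.choice(m, size=k, replace=False)
        betas = [fit_ols(Xs[i], ys[i]) for i in start]
        labels = np.full(m, -1)
        for _step in range(n_iter):
            # Assignment step: each unit joins the cluster model that fits it best.
            d = np.array([[rss(Xs[i], ys[i], b) for b in betas] for i in range(m)])
            new_labels = d.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Update step: refit one pooled model per non-empty cluster.
            for c in range(k):
                idx = np.flatnonzero(labels == c)
                if idx.size:  # an emptied cluster keeps its previous model
                    betas[c] = fit_ols(np.vstack([Xs[i] for i in idx]),
                                       np.concatenate([ys[i] for i in idx]))
        total = sum(rss(Xs[i], ys[i], betas[labels[i]]) for i in range(m))
        if best is None or total < best[2]:
            best = (labels.copy(), [b.copy() for b in betas], total)
    return best

def bic(total_rss, n_obs, k, p):
    """BIC for k cluster models with p coefficients each (Gaussian profile likelihood)."""
    return n_obs * np.log(total_rss / n_obs) + k * p * np.log(n_obs)

# Synthetic example (illustrative only): six units, two true regression models.
rng = np.random.default_rng(1)
true_betas = [np.array([1.0, 2.0])] * 3 + [np.array([-1.0, -2.0])] * 3
Xs = [np.column_stack([np.ones(40), rng.uniform(0, 1, 40)]) for _ in true_betas]
ys = [X @ b + 0.1 * rng.normal(size=40) for X, b in zip(Xs, true_betas)]
n_obs = sum(len(y) for y in ys)

scores = {}
for k in (1, 2, 3):
    labels, betas, total = model_kmeans(Xs, ys, k)
    scores[k] = bic(total, n_obs, k, p=2)
labels2, _, _ = model_kmeans(Xs, ys, 2)
```

Picking the `k` with the smallest entry in `scores` mirrors the BIC-based selection described in the abstract. Replacing the `log(n_obs)` penalty factor with 2 would give an AIC-style criterion, which the abstract reports does not identify the number of clusters.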