Paper Title
Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions
Paper Authors
Paper Abstract
Robustly determining the optimal number of clusters in a data set is an essential factor in a wide range of applications. Cluster enumeration becomes challenging when the true underlying structure in the observed data is corrupted by heavy-tailed noise and outliers. Recently, Bayesian cluster enumeration criteria have been derived by formulating cluster enumeration as maximization of the posterior probability of candidate models. This article generalizes robust Bayesian cluster enumeration so that it can be used with any arbitrary Real Elliptically Symmetric (RES) distributed mixture model. Our framework also covers the case of M-estimators that allow for mixture models, which are decoupled from a specific probability distribution. Examples of Huber's and Tukey's M-estimators are discussed. We derive a robust criterion for data sets with finite sample size, and also provide an asymptotic approximation to reduce the computational cost at large sample sizes. The algorithms are applied to simulated and real-world data sets, including radar-based person identification, and show a significant robustness improvement in comparison to existing methods.
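To make the mention of Huber's and Tukey's M-estimators concrete, the following minimal sketch shows the classical weight functions associated with the two estimators, which determine how strongly each observation is downweighted as its residual grows. It is an illustration only and not the formulation used in the paper: the function names are hypothetical, the input is assumed to be a scalar standardized residual rather than a squared Mahalanobis distance, and the tuning constants 1.345 and 4.685 are the conventional univariate defaults, assumed here for illustration.

```python
import numpy as np


def huber_weight(r, c=1.345):
    """Huber weight: 1 for |r| <= c, c/|r| beyond (outliers are downweighted)."""
    r = np.abs(np.asarray(r, dtype=float))
    # np.maximum keeps the denominator at least c, so no division by zero occurs.
    return np.where(r <= c, 1.0, c / np.maximum(r, c))


def tukey_weight(r, c=4.685):
    """Tukey biweight: (1 - (r/c)^2)^2 for |r| <= c, 0 beyond (outliers are rejected)."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r <= c, (1.0 - (r / c) ** 2) ** 2, 0.0)


if __name__ == "__main__":
    residuals = np.array([0.0, 1.0, 3.0, 10.0])
    print("Huber:", huber_weight(residuals))
    print("Tukey:", tukey_weight(residuals))
```

The contrast between the two is the design point: Huber's weights decay slowly and never reach zero, whereas Tukey's weights vanish entirely beyond the threshold, so gross outliers have no influence on the estimate.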