Title
Real Elliptically Skewed Distributions and Their Application to Robust Cluster Analysis
Authors
Abstract
This article proposes a new class of Real Elliptically Skewed (RESK) distributions and associated clustering algorithms that integrate robustness and skewness into a single unified cluster analysis framework. Asymmetrically distributed and heavy-tailed data clusters have been reported in a variety of real-world applications. Robustness is essential because a few outlying observations can severely obscure the cluster structure. The RESK distributions generalize the Real Elliptically Symmetric (RES) distributions. To estimate the cluster parameters and memberships, we derive an expectation-maximization (EM) algorithm for arbitrary RESK distributions. Special attention is given to a new robust skew-Huber M-estimator, which is also the maximum likelihood estimator (MLE) for the skew-Huber distribution, a member of the RESK class. Numerical experiments on simulated and real-world data confirm the usefulness of the proposed methods for skewed and heavy-tailed data sets.