Paper Title

CoMadOut -- A Robust Outlier Detection Algorithm based on CoMAD

Authors

Andreas Lohrer, Daniyal Kazempour, Maximilian Hünemörder, Peer Kröger

Abstract

Unsupervised learning methods are well established in the area of anomaly detection and achieve state-of-the-art performance on outlier datasets. Outliers play a significant role, since they can distort the predictions of a machine learning algorithm on a given dataset. Especially among PCA-based methods, outliers have an additional destructive potential regarding the result: they may not only distort the orientation and translation of the principal components, they also make it more complicated to detect outliers. To address this problem, we propose the robust outlier detection algorithm CoMadOut, which satisfies two required properties: (1) being robust towards outliers and (2) detecting them. Depending on the variant, our CoMadOut outlier detection variants, which use comedian PCA, define an inlier region with a robust noise margin by measures of in-distribution (variant CMO) and optimized scores by measures of out-of-distribution (variants CMO*), e.g. kurtosis-weighting by CMO+k. These measures allow distribution-based outlier scoring for each principal component and thus an appropriate alignment of the degree of outlierness between normal and abnormal instances. Experiments comparing CoMadOut with traditional, deep and other comparable robust outlier detection methods showed that the performance of the introduced CoMadOut approach is competitive with well-established methods in terms of average precision (AP), area under the precision recall curve (AUPRC) and area under the receiver operating characteristic (AUROC) curve. In summary, our approach can be seen as a robust alternative for outlier detection tasks.
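To make the notion of comedian PCA more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the standard definition of the comedian (co-median), COM(X, Y) = med((X - med X)(Y - med Y)), and eigendecomposes the resulting matrix in place of the covariance matrix. The function names `comedian_matrix` and `comedian_pca` are hypothetical, and the inlier region, noise margin and CMO scoring variants mentioned in the abstract are not reproduced here.

```python
import numpy as np


def comedian_matrix(X):
    """Comedian matrix: COM[i, j] = med((X_i - med X_i) * (X_j - med X_j)).

    Assumption: this is the textbook comedian definition, used here as a
    robust stand-in for the covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    centered = X - np.median(X, axis=0)  # center each feature at its median
    # products[n, i, j] = centered[n, i] * centered[n, j]
    products = centered[:, :, None] * centered[:, None, :]
    return np.median(products, axis=0)   # element-wise median over samples


def comedian_pca(X):
    """Eigendecomposition of the comedian matrix, largest eigenvalue first.

    Note: unlike a covariance matrix, the comedian matrix is symmetric but
    not guaranteed to be positive semi-definite.
    """
    com = comedian_matrix(X)
    eigvals, eigvecs = np.linalg.eigh(com)  # eigh is valid: com is symmetric
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(101, 3))
    X[0] = [1e6, -1e6, 1e6]  # one gross outlier
    eigvals, eigvecs = comedian_pca(X)
    print("comedian eigenvalues:", eigvals)
    print("covariance eigenvalues:", np.linalg.eigvalsh(np.cov(X.T))[::-1])
```

Because every entry is a median rather than a mean, a single extreme row barely changes the comedian matrix, while the sample covariance is dominated by it. For an odd number of samples, the diagonal equals the squared (unscaled) median absolute deviation of each feature.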
