Paper Title

Appropriateness of Performance Indices for Imbalanced Data Classification: An Analysis

Authors

Sankha Subhra Mullick, Shounak Datta, Sourish Gunesh Dhekane, Swagatam Das

Abstract

Indices that quantify classifier performance under class imbalance often suffer from distortions that depend on the composition of the test set or on class-specific classification accuracy, which makes it difficult to assess the merit of a classifier. We identify two fundamental conditions that a performance index must satisfy to be resilient, respectively, to changes in the number of test instances from each class and to changes in the number of classes in the test set. In light of these conditions, we theoretically analyze, under the effect of class imbalance, four indices commonly used for evaluating binary classifiers and five popular indices for multi-class classifiers. For indices that violate either condition, we also suggest remedial modifications and normalizations. We further investigate the capability of the indices to retain information about classification performance over all classes, even when the classifier exhibits extreme performance on some classes. Simulation studies are performed on high-dimensional deep representations of a subset of the ImageNet dataset using four state-of-the-art classifiers tailored for handling class imbalance. Finally, based on our theoretical findings and empirical evidence, we recommend the appropriate indices for evaluating the performance of classifiers in the presence of class imbalance.
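The kind of distortion the abstract describes can be illustrated with a small sketch. The abstract does not name the indices it analyzes, so the two used here (plain accuracy and the mean of per-class recalls, often called balanced accuracy) are assumptions chosen only for illustration, as are the helper names and the fixed per-class recall values. The sketch holds the classifier's per-class behaviour constant and changes only the number of negative test instances.

```python
# Illustrative sketch only: the indices, helper names, and numbers below are
# assumptions for demonstration, not taken from the paper.

def confusion_counts(n_pos, n_neg, tpr, tnr):
    """Build (tp, fn, tn, fp) for a classifier with fixed per-class recall."""
    tp = round(n_pos * tpr)
    tn = round(n_neg * tnr)
    return tp, n_pos - tp, tn, n_neg - tn

def accuracy(tp, fn, tn, fp):
    """Fraction of all test instances classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the per-class recalls (true positive and true negative rates)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2

# Same classifier behaviour in both scenarios: 60% recall on the positive
# class and 90% recall on the negative class.
TPR, TNR = 0.6, 0.9

balanced = confusion_counts(n_pos=100, n_neg=100, tpr=TPR, tnr=TNR)
skewed = confusion_counts(n_pos=100, n_neg=1000, tpr=TPR, tnr=TNR)

acc_balanced = accuracy(*balanced)
acc_skewed = accuracy(*skewed)
bacc_balanced = balanced_accuracy(*balanced)
bacc_skewed = balanced_accuracy(*skewed)

print(f"accuracy:          {acc_balanced:.3f} -> {acc_skewed:.3f}")
print(f"balanced accuracy: {bacc_balanced:.3f} -> {bacc_skewed:.3f}")
```

In this sketch, accuracy rises from 0.75 to roughly 0.873 purely because more negative instances were added to the test set, while the mean of per-class recalls stays at 0.75. Whether either index satisfies the paper's two formal conditions is a question for the paper's own analysis; the example only shows why sensitivity to test-set composition matters.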
