Paper Title

Multi-Label Sampling based on Local Label Imbalance

Paper Authors

Bin Liu, Konstantinos Blekas, Grigorios Tsoumakas

Paper Abstract

Class imbalance is an inherent characteristic of multi-label data that hinders most multi-label learning methods. One efficient and flexible strategy to deal with this problem is to employ sampling techniques before training a multi-label learning model. Although existing multi-label sampling approaches alleviate the global imbalance of multi-label datasets, it is actually the imbalance level within the local neighbourhood of minority class examples that plays a key role in performance degradation. To address this issue, we propose a novel measure to assess the local label imbalance of multi-label datasets, as well as two multi-label sampling approaches based on the local label imbalance, namely MLSOL and MLUL. By considering all informative labels, MLSOL creates more diverse and better labeled synthetic instances for difficult examples, while MLUL eliminates instances that are harmful to their local region. Experimental results on 13 multi-label datasets demonstrate the effectiveness of the proposed measure and sampling approaches for a variety of evaluation metrics, particularly in the case of an ensemble of classifiers trained on repeated samples of the original data.
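
The abstract's core idea is a per-instance, per-label local imbalance score computed over each example's nearest neighbours. Below is a minimal sketch of such a score, assuming a kNN-based definition; the function name local_imbalance_scores, the parameter k, and the use of scikit-learn's NearestNeighbors are illustrative assumptions, not the paper's exact formulation. Examples whose neighbourhood is dominated by the opposite label value receive high scores, which an MLSOL-style method could use to select seeds for synthetic oversampling and an MLUL-style method to identify instances that are harmful to their local region.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_imbalance_scores(X, Y, k=5):
    """Per-instance, per-label local imbalance scores (illustrative sketch).

    For each example that holds the minority value of a label, the score is
    the fraction of its k nearest neighbours carrying the opposite value of
    that label; higher scores mark locally harder examples.
    X: (n, d) feature matrix, Y: (n, q) binary label matrix.
    """
    n, q = Y.shape
    # Query k+1 neighbours because each point is returned as its own neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    idx = idx[:, 1:]  # drop the point itself
    scores = np.zeros((n, q))
    for j in range(q):
        # Minority value of label j (ties treated as 1 being the minority).
        minority = 1 if Y[:, j].sum() <= n / 2 else 0
        for i in range(n):
            if Y[i, j] == minority:
                scores[i, j] = np.mean(Y[idx[i], j] != Y[i, j])
    return scores

# Usage sketch: examples with a high mean score are candidates for
# oversampling (MLSOL-style), while their hostile neighbourhoods suggest
# instances to prune (MLUL-style).
# X, Y = ...  # load a multi-label dataset as numpy arrays
# seed_weights = local_imbalance_scores(X, Y).mean(axis=1)
```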