Paper Title
A Novel Splitting Criterion Inspired by Geometric Mean Metric Learning for Decision Tree
Paper Authors
Paper Abstract
Decision trees (DTs) attract persistent research attention due to their impressive empirical performance and interpretability in numerous applications. However, growing traditional yet widely used univariate decision trees (UDTs) is quite time-consuming, as they must traverse all features to find the splitting value that maximally reduces the impurity at each internal node. In this paper, we design a new splitting criterion to speed up the growth. The criterion is induced from Geometric Mean Metric Learning (GMML) and then optimized under a diagonalized metric matrix constraint; consequently, a closed-form ranking of feature discriminant abilities can be obtained at once, and the top-ranked feature at each node is used to grow the intended DT (called dGMML-DT, where d abbreviates diagonalization). We evaluated the performance of the proposed methods and their corresponding ensembles on benchmark datasets. The experiments show that dGMML-DT achieves comparable or better classification results more efficiently than UDTs, with a 10x average speedup. Furthermore, dGMML-DT can be straightforwardly extended to its multivariate counterpart (dGMML-MDT) without laborious operations.
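The abstract's key idea — that a diagonal metric matrix turns the GMML objective into an independent closed-form score per feature, so the most discriminative feature at a node can be picked without scanning all split values — can be illustrated with a minimal sketch. This is an illustrative reading of the criterion, not the authors' implementation: the function name `dgmml_feature_scores`, the per-feature scatter bookkeeping, and the epsilon smoothing are assumptions introduced here, with per-feature similar-pair and dissimilar-pair scatters standing in for GMML's similarity and dissimilarity matrices.

```python
import math

def dgmml_feature_scores(X, y):
    """Sketch of a diagonal-GMML-style feature ranking (hypothetical helper).

    Under a diagonal metric matrix the GMML objective decouples across
    features, giving each feature k a closed-form weight proportional to
    sqrt(D_k / S_k), where S_k is feature k's scatter over same-class
    (similar) pairs and D_k its scatter over different-class (dissimilar)
    pairs. A larger score suggests a more discriminative feature.
    """
    n, d = len(X), len(X[0])
    S = [0.0] * d  # similar-pair (same-class) scatter per feature
    D = [0.0] * d  # dissimilar-pair (different-class) scatter per feature
    for i in range(n):
        for j in range(i + 1, n):
            target = S if y[i] == y[j] else D
            for k in range(d):
                target[k] += (X[i][k] - X[j][k]) ** 2
    eps = 1e-12  # guard against empty/zero scatters
    return [math.sqrt((D[k] + eps) / (S[k] + eps)) for k in range(d)]
```

A node of the tree would then split only on the arg-max feature (e.g. `scores.index(max(scores))`), searching thresholds along that single axis, which is the source of the claimed speedup over traversing every feature.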