Paper title
Model-agnostic Feature Importance and Effects with Dependent Features -- A Conditional Subgroup Approach
Paper authors
Paper abstract
The interpretation of feature importance in machine learning models is challenging when features are dependent. Permutation feature importance (PFI) ignores such dependencies, which can cause misleading interpretations due to extrapolation. A possible remedy is to use more advanced conditional PFI approaches, which assess the importance of a feature conditional on all other features. Given this shift in perspective, and to enable correct interpretations, it is important that the conditioning is transparent and humanly comprehensible. In this paper, we propose a new sampling mechanism for the conditional distribution based on permutations in conditional subgroups. As these subgroups are constructed using decision trees (transformation trees), the conditioning becomes inherently interpretable. This not only provides a simple and effective estimator of conditional PFI, but also local PFI estimates within the subgroups. In addition, we apply the conditional subgroups approach to partial dependence plots (PDP), a popular method for describing feature effects that can also suffer from extrapolation when features are dependent and interactions are present in the model. We show that PFI and PDP based on conditional subgroups often outperform methods such as conditional PFI based on knockoffs, or accumulated local effect plots. Furthermore, our approach allows for a more fine-grained interpretation of feature effects and importance within the conditional subgroups.
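As a rough illustration of the sampling idea in the abstract, the sketch below permutes a feature only within subgroups and compares the resulting loss increase with that of an ordinary (marginal) permutation. It is a minimal sketch under stated assumptions, not the authors' implementation: the subgroups here are fixed bins on the second feature, a hypothetical stand-in for the leaves of the transformation tree the paper uses, and the names `permute_within_groups` and `subgroup_pfi`, the toy data, and the toy model are all invented for this example.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` (a callable on one row) over the data."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permute_within_groups(X, j, groups, rng):
    """Copy X and shuffle column j separately inside each subgroup."""
    X_perm = [list(row) for row in X]
    rows_by_group = {}
    for i, g in enumerate(groups):
        rows_by_group.setdefault(g, []).append(i)
    for idx in rows_by_group.values():
        values = [X[i][j] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            X_perm[i][j] = v
    return X_perm


def subgroup_pfi(model, X, y, j, groups, n_repeats=20, seed=0):
    """Average loss increase when column j is permuted within subgroups."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    increases = [
        mse(model, permute_within_groups(X, j, groups, rng), y) - base
        for _ in range(n_repeats)
    ]
    return sum(increases) / n_repeats


# Toy data (assumption): x1 depends strongly on x2, and y = x1 + x2.
data_rng = random.Random(1)
X = []
for _ in range(400):
    x2 = data_rng.uniform(0.0, 1.0)
    x1 = x2 + data_rng.gauss(0.0, 0.05)
    X.append([x1, x2])
y = [row[0] + row[1] for row in X]


def model(row):
    """Toy model (assumption) that reproduces y exactly."""
    return row[0] + row[1]


# One group for everyone reproduces ordinary (marginal) PFI.
marginal = subgroup_pfi(model, X, y, j=0, groups=[0] * len(X))

# Four bins on x2 stand in for tree leaves (hypothetical subgroups).
bins = [min(int(row[1] * 4), 3) for row in X]
conditional = subgroup_pfi(model, X, y, j=0, groups=bins)

print("marginal PFI of x1:   ", marginal)
print("conditional PFI of x1:", conditional)
```

Because x1 is largely determined by x2 in this toy setup, shuffling x1 inside the bins keeps the permuted points close to the observed data, so the conditional estimate should come out much smaller than the marginal one. Evaluating the loss increase separately in each bin would give the local, per-subgroup importances that the abstract mentions.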