Paper Title

Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies

Authors

Daniel Vidali Fryer, Inga Strümke, Hien Nguyen

Abstract

Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of 'fairness'. The flexibility arises from the myriad potential forms of the Shapley value game formulation. One consequence of this flexibility is that many types of Shapley values are now being discussed, and this variety is a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a (non-parametric) measure of non-linear dependence as the characteristic function. Their strength lies in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlation, the affine-invariant distance correlation, and the Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a classical medical survey data set.
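
To make the idea of a dependence measure as characteristic function concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that the value of a feature subset S is the sample distance correlation between the columns X[:, S] and the label y, that the empty subset has value 0, and that the biased (V-statistic) estimator of distance correlation is used. The names `distance_correlation` and `shapley_values` are hypothetical, and the exact enumeration over subsets is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def _centered_distances(z):
    """Double-centered matrix of pairwise Euclidean distances between rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (biased V-statistic version; an assumption here)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def shapley_values(X, y, value=distance_correlation):
    """Exact Shapley values with v(S) = value(X[:, S], y) and v(empty set) = 0."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    cache = {(): 0.0}  # assumed convention: the empty coalition is worth nothing

    def v(subset):
        if subset not in cache:
            cache[subset] = value(X[:, list(subset)], y)
        return cache[subset]

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                with_i = tuple(sorted(S + (i,)))
                phi[i] += weight * (v(with_i) - v(S))
    return phi
```

No model is fitted anywhere in this sketch: the attribution depends only on the data, which is the sense of "model-independent" in the abstract. By the efficiency axiom the returned values sum to the dependence measure evaluated on all features jointly. Passing a different `value` function, for example an HSIC estimator, would change the characteristic function without touching the Shapley computation.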
