Title
Closed-Form Expressions for Global and Local Interpretation of Tsetlin Machines with Applications to Explaining High-Dimensional Data
Authors
Abstract
Tsetlin Machines (TMs) capture patterns using conjunctive clauses in propositional logic, thus facilitating interpretation. However, recent TM-based approaches mainly rely on inspecting the full range of clauses individually. Such inspection does not necessarily scale to complex prediction problems that require a large number of clauses. In this paper, we propose closed-form expressions for understanding why a TM model makes a specific prediction (local interpretability). Additionally, the expressions capture the most important features of the model overall (global interpretability). We further introduce expressions for measuring the importance of feature value ranges for continuous features. The expressions are formulated directly from the conjunctive clauses of the TM, making it possible to capture the role of features in real time during the learning process, as the model evolves. Additionally, from the closed-form expressions, we derive a novel data clustering algorithm for visualizing high-dimensional data in three dimensions. Finally, we compare our proposed approach against SHAP and state-of-the-art interpretable machine learning techniques. For both classification and regression, our evaluation shows correspondence with SHAP as well as competitive prediction accuracy in comparison with XGBoost, Explainable Boosting Machines, and Neural Additive Models.
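The abstract's core idea, deriving feature importance directly from a TM's conjunctive clauses, can be illustrated with a toy sketch. This is not the paper's exact closed-form expressions; all function names and the clause representation below are hypothetical, chosen only to show how clause-level inspection can be aggregated into global scores (over all clauses) and local scores (over the clauses a given input satisfies):

```python
# Illustrative sketch only: a clause is represented as (polarity, literals),
# where polarity is +1/-1 (clause votes for/against the class) and literals
# is a list of (feature_index, expected_value) pairs over binary features.

def clause_matches(literals, x):
    """A conjunctive clause is true iff all of its literals hold for x."""
    return all(x[i] == v for i, v in literals)

def global_importance(clauses, num_features):
    """Score each feature by the signed count of clauses whose literals use it."""
    scores = [0.0] * num_features
    for polarity, literals in clauses:
        for i, _ in literals:
            scores[i] += polarity
    return scores

def local_importance(clauses, x, num_features):
    """Attribute a prediction on x to the features of the clauses x satisfies."""
    scores = [0.0] * num_features
    for polarity, literals in clauses:
        if clause_matches(literals, x):
            for i, _ in literals:
                scores[i] += polarity
    return scores

# Toy model: two positive clauses and one negative clause over 3 binary features.
clauses = [
    (+1, [(0, 1), (1, 1)]),  # votes for the class if x0=1 and x1=1
    (+1, [(0, 1)]),          # votes for the class if x0=1
    (-1, [(2, 1)]),          # votes against the class if x2=1
]
print(global_importance(clauses, 3))            # [2.0, 1.0, -1.0]
print(local_importance(clauses, [1, 1, 0], 3))  # [2.0, 1.0, 0.0]
```

Because the scores are simple sums over clauses, they can be recomputed cheaply after each training step, which is what makes tracking feature roles during learning feasible.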