Exploring Interpretability of Independent Components of Word Embeddings with Automated Word Intruder Test

Paper title

Exploring Interpretability of Independent Components of Word Embeddings with Automated Word Intruder Test

Authors

Tomáš Musil, David Mareček

Abstract

Independent Component Analysis (ICA) is an algorithm originally developed for finding separate sources in a mixed signal, such as a recording of multiple people in the same room speaking at the same time. Unlike Principal Component Analysis (PCA), ICA permits the representation of a word as an unstructured set of features, without any particular feature being deemed more significant than the others. In this paper, we used ICA to analyze word embeddings. We have found that ICA can be used to find semantic features of the words, and these features can easily be combined to search for words that satisfy the combination. We show that most of the independent components represent such features. To quantify the interpretability of the components, we use the word intruder test, performed both by humans and by large language models. We propose to use the automated version of the word intruder test as a fast and inexpensive way of quantifying vector interpretability without the need for human effort.
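
To make the word intruder test concrete, here is a minimal sketch of how one test item could be built from a component and scored automatically. It is not the authors' implementation. It assumes ICA has already been run (for example with a library routine) and that its output is available as a matrix of per-word activations. The toy vocabulary, the activation values, the rule for drawing the intruder from the bottom half of the ranking, and the helper names `build_intruder_item`, `toy_judge`, and `detection_rate` are all hypothetical. In the paper the judge is a human or a large language model; here a simple category lookup stands in for it.

```python
import random

def build_intruder_item(words, scores, component, k=3, rng=None):
    """Build one intruder item: the top-k words of a component plus one intruder.

    scores[i][j] is the activation of words[i] on component j.
    Returns (shuffled candidate list, intruder word).
    """
    rng = rng or random.Random(0)
    # Rank word indices by activation on the chosen component, highest first.
    ranked = sorted(range(len(words)),
                    key=lambda i: scores[i][component], reverse=True)
    top = [words[i] for i in ranked[:k]]
    # Assumption: the intruder is sampled from the bottom half of the ranking,
    # so k must not exceed half the vocabulary size.
    intruder = words[rng.choice(ranked[len(ranked) // 2:])]
    candidates = top + [intruder]
    rng.shuffle(candidates)
    return candidates, intruder

def detection_rate(words, scores, judge, k=3, seed=0):
    """Fraction of components for which the judge picks the true intruder."""
    rng = random.Random(seed)
    n_components = len(scores[0])
    hits = 0
    for c in range(n_components):
        candidates, intruder = build_intruder_item(words, scores, c, k, rng)
        hits += judge(candidates) == intruder
    return hits / n_components

# Toy data (made up): component 0 responds to animals, component 1 to colors.
words = ["cat", "dog", "horse", "cow", "red", "blue", "green", "yellow"]
scores = [[0.9, 0.1], [0.8, 0.0], [0.7, 0.2], [0.6, 0.1],
          [0.1, 0.9], [0.0, 0.8], [0.2, 0.7], [0.1, 0.6]]
category = {w: "animal" for w in words[:4]}
category.update({w: "color" for w in words[4:]})

def toy_judge(candidates):
    """Stand-in for a human or LLM: pick the word whose category is rarest."""
    counts = {}
    for w in candidates:
        counts[category[w]] = counts.get(category[w], 0) + 1
    return min(candidates, key=lambda w: counts[category[w]])

rate = detection_rate(words, scores, toy_judge)
print(f"intruder detection rate: {rate:.2f}")
```

The detection rate serves as the interpretability score in this sketch: the more often a judge can spot the intruder among a component's top words, the more coherent that component is taken to be.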
