Paper Title

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

Paper Authors

Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

Paper Abstract

Learning continuous representations of discrete objects such as text, users, movies, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g., natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing methods do not scale to large vocabulary sizes. In this paper, we design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix. We call our method Anchor & Transform (ANT) as the embeddings of discrete objects are a sparse linear combination of the anchors, weighted according to the transformation matrix. ANT is scalable, flexible, and end-to-end trainable. We further provide a statistical interpretation of our algorithm as a Bayesian nonparametric prior for embeddings that encourages sparsity and leverages natural groupings among objects. By deriving an approximate inference algorithm based on Small Variance Asymptotics, we obtain a natural extension that automatically learns the optimal number of anchors instead of having to tune it as a hyperparameter. On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes and demonstrates stronger performance with fewer parameters (up to 40x compression) as compared to existing compression baselines.
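
To make the core idea concrete, below is a minimal sketch of the anchor-and-transform factorization described in the abstract: each vocabulary item's embedding is a sparse, non-negative linear combination of a small set of anchor embeddings (E = T A), with an L1 penalty on T encouraging sparsity. This is written in PyTorch with hypothetical class and parameter names as an illustration under those assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class AnchorTransformEmbedding(nn.Module):
    """Sketch of ANT-style embeddings: E = T A, where A is a small set of
    dense anchor embeddings and T is a sparse transformation matrix.
    Hypothetical names; not the reference implementation."""

    def __init__(self, vocab_size, num_anchors, embed_dim, l1_weight=1e-4):
        super().__init__()
        # A: anchor embeddings, shape (num_anchors, embed_dim)
        self.anchors = nn.Parameter(torch.randn(num_anchors, embed_dim) * 0.1)
        # T: per-token weights over anchors, shape (vocab_size, num_anchors);
        # trained jointly with the task and pushed toward sparsity
        self.transform = nn.Parameter(torch.zeros(vocab_size, num_anchors))
        self.l1_weight = l1_weight

    def forward(self, token_ids):
        # Non-negative weights over anchors for the requested tokens
        weights = torch.relu(self.transform[token_ids])
        # Embedding = sparse linear combination of the anchors
        return weights @ self.anchors

    def sparsity_penalty(self):
        # L1 regularizer on T, added to the task loss to encourage sparsity
        return self.l1_weight * self.transform.abs().sum()


# Usage sketch: embed three token ids from a 100k-word vocabulary using 500 anchors.
emb = AnchorTransformEmbedding(vocab_size=100_000, num_anchors=500, embed_dim=300)
vectors = emb(torch.tensor([3, 17, 42]))          # shape (3, 300)
reg = emb.sparsity_penalty()                      # add to the training loss
```

Because only the anchors are stored densely and T is sparse, the parameter count can be far smaller than a full vocab_size × embed_dim embedding table, which is the source of the compression the abstract reports.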
