Learning Multi-granular Quantized Embeddings for Large-Vocab Categorical Features in Recommender Systems

Title

Learning Multi-granular Quantized Embeddings for Large-Vocab Categorical Features in Recommender Systems

Authors

Wang-Cheng Kang, Derek Zhiyuan Cheng, Ting Chen, Xinyang Yi, Dong Lin, Lichan Hong, Ed H. Chi

Abstract

Recommender system models often represent various sparse features like users, items, and categorical features via embeddings. A standard approach is to map each unique feature value to an embedding vector. The size of the produced embedding table grows linearly with the size of the vocabulary. Therefore, a large vocabulary inevitably leads to a gigantic embedding table, creating two severe problems: (i) making model serving intractable in resource-constrained environments; (ii) causing overfitting problems. In this paper, we seek to learn highly compact embeddings for large-vocab sparse features in recommender systems (recsys). First, we show that the novel Differentiable Product Quantization (DPQ) approach can generalize to recsys problems. In addition, to better handle the power-law data distribution commonly seen in recsys, we propose a Multi-Granular Quantized Embeddings (MGQE) technique which learns more compact embeddings for infrequent items. We seek to provide a new angle to improve recommendation performance with compact model sizes. Extensive experiments on three recommendation tasks and two datasets show that we can achieve on par or better performance, with only ~20% of the original model size.
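To make the size argument in the abstract concrete, here is a minimal sketch of how a product-quantized embedding table compares with a full table in storage. It is not the authors' implementation. The function names, the two-tier head/tail split, and all of the numbers are hypothetical, and the idea that infrequent (tail) items reuse a smaller prefix of the same codebooks is an assumption made for illustration only.

```python
def full_table_bits(vocab_size, dim, float_bits=32):
    """Bits stored by a standard table: one dim-sized float vector per feature value."""
    return vocab_size * dim * float_bits


def code_bits(num_centroids):
    """Bits needed to index one of num_centroids centroids."""
    return (num_centroids - 1).bit_length()


def pq_table_bits(vocab_size, dim, num_subspaces, num_centroids, float_bits=32):
    """Bits stored by a product-quantized table.

    The vector is split into num_subspaces chunks; each chunk is replaced by the
    index of one of num_centroids learned centroids. Storage is the codebooks
    plus one small integer code per (value, subspace).
    """
    # num_subspaces codebooks * num_centroids centroids * (dim / num_subspaces) floats
    codebook = num_centroids * dim * float_bits
    codes = vocab_size * num_subspaces * code_bits(num_centroids)
    return codebook + codes


def multi_granular_bits(head_size, tail_size, dim, num_subspaces,
                        head_centroids, tail_centroids, float_bits=32):
    """Two-tier variant (assumption): tail items index fewer centroids than head items.

    Tail items are assumed to reuse a prefix of the head codebooks, so no extra
    codebook storage is counted for them.
    """
    codebook = head_centroids * dim * float_bits
    head_codes = head_size * num_subspaces * code_bits(head_centroids)
    tail_codes = tail_size * num_subspaces * code_bits(tail_centroids)
    return codebook + head_codes + tail_codes


def pq_lookup(codes, codebooks):
    """Rebuild an embedding by concatenating the chosen centroid from each subspace."""
    vector = []
    for subspace, code in enumerate(codes):
        vector.extend(codebooks[subspace][code])
    return vector


# Hypothetical sizes: 1M feature values, 64-dim embeddings, 8 subspaces.
full = full_table_bits(1_000_000, 64)
pq = pq_table_bits(1_000_000, 64, num_subspaces=8, num_centroids=256)
mg = multi_granular_bits(100_000, 900_000, 64, num_subspaces=8,
                         head_centroids=256, tail_centroids=16)
```

The sketch counts only embedding-table storage under these assumed settings, so its ratios are not comparable to the roughly 20% model-size figure reported in the abstract. It also leaves out training entirely, which is where the differentiable part of DPQ matters.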
