Paper Title

BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search

Paper Authors

Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng Yan, Di Jin

Paper Abstract

Relevance has a significant impact on user experience and business profit for e-commerce search platforms. In this work, we propose a data-driven framework for search relevance prediction by distilling knowledge from BERT and related multi-layer Transformer teacher models into simple feed-forward networks, using a large amount of unlabeled data. The distillation process produces a student model that recovers more than 97\% of the teacher models' test accuracy on new queries, at a serving cost that is several orders of magnitude lower (latency 150x lower than BERT-Base and 15x lower than the most efficient BERT variant, TinyBERT). Applying temperature rescaling and teacher model stacking further boosts model accuracy without increasing the student model's complexity. We present experimental results on both in-house e-commerce search relevance data and a public sentiment analysis data set from the GLUE benchmark. The latter takes advantage of another related public data set of much larger scale while disregarding its potentially noisy labels. Embedding analysis and a case study on the in-house data further highlight the strengths of the resulting model. By making the data processing and model training source code public, we hope the techniques presented here can help reduce the energy consumption of state-of-the-art Transformer models and level the playing field for small organizations lacking access to cutting-edge machine learning hardware.
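
To make the core idea concrete, below is a minimal PyTorch sketch of knowledge distillation with temperature rescaling on unlabeled data. This is an illustration under stated assumptions, not the authors' released implementation: the two-class output shape, the temperature value T = 2.0, and the use of plain soft-label cross-entropy are all placeholders.

```python
# Minimal sketch of knowledge distillation with temperature rescaling.
# Assumptions (not taken from the paper's code): a frozen teacher that
# emits 2-class relevance logits, T = 2.0, soft-label cross-entropy loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both distributions with temperature T, then compute the
    # cross-entropy of the student against the teacher's soft targets.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return -(soft_targets * student_log_probs).sum(dim=-1).mean() * T ** 2

# On unlabeled queries the frozen teacher supplies the labels, so no human
# annotation is needed to build the student's training set.
with torch.no_grad():
    teacher_logits = torch.randn(8, 2)           # stand-in for BERT teacher outputs
student_logits = torch.randn(8, 2, requires_grad=True)  # simple feed-forward student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()

# Teacher stacking can be sketched as combining several teachers' soft labels
# before distillation (simple averaging is an assumption here):
#   soft = torch.stack([F.softmax(t / T, -1) for t in teachers]).mean(0)
```

Because the student consumes only the teacher's soft outputs, any volume of unlabeled query traffic can be turned into training data, which is what allows a simple feed-forward network to recover most of the teacher's accuracy.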
