Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Paper Title

Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog

Authors

Gao, Shen; Chen, Xiuying; Liu, Chang; Liu, Li; Zhao, Dongyan; Yan, Rui

Abstract

Stickers with vivid and engaging expressions are becoming increasingly popular in online messaging apps, and some works are dedicated to automatically selecting a sticker response by matching the text labels of stickers with previous utterances. However, because stickers exist in large quantities, it is impractical to require text labels for all of them. Hence, in this paper, we propose to recommend an appropriate sticker to the user based on the multi-turn dialog context history, without any external labels. This task presents two main challenges. One is to learn the semantic meaning of stickers without corresponding text labels. The other is to jointly model the candidate sticker with the multi-turn dialog context. To tackle these challenges, we propose a sticker response selector (SRS) model. Specifically, SRS first employs a convolution-based sticker image encoder and a self-attention-based multi-turn dialog encoder to obtain the representations of stickers and utterances. Next, a deep interaction network is proposed to conduct deep matching between the sticker and each utterance in the dialog history. SRS then learns the short-term and long-term dependencies between all interaction results with a fusion network to output the final matching score. To evaluate our proposed method, we collect a large-scale real-world dialog dataset with stickers from one of the most popular online chatting platforms. Extensive experiments conducted on this dataset show that our model achieves state-of-the-art performance on all commonly used metrics. Experiments also verify the effectiveness of each component of SRS. To facilitate further research in the sticker selection field, we release this dataset of 340K multi-turn dialog and sticker pairs.
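
The flow the abstract describes (encode, match the sticker against each utterance, fuse the per-utterance results into one score) can be illustrated with a toy sketch. This is not the SRS model: the embeddings are hand-written vectors rather than outputs of the image and dialog encoders, cosine similarity stands in for the deep interaction network, and a recency-weighted average stands in for the fusion network. The names `cosine`, `match_score`, `select_sticker`, and the `decay` parameter are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_score(sticker_vec, utterance_vecs, decay=0.5):
    """Fuse per-utterance similarities into a single matching score.

    Assumption: a recency-weighted average replaces the learned fusion
    network. The last utterance has weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    """
    n = len(utterance_vecs)
    if n == 0:
        return 0.0
    weights = [decay ** (n - 1 - i) for i in range(n)]
    sims = [cosine(sticker_vec, u) for u in utterance_vecs]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


def select_sticker(candidates, utterance_vecs):
    """Return the name of the candidate sticker with the highest score."""
    return max(
        candidates,
        key=lambda name: match_score(candidates[name], utterance_vecs),
    )


# Hypothetical 2-d embeddings: two dialog turns and two candidate stickers.
dialog = [[1.0, 0.0], [0.0, 1.0]]
stickers = {"happy": [0.0, 1.0], "sad": [1.0, 0.0]}
print(select_sticker(stickers, dialog))
```

In this toy setup the most recent utterance dominates the score, so the sticker whose vector aligns with the last turn is selected. In the paper, both the matching and the fusion are learned components rather than fixed formulas.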
