Paper Title

Modality-Balanced Embedding for Video Retrieval

Authors

Wang, Xun; Ke, Bingqing; Li, Xuanping; Liu, Fangyu; Zhang, Mingyu; Liang, Xiao; Xiao, Qiushi; Luo, Cheng; Yu, Yue

Abstract

Video search has become the main routine for users to discover videos relevant to a text query on large short-video sharing platforms. While training a query-video bi-encoder model on online search logs, we identify a modality bias phenomenon: the video encoder relies almost entirely on text matching and neglects the other modalities of the videos, such as vision and audio. This modality imbalance results from a) the modality gap: the relevance between a query and a video's text is much easier to learn because the query is also a piece of text, with the same modality as the video text; and b) data bias: most training samples can be solved by text matching alone. Here we share our practices for improving the first retrieval stage, including our solution to the modality imbalance issue. We propose MBVR (short for Modality Balanced Video Retrieval) with two key components: manually generated modality-shuffled (MS) samples and a dynamic margin (DM) based on visual relevance. Together they encourage the video encoder to pay balanced attention to each modality. Through extensive experiments on a real-world dataset, we show empirically that our method is both effective and efficient in solving the modality bias problem. We have also deployed MBVR on a large video platform and observed a statistically significant boost over a highly optimized baseline in an A/B test and in manual GSB evaluations.
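The abstract names the two components but gives no formulas, so the following is only a toy sketch of one plausible reading, not the authors' implementation. It assumes that a modality-shuffled sample keeps a video's text features and swaps in the visual features of another video in the batch, and that the dynamic margin grows with how much more relevant the true visuals are to the query than the swapped ones. The names `fuse`, `modality_shuffle`, `dynamic_margin`, `ms_hinge_loss`, and the parameters `visual_weight`, `base_margin`, and `scale` are all hypothetical.

```python
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as the relevance score."""
    return sum(x * y for x, y in zip(a, b))


def fuse(text_vec: Vector, visual_vec: Vector, visual_weight: float) -> Vector:
    """Hypothetical stand-in for the video encoder: a weighted mix of modalities.

    visual_weight = 0.0 mimics a text-only (modality-biased) encoder.
    """
    return [(1.0 - visual_weight) * t + visual_weight * v
            for t, v in zip(text_vec, visual_vec)]


def modality_shuffle(visual_vecs: List[Vector], rng: random.Random) -> List[Vector]:
    """Assumed MS construction: give every video another video's visual features.

    A non-zero cyclic shift guarantees that no video keeps its own visuals.
    """
    n = len(visual_vecs)
    if n < 2:
        raise ValueError("need at least two videos to shuffle modalities")
    shift = rng.randrange(1, n)
    return [visual_vecs[(i + shift) % n] for i in range(n)]


def dynamic_margin(query: Vector, visual: Vector, shuffled_visual: Vector,
                   base_margin: float = 0.1, scale: float = 0.5) -> float:
    """Assumed DM: the margin grows with the visual relevance gap."""
    gap = dot(query, visual) - dot(query, shuffled_visual)
    return base_margin + scale * max(0.0, gap)


def ms_hinge_loss(queries: List[Vector], text_vecs: List[Vector],
                  visual_vecs: List[Vector], visual_weight: float,
                  rng: random.Random) -> float:
    """Mean hinge loss pushing each true video above its MS counterpart."""
    shuffled = modality_shuffle(visual_vecs, rng)
    total = 0.0
    for q, t, v, v_ms in zip(queries, text_vecs, visual_vecs, shuffled):
        pos = dot(q, fuse(t, v, visual_weight))      # original video
        neg = dot(q, fuse(t, v_ms, visual_weight))   # same text, wrong visuals
        total += max(0.0, dynamic_margin(q, v, v_ms) - (pos - neg))
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    visuals = [[1.0, 0.0], [0.0, 1.0]]
    for w in (0.0, 0.5, 1.0):
        loss = ms_hinge_loss(queries, texts, visuals, w, random.Random(0))
        print(f"visual_weight={w}: loss={loss:.3f}")
```

In this toy setup a text-only encoder scores the original and the modality-shuffled video identically, so it cannot satisfy the margin, while an encoder that also uses the visual features can. That is the intuition behind using MS samples to counter modality bias.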
