Title
Near-duplicate video detection featuring coupled temporal and perceptual visual structures and logical inference based matching
Authors
Abstract
We propose in this paper an architecture for near-duplicate video detection based on (i) index and query signature-based structures that integrate temporal and perceptual visual features, and (ii) a matching framework that computes the logical inference between index and query documents. For indexing, instead of concatenating low-level visual features in high-dimensional spaces, which leads to the curse of dimensionality and to redundancy, we adopt a perceptual symbolic representation based on color and texture concepts. For matching, we instantiate a retrieval model based on logical inference by coupling an N-gram sliding-window process with theoretically sound lattice-based structures. The techniques we cover are robust and insensitive to general video editing and/or degradation, making them ideal for re-broadcast video search. Experiments are carried out on large quantities of video data collected from the TRECVID 02, 03 and 04 collections and from real-world video broadcasts recorded from two German TV stations. An empirical comparison with two state-of-the-art dynamic programming techniques is encouraging and demonstrates the advantage and feasibility of our method.
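To make the matching idea more concrete, here is a minimal Python sketch built on loudly stated assumptions. Each keyframe is reduced to a hypothetical set of color/texture concept labels. The inference d → q is approximated by frame-wise set inclusion, which is the partial order of a Boolean lattice. A query is then scored by the fraction of its N-grams that are entailed by at least one N-gram of the indexed video. The functions `ngrams`, `entails` and `match_score`, as well as the concept labels, are illustrative inventions. They are not the authors' signatures, lattice structures or scoring function.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation (not from the paper): a keyframe is a set of
# symbolic color/texture concept labels such as "red" or "coarse".
Frame = FrozenSet[str]
Gram = Tuple[Frame, ...]


def ngrams(seq: Sequence[Frame], n: int) -> List[Gram]:
    """Slide a window of length n over the frame sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def entails(doc_gram: Gram, query_gram: Gram) -> bool:
    """Crude stand-in for d -> q: each query frame's concepts are a subset of
    the aligned document frame's concepts (subset order of a Boolean lattice)."""
    return all(q <= d for d, q in zip(doc_gram, query_gram))


def match_score(index_doc: Sequence[Frame], query: Sequence[Frame], n: int = 3) -> float:
    """Fraction of query N-grams entailed by at least one index N-gram."""
    q_grams = ngrams(query, n)
    if not q_grams:
        return 0.0
    d_grams = ngrams(index_doc, n)
    hits = sum(any(entails(d, q) for d in d_grams) for q in q_grams)
    return hits / len(q_grams)


f = frozenset
index_doc = [f({"red", "smooth"}), f({"blue", "coarse"}),
             f({"green", "smooth"}), f({"red", "coarse"})]
# The query's third frame ("yellow") mimics a degraded or edited frame.
query = [f({"red"}), f({"blue"}), f({"yellow"})]
print(match_score(index_doc, query, n=2))  # prints 0.5
```

Because matching works on short windows, one degraded frame only invalidates the N-grams that contain it. The remaining windows can still be entailed, which gives one intuition for the robustness claimed above. The paper's actual lattice-based inference is richer than this subset test.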