Paper Title
PIC: Permutation Invariant Convolution for Recognizing Long-range Activities
Paper Authors
Abstract
Neural operations such as convolution, self-attention, and vector aggregation are the go-to choices for recognizing short-range actions. However, they have three limitations in modeling long-range activities. This paper presents PIC, Permutation Invariant Convolution, a novel neural layer for modeling the temporal structure of long-range activities. It has three desirable properties. (i) Unlike standard convolution, PIC is invariant to temporal permutations of the features within its receptive field, which makes it suitable for modeling weak temporal structures. (ii) Unlike vector aggregation, PIC respects local connectivity, enabling it to learn long-range temporal abstractions using cascaded layers. (iii) In contrast to self-attention, PIC uses shared weights, making it better able to detect the most discriminant visual evidence across long and noisy videos. We study these three properties of PIC and demonstrate its effectiveness in recognizing long-range activities on Charades, Breakfast, and MultiThumos.
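The three properties named in the abstract can be illustrated with a toy operator. The following is a minimal sketch under stated assumptions, not the PIC layer as defined in the paper: the abstract does not specify PIC's internals, so the sketch assumes a dot-product score followed by a max over each temporal window. The function name `windowed_max_response` and its parameters are hypothetical. It shows only that an operator can (a) ignore the order of features inside its receptive field, (b) act on local windows, and (c) reuse the same kernels for every timestep.

```python
import random


def windowed_max_response(features, kernels, window, stride):
    """Toy order-invariant windowed operator (illustrative assumption only).

    features: list of T feature vectors, each a list of C floats.
    kernels:  list of K kernel vectors, each a list of C floats,
              shared across all timesteps and all windows.
    Returns one K-dimensional output vector per temporal window.
    """
    outputs = []
    for start in range(0, len(features) - window + 1, stride):
        # Local connectivity: only features inside this window are used.
        local = features[start:start + window]
        out = []
        for kernel in kernels:
            # Shared weights: the same kernel scores every timestep.
            scores = [sum(k * x for k, x in zip(kernel, feat)) for feat in local]
            # Max over the window does not depend on the order of the scores.
            out.append(max(scores))
        outputs.append(out)
    return outputs


random.seed(0)
T, C, K = 8, 4, 3
features = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(T)]
kernels = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(K)]

out = windowed_max_response(features, kernels, window=4, stride=4)

# Shuffle the timesteps inside the first window only.
shuffled = list(features)
first_window = shuffled[:4]
random.shuffle(first_window)
shuffled[:4] = first_window

out_shuffled = windowed_max_response(shuffled, kernels, window=4, stride=4)
print(out == out_shuffled)
```

Because each window yields one output vector, such operators can be stacked so that later layers cover longer temporal spans, which is the sense in which the abstract speaks of cascaded layers. A standard temporal convolution would generally change its output under the same shuffle, since each kernel position is tied to a specific timestep offset.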