Title
A Symbolic Temporal Pooling method for Video-based Person Re-Identification
Authors
Abstract
In video-based person re-identification, spatial and temporal features are known to provide orthogonal cues for effective representations. Such representations are currently obtained by aggregating frame-level features with max/avg pooling at different points of the model. However, these operations also discard discriminating information, which is particularly harmful when the separability between classes is poor. To alleviate this problem, this paper introduces a symbolic temporal pooling method, where frame-level features are represented in a distribution-valued symbolic form, obtained by fitting an Empirical Cumulative Distribution Function (ECDF) to each feature. Also, considering that the original triplet loss formulation cannot be applied directly to this kind of representation, we introduce a symbolic triplet loss function that infers the similarity between two symbolic objects. An extensive empirical evaluation of the proposed solution against the state of the art on four well-known data sets (MARS, iLIDS-VID, PRID2011 and P-DESTRE) shows consistent improvements in performance over the previous best-performing techniques.
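The idea described in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: frame-level features of a tracklet are pooled into per-dimension ECDFs (summarized here by a fixed grid of quantiles), and a Wasserstein-like quantile distance stands in for the similarity used by the symbolic triplet loss. All function names, the quantile summary, and the margin value are hypothetical.

```python
import numpy as np

def symbolic_temporal_pool(frame_features, n_quantiles=10):
    """Pool a (T, D) array of frame-level features into a distribution-valued
    ("symbolic") representation: for each of the D feature dimensions, the
    ECDF over the T frames is summarized by n_quantiles quantile values,
    giving a (n_quantiles, D) object instead of a single (D,) vector."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(frame_features, qs, axis=0)

def symbolic_distance(a, b):
    """An illustrative dissimilarity between two symbolic objects: the mean
    L2 distance between matched quantiles (a Wasserstein-like measure)."""
    return np.linalg.norm(a - b, axis=0).mean()

def symbolic_triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss adapted to symbolic objects by replacing the
    usual Euclidean distance with the symbolic distance above."""
    return max(0.0, symbolic_distance(anchor, positive)
                    - symbolic_distance(anchor, negative) + margin)
```

For example, pooling a tracklet of T=30 frames with D=128 features yields a (10, 128) symbolic object; two tracklets of the same identity should then be closer under `symbolic_distance` than tracklets of different identities, which is exactly what the triplet loss enforces.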