Paper Title
SADet: Learning An Efficient and Accurate Pedestrian Detector
Paper Authors
Paper Abstract
Although anchor-based detectors have taken a big step forward in pedestrian detection, their overall performance still needs further improvement for practical applications, \emph{e.g.}, a good trade-off between accuracy and efficiency. To this end, this paper proposes a series of systematic optimization strategies for the detection pipeline of a one-stage detector, forming a single-shot anchor-based detector (SADet) for efficient and accurate pedestrian detection, which includes three main improvements. First, we optimize the sample generation process by assigning soft tags to outlier samples, generating semi-positive samples with continuous tag values between $0$ and $1$, which not only produces more valid samples but also strengthens the robustness of the model. Second, a novel Center-$IoU$ loss is applied as the regression loss for bounding box regression, which not only retains the good characteristics of the IoU loss but also remedies some of its defects. Third, we design Cosine-NMS for post-processing the predicted bounding boxes, and further propose adaptive anchor matching, which enables the model to adaptively match anchor boxes to full or visible bounding boxes according to the degree of occlusion, making the NMS and anchor matching algorithms more suitable for occluded pedestrian detection. Though structurally simple, SADet achieves state-of-the-art results and a real-time speed of $20$ FPS for VGA-resolution images ($640 \times 480$) on challenging pedestrian detection benchmarks, i.e., CityPersons and Caltech, and on the human detection benchmark CrowdHuman, making it an attractive new pedestrian detector.
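For readers unfamiliar with the baseline that the regression loss builds on, the following is a minimal sketch of the plain IoU and the standard IoU loss ($1 - IoU$). It is not the paper's Center-$IoU$ loss, whose formula the abstract does not give. The function names `iou` and `iou_loss` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) corner tuples; this format is an
    assumption for illustration, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Standard IoU loss, 1 - IoU (the baseline, not Center-IoU)."""
    return 1.0 - iou(pred, target)


if __name__ == "__main__":
    print(iou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(iou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint, near
    print(iou_loss((0, 0, 1, 1), (5, 5, 6, 6)))  # disjoint, far
```

One well-known limitation of the plain IoU loss is visible in the last two calls: any pair of non-overlapping boxes yields the same loss regardless of how far apart the boxes are, so the loss gives no signal about distance. Whether this is among the defects the Center-$IoU$ loss targets is not stated in the abstract.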