Paper Title
LaSOT: A High-quality Large-scale Single Object Tracking Benchmark
Paper Authors
Abstract
Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited by the lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes and offers 1,550 video sequences totaling more than 3.87 million frames. Each video frame is carefully and manually annotated with a bounding box, making LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated, high-quality platform for both training and evaluation of trackers. The average video length in LaSOT is around 2,500 frames, and each video contains various challenge factors found in real-world footage, such as targets disappearing and re-appearing. These longer videos allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide a language specification for each video in LaSOT. We believe this addition will enable future research to use linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designated for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and the results reveal that there still exists significant room for improvement. The complete benchmark, tracking results, and analysis are available at http://vision.cs.stonybrook.edu/~lasot/.
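The abstract describes evaluating trackers against per-frame bounding-box annotations. As a minimal sketch of how such evaluation is commonly scored (the success metric based on intersection-over-union between predicted and ground-truth boxes), consider the following; the function names and the (x, y, w, h) box convention are illustrative assumptions, not the LaSOT toolkit's actual API.

```python
# Hypothetical sketch of IoU-based tracker scoring, as commonly used by
# tracking benchmarks. Boxes are assumed to be (x, y, width, height) tuples.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(predictions, ground_truths, threshold=0.5):
    """Fraction of frames whose predicted box overlaps ground truth
    with IoU above the threshold."""
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(o > threshold for o in overlaps) / len(overlaps)
```

Benchmarks typically sweep the IoU threshold from 0 to 1 and report the area under the resulting success curve, rather than a single threshold.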