Paper Title
A Systematic Evaluation of Object Detection Networks for Scientific Plots
Paper Authors
Paper Abstract
Are existing object detection methods adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images? To answer this question, we train and compare the accuracy of various SOTA object detection networks on the PlotQA dataset. At the standard IOU setting of 0.5, most networks perform well, with mAP scores greater than 80% in detecting the relatively simple objects in plots. However, the performance drops drastically when evaluated at a stricter IOU of 0.9, with the best model giving a mAP of 35.70%. Note that such a strict evaluation is essential when dealing with scientific plots, where even minor localisation errors can lead to large errors in downstream numerical inferences. Given this poor performance, we propose minor modifications to existing models by combining ideas from different object detection networks. While this significantly improves the performance, two main issues remain: (i) performance on text objects, which are essential for reasoning, is very poor, and (ii) inference time is unacceptably large considering the simplicity of plots. To solve this open problem, we make a series of contributions: (a) an efficient region proposal method based on Laplacian edge detectors, (b) a feature representation of region proposals that includes neighbouring information, (c) a linking component that joins multiple region proposals for detecting longer textual objects, and (d) a custom loss function that combines a smooth L1 loss with an IOU-based loss. Combining these ideas, our final model is very accurate at extreme IOU values, achieving a mAP of 93.44% at 0.9 IOU. At the same time, our model is very efficient, with an inference time 16x lower than current models, including one-stage detectors. With these contributions, we enable further exploration of automated reasoning over plots.
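Contribution (a) rests on the observation that plot elements (bars, lines, tick labels) sit on largely uniform backgrounds, so their edges are cheap to isolate. The snippet below is a minimal sketch of what a Laplacian-edge-based proposal step could look like, assuming a standard OpenCV pipeline with connected-component grouping; the function name `propose_regions`, the threshold value, and the `min_area` filter are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a Laplacian-edge-based region proposal step.
# The paper's exact pipeline is not specified in the abstract; this only
# illustrates using edges in a mostly flat plot image to propose boxes.
import cv2
import numpy as np

def propose_regions(image_path, min_area=20):
    """Return candidate bounding boxes (x, y, w, h) derived from Laplacian edges."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # The Laplacian responds strongly at the sharp edges of bars, lines and text.
    edges = cv2.Laplacian(gray, cv2.CV_64F, ksize=3)
    edges = np.uint8(np.absolute(edges))
    # Binarise the edge map and group edge pixels into connected components.
    _, binary = cv2.threshold(edges, 10, 255, cv2.THRESH_BINARY)
    num, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, num):  # label 0 is the background component
        x, y, w, h, area = stats[i]
        if area >= min_area:  # drop tiny noise components
            boxes.append((x, y, w, h))
    return boxes
```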
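Contribution (d) pairs the standard smooth L1 regression loss with an IOU-based term that directly penalises poor overlap. Below is a hedged PyTorch sketch; the weighting factor `lam`, the (x1, y1, x2, y2) box encoding, and the plain (1 - IOU) form of the overlap term are assumptions, since the abstract does not specify them.

```python
# Illustrative sketch of a box-regression loss combining smooth L1 with an
# IOU-based term, as described in the abstract. Weighting and IOU variant
# are assumptions, not the paper's published formulation.
import torch
import torch.nn.functional as F

def combined_box_loss(pred, target, lam=1.0):
    """pred, target: (N, 4) boxes encoded as (x1, y1, x2, y2)."""
    l1 = F.smooth_l1_loss(pred, target, reduction="mean")

    # IOU between each predicted box and its matched target box.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    return l1 + lam * (1.0 - iou).mean()
```

Optimising a (1 - IOU) term directly rewards tight boxes, which is relevant at strict evaluation thresholds such as 0.9 IOU, where a smooth-L1-only objective can leave small localisation errors weakly penalised relative to box scale.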