Paper Title
PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
Authors
Abstract
Today, the scene graph generation (SGG) task is largely limited in realistic scenarios, mainly due to the extremely long-tailed bias of the predicate annotation distribution. Tackling the class imbalance problem in SGG is therefore both critical and challenging. In this paper, we first discover that when predicate labels are strongly correlated with each other, prevalent re-balancing strategies (e.g., re-sampling and re-weighting) either over-fit the tail data (e.g., predicting bench sitting on sidewalk rather than bench on sidewalk) or still suffer from the adverse effect of the original uneven distribution (e.g., collapsing the varied parked on/standing on/sitting on into on). We argue that the principal reason is that re-balancing strategies are sensitive to the frequencies of predicates yet blind to their relatedness, which may play a more important role in promoting the learning of predicate features. Therefore, we propose a novel Predicate-Correlation Perception Learning (PCPL for short) scheme that adaptively seeks out appropriate loss weights by directly perceiving and utilizing the correlation among predicate classes. Moreover, our PCPL framework is further equipped with a graph encoder module to better extract context features. Extensive experiments on the benchmark VG150 dataset show that the proposed PCPL performs markedly better on tail classes while largely preserving performance on head classes, and that it significantly outperforms previous state-of-the-art methods.
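To make the abstract's central observation concrete, the following is a minimal sketch of frequency-only re-weighting, the kind of re-balancing strategy the abstract describes as sensitive to predicate frequencies yet blind to their relatedness. It is not the PCPL scheme itself, whose weighting formula is not given here. The function names and the predicate counts are hypothetical and chosen only for illustration; the counts are not taken from VG150.

```python
import math


def inverse_frequency_weights(counts):
    """Frequency-only re-weighting: each weight is proportional to
    1 / count, then normalized so the weights average to 1."""
    inv = {c: 1.0 / n for c, n in counts.items()}
    mean = sum(inv.values()) / len(inv)
    return {c: w / mean for c, w in inv.items()}


def weighted_cross_entropy(logits, target, weights):
    """Cross-entropy for one sample, scaled by the weight of its target class."""
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return -weights[target] * (logits[target] - log_z)


# Hypothetical predicate counts (illustrative only, not from VG150).
counts = {"on": 1000, "sitting on": 50, "parked on": 50, "eating": 50}
weights = inverse_frequency_weights(counts)

for predicate, weight in weights.items():
    print(f"{predicate:>10}: {weight:.4f}")
```

In this sketch, "sitting on" and "eating" receive exactly the same weight because they have the same count, even though "sitting on" is closely related to the head predicate "on" and "eating" is not. A correlation-aware scheme such as the one the abstract proposes would instead let the relationship between predicate classes influence the loss weights.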