SCATTER: Selective Context Attentional Scene Text Recognizer

Paper Title

SCATTER: Selective Context Attentional Scene Text Recognizer

Authors

Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

Abstract

Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully training a deep BiLSTM encoder, thus improving the encoding of contextual dependencies. Decoding is done using a two-step 1D attention mechanism. The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer. The second attention step, similar to previous papers, treats the features as a sequence and attends to the intra-sequence relationships. Experiments show that the proposed approach surpasses SOTA performance on irregular text recognition benchmarks by 3.7% on average.
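
The two-step decoding described in the abstract can be illustrated with a small sketch in plain Python. This is not the authors' implementation. The sigmoid gate used for the re-weighting step, the dot-product scoring in the second step, the fixed `query` vector (standing in for a decoder state), and all names and sizes (`selective_reweight`, `attend`, `W`, `T`, `C_V`, `C_H`) are assumptions made for illustration; the abstract only states that visual and contextual features are re-weighted together and then attended to as a sequence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def selective_reweight(visual, contextual, W):
    """Step 1 (assumed form): concatenate the visual and contextual
    features of each column, then scale every channel by a gate in (0, 1)."""
    out = []
    for v, h in zip(visual, contextual):
        d = v + h  # list concatenation: [visual ; contextual]
        gate = [sigmoid(g) for g in matvec(W, d)]
        out.append([g * x for g, x in zip(gate, d)])
    return out


def attend(features, query):
    """Step 2 (assumed form): 1D attention over the sequence using
    dot-product scores, a softmax, and a weighted sum (the glimpse)."""
    scores = [sum(q * x for q, x in zip(query, f)) for f in features]
    alpha = softmax(scores)
    dim = len(features[0])
    glimpse = [sum(a * f[i] for a, f in zip(alpha, features)) for i in range(dim)]
    return alpha, glimpse


# Toy sizes (hypothetical): 5 feature columns, 3 visual and 2 contextual channels.
rng = random.Random(0)
T, C_V, C_H = 5, 3, 2
D = C_V + C_H
visual = [[rng.uniform(-1, 1) for _ in range(C_V)] for _ in range(T)]
contextual = [[rng.uniform(-1, 1) for _ in range(C_H)] for _ in range(T)]
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
query = [rng.uniform(-1, 1) for _ in range(D)]

reweighted = selective_reweight(visual, contextual, W)
alpha, glimpse = attend(reweighted, query)
```

In this sketch `alpha` holds one weight per feature column and sums to 1, while `glimpse` is the weighted summary that a decoder would consume when predicting one character. A full recognizer would learn `W` and derive `query` from the decoder's recurrent state at each step.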
