Paper title
A Few-shot Learning Approach for Historical Ciphered Manuscript Recognition
Authors
Abstract
Encoded (or ciphered) manuscripts are a special type of historical document that contains encrypted text. Automatic recognition of such documents is challenging because 1) the cipher alphabet changes from one document to another, 2) there is a lack of annotated corpora for training, and 3) touching symbols make symbol segmentation difficult. To overcome these difficulties, we propose a novel method for handwritten cipher recognition based on few-shot object detection. Our method first detects all symbols of a given alphabet in a line image, and then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols. By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets. In addition, if a few labeled pages with the same alphabet are used for fine-tuning, our method surpasses existing unsupervised and supervised handwritten text recognition (HTR) methods for cipher recognition.
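The abstract does not spell out how the decoding step works, so the following is only a minimal sketch of one way per-symbol similarity scores could be turned into a transcription. It assumes a hypothetical score layout (one similarity value per image column for each alphabet symbol) and swaps in a simple greedy peak-picking rule with suppression of nearby detections. The function name `decode_line` and the parameters `threshold` and `min_gap` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def decode_line(
    scores: Dict[str, Sequence[float]],
    threshold: float = 0.5,
    min_gap: int = 2,
) -> List[str]:
    """Greedily decode similarity scores into a left-to-right symbol sequence.

    `scores` maps each alphabet symbol to one similarity value per image
    column (an assumed layout, not the paper's). Detections below
    `threshold` are dropped; the remaining ones are accepted from the
    strongest down, and any detection closer than `min_gap` columns to an
    already accepted one is suppressed.
    """
    # Collect every (score, column, symbol) that passes the threshold.
    candidates: List[Tuple[float, int, str]] = [
        (score, column, symbol)
        for symbol, row in scores.items()
        for column, score in enumerate(row)
        if score >= threshold
    ]

    # Strongest detections first; ties are broken by the leftmost column.
    candidates.sort(key=lambda c: (-c[0], c[1]))

    kept: List[Tuple[float, int, str]] = []
    for score, column, symbol in candidates:
        if all(abs(column - kept_column) >= min_gap for _, kept_column, _ in kept):
            kept.append((score, column, symbol))

    # Read the surviving detections in reading order (left to right).
    kept.sort(key=lambda c: c[1])
    return [symbol for _, _, symbol in kept]


if __name__ == "__main__":
    example_scores = {
        "a": [0.1, 0.9, 0.8, 0.1, 0.1, 0.2],
        "b": [0.0, 0.3, 0.2, 0.1, 0.7, 0.6],
    }
    print(decode_line(example_scores))
```

A fixed `min_gap` is a crude stand-in for handling touching symbols. A real system would more likely derive the spacing from the detected symbol widths.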