Paper Title
EraseNet: A Recurrent Residual Network for Supervised Document Cleaning
Paper Authors
Paper Abstract
Document denoising is considered one of the most challenging tasks in computer vision. Millions of documents have yet to be digitized, but degradation caused by natural and man-made factors makes this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. It focuses on restoring documents with defects such as deformities caused by aging, creases left on xeroxed pages, random black patches, and faintly visible text, and on improving image quality for better optical character recognition (OCR) performance. Removing noise from scanned documents is a very important step before the documents are passed to an OCR system, as this noise can severely affect OCR performance. The experiments in this paper show promising results: the model learns a variety of ordinary as well as unusual noise types and rectifies them efficiently.
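To make the phrase "supervised, fully convolutional auto-encoder" concrete, the following is a minimal NumPy sketch of the general idea, not the EraseNet architecture itself (the abstract does not specify its layers). The layer sizes, the single skip connection, the mean-squared-error loss, and all function names (`conv2d_same`, `forward`, `init_params`, `mse_loss`) are assumptions chosen for illustration. The sketch shows only the forward pass and the loss on one (noisy, clean) pair; training is omitted.

```python
import numpy as np

def conv2d_same(x, kernels, bias):
    """Naive 'same'-padded 2-D convolution (cross-correlation).

    x:       (C_in, H, W) feature map
    kernels: (C_out, C_in, k, k) weights, k odd
    bias:    (C_out,)
    """
    c_out, _, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + w]  # shifted view, (C_in, H, W)
            # Mix input channels into output channels for this kernel tap.
            out += np.einsum("oi,ihw->ohw", kernels[:, :, dy, dx], patch)
    return out + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def downsample(x):
    """2x average pooling; assumes even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def init_params(rng, width=8):
    """Random 3x3 kernels for a hypothetical three-layer network."""
    def w(c_out, c_in):
        return rng.normal(0.0, 0.1, size=(c_out, c_in, 3, 3))
    return {
        "enc": (w(width, 1), np.zeros(width)),
        "mid": (w(width, width), np.zeros(width)),
        "dec": (w(1, width), np.zeros(1)),
    }

def forward(params, noisy):
    """Map a noisy (H, W) grayscale page in [0, 1] to a cleaned page of the same shape."""
    x = noisy[None, :, :]
    e = relu(conv2d_same(x, *params["enc"]))               # encoder, full resolution
    m = relu(conv2d_same(downsample(e), *params["mid"]))   # bottleneck, half resolution
    d = upsample(m) + e                                    # residual (skip) connection
    return sigmoid(conv2d_same(d, *params["dec"]))[0]      # decoder, pixels in [0, 1]

def mse_loss(pred, clean):
    """Pixel-wise supervised loss against the clean target."""
    return float(np.mean((pred - clean) ** 2))

rng = np.random.default_rng(0)
params = init_params(rng)
clean = np.ones((32, 48))
clean[10:12, 5:40] = 0.0                                   # white page, one dark stroke
noisy = np.clip(clean + rng.normal(0.0, 0.2, clean.shape), 0.0, 1.0)
pred = forward(params, noisy)
print(pred.shape, mse_loss(pred, clean))
```

Because the sketch contains only convolutions, pooling, and upsampling, with no dense layers, the same parameters apply to pages of any even height and width. That property is what "fully convolutional" refers to here.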