Paper Title

Omnifont Persian OCR System Using Primitives

Authors

Keipour, Azarakhsh; Eshghi, Mohammad; Ghadikolaei, Sina Mohammadzadeh; Mohammadi, Negin; Ensafi, Shahab

Abstract

In this paper, we introduce a model-based omnifont Persian OCR system. The system uses a set of 8 primitive elements as structural features for recognition. First, the scanned document is preprocessed. After normalizing the preprocessed image, text rows and sub-words are separated and then thinned. After recognition of dots in sub-words, strokes are extracted and primitive elements of each sub-word are recognized using the strokes. Finally, the primitives are compared with a predefined set of character identification vectors in order to identify sub-word characters. The separation and recognition steps of the system are concurrent, eliminating unavoidable errors of independent separation of letters. The system has been tested on documents with 14 standard Persian fonts in 6 sizes. The achieved precision is 97.06%.
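The idea of performing separation and recognition concurrently can be illustrated with a minimal sketch. It assumes a sub-word has already been reduced to a sequence of primitive IDs; the IDs, the `CHAR_VECTORS` table, the labels, and the `recognize` function are all hypothetical and are not taken from the paper. The search shown here (longest match with backtracking) is one possible way to let matching decide the character boundaries, not necessarily the authors' method.

```python
from functools import lru_cache

# Hypothetical character identification vectors: each key is a tuple of
# primitive IDs (0..7 stand in for the 8 primitives), each value a label.
# These entries are invented purely for illustration.
CHAR_VECTORS = {
    (0,): "A",
    (1, 2): "B",
    (1,): "C",
    (2, 3): "D",
}


def recognize(primitives):
    """Segment and recognize a sub-word's primitive sequence in one pass.

    Returns a list of character labels, or None when no segmentation
    covers the whole sequence.
    """
    seq = tuple(primitives)
    max_len = max(len(vector) for vector in CHAR_VECTORS)

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(seq):
            return ()
        # Try longer vectors first; fall back to shorter ones when the
        # remainder of the sequence cannot be matched.
        for length in range(min(max_len, len(seq) - start), 0, -1):
            label = CHAR_VECTORS.get(seq[start:start + length])
            if label is not None:
                rest = solve(start + length)
                if rest is not None:
                    return (label,) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)
```

With this toy table, `recognize([1, 2, 3])` returns `["C", "D"]`: the longer match `(1, 2)` is tried first, but it leaves `3` unmatched, so the boundary is moved. A character boundary is therefore accepted only if the rest of the sub-word can also be recognized, which is the sense in which separation is not an independent step.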
