Paper Title
Omnifont Persian OCR System Using Primitives
Authors
Abstract
In this paper, we introduce a model-based omnifont Persian OCR system. The system uses a set of 8 primitive elements as structural features for recognition. First, the scanned document is preprocessed. After the preprocessed image is normalized, text lines and sub-words are separated and then thinned. Once the dots in each sub-word have been recognized, strokes are extracted, and the primitive elements of each sub-word are recognized from these strokes. Finally, the primitives are compared with a predefined set of character identification vectors to identify the characters of the sub-word. The separation and recognition steps of the system are concurrent, which avoids the errors that are unavoidable when letters are separated independently of recognition. The system has been tested on documents with 14 standard Persian fonts in 6 sizes. The achieved precision is 97.06%.
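The idea of separating and recognizing characters at the same time can be illustrated with a minimal sketch. It is not the system described in the abstract: the abstract does not list the 8 primitives or the character identification vectors, so the primitive IDs (0 to 7), the `TEMPLATES` table, the Latin placeholder characters, and the function name `recognize_subword` are all hypothetical. Dot handling is omitted. The sketch only shows how a sub-word's primitive sequence might be split into characters by matching against templates, backtracking when an early match leaves the rest of the sequence unexplained.

```python
from functools import lru_cache
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical identification vectors: each placeholder character maps to a
# sequence of primitive IDs in the range 0..7. The real vectors are not given
# in the abstract.
TEMPLATES: Dict[str, Tuple[int, ...]] = {
    "a": (0,),
    "b": (1, 2),
    "c": (1, 2, 3),
    "d": (3, 4),
}


def recognize_subword(
    primitives: Sequence[int],
    templates: Dict[str, Tuple[int, ...]],
) -> Optional[List[str]]:
    """Split a primitive sequence into characters while recognizing them.

    Returns the first full segmentation found, or None if no combination of
    templates covers the whole sequence.
    """
    prims = tuple(primitives)

    @lru_cache(maxsize=None)
    def parse(start: int) -> Optional[Tuple[str, ...]]:
        # Reaching the end means everything before it was explained.
        if start == len(prims):
            return ()
        for char, vector in templates.items():
            end = start + len(vector)
            # A candidate character is accepted only if the remainder of the
            # sub-word can also be recognized; otherwise try the next one.
            if prims[start:end] == vector:
                rest = parse(end)
                if rest is not None:
                    return (char,) + rest
        return None

    result = parse(0)
    return None if result is None else list(result)


if __name__ == "__main__":
    print(recognize_subword([1, 2, 3, 4, 0], TEMPLATES))
    print(recognize_subword([1, 2, 3, 0], TEMPLATES))
    print(recognize_subword([5], TEMPLATES))
```

In the second call, matching "b" first leaves the sequence (3, 0) with no valid explanation, so the search falls back to "c". A segmentation step that committed to letter boundaries before recognition could not recover from that kind of early error.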