Title
HORAE: an annotated dataset of books of hours
Authors
Abstract
In this paper we introduce a new dataset of annotated pages from books of hours, a type of handwritten prayer book owned and used by wealthy lay people in the late Middle Ages. The dataset was created to support historical research on the evolution of the religious mindset in Europe during this period, since books of hours are one of the major sources of information on it, thanks both to their rich illustrations and to the different types of religious sources they contain. We first describe how the corpus was collected and manually annotated, and then present the evaluation of a state-of-the-art system for text line detection and for zone detection and typing. The corpus is freely available for research.