Paper Title
Recognition of Implicit Geographic Movement in Text
Paper Authors
Paper Abstract
Analyzing the geographic movement of humans, animals, and other phenomena is a growing field of research. This research has benefited urban planning, logistics, the understanding of animal migration, and much more. Typically, movement is captured as precise geographic coordinates and time stamps with the Global Positioning System (GPS). Although some research uses computational techniques to take advantage of implicit movement in descriptions of route directions, hiking paths, and historical exploration routes, innovation would accelerate with a large and diverse corpus. We created a corpus of sentences labeled as describing geographic movement or not, including the type of entity moving. Creating this corpus proved difficult because there were no comparable corpora to start from, human labeling is costly, and movement can at times be interpreted differently. To overcome these challenges, we developed an iterative process that employs hand labeling, crowd voting for confirmation, and machine learning to predict more labels. By merging advances in word embeddings with traditional machine learning models and model ensembling, we reached prediction accuracy at a level acceptable for producing a large silver-standard corpus, despite the small gold-standard training set. Our corpus will likely benefit computational processing of geography in text and spatial cognition, in addition to detection of movement.