Paper title

Proximity full-text searches of frequently occurring words with a response time guarantee

Author

Veretennikov, Alexander B.

Abstract

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains the query terms near each other, especially if the query terms are frequently occurring words. For each word in the text, we use additional indexes to store information about nearby words that lie at a distance of at most MaxDistance from the given word, where MaxDistance is a parameter. We discuss a search algorithm for the case in which the query consists of frequently occurring words. In addition, we present the results of experiments with different values of MaxDistance to evaluate how search speed depends on MaxDistance. These results show that, when queries containing frequently occurring words are evaluated, the average query execution time with our indexes is 94.7 to 45.9 times (depending on the value of MaxDistance) shorter than with standard inverted files.

This is a pre-print of a contribution published in Pinelas S., Kim A., Vlasov V. (eds) Mathematical Analysis With Applications. CONCORD-90 2018. Springer Proceedings in Mathematics & Statistics, vol 318, published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-42176-2_37
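The indexing idea described in the abstract can be illustrated with a minimal Python sketch. It assumes a deliberately simplified layout that is not the paper's actual index: alongside a standard inverted file (word to (doc_id, position) postings), a hypothetical pair index records, for every word occurrence, each following word within MAX_DISTANCE positions. A two-word proximity query can then be answered by reading one or two pair lists instead of scanning both words' full posting lists, which is the step that becomes expensive in a standard inverted file when the words occur frequently. All names here (`build_pair_index`, `proximity_search_pairs`, `MAX_DISTANCE`, and so on) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical, simplified illustration; not the index layout used in the paper.
MAX_DISTANCE = 3  # assumed value of the MaxDistance parameter


def build_standard_index(docs):
    """Standard inverted file: word -> list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index


def build_pair_index(docs, max_distance=MAX_DISTANCE):
    """Hypothetical additional index: (word, later_word) -> (doc_id, pos, distance).

    For every word occurrence, store each word that follows it within
    max_distance positions, so nearby pairs are precomputed at index time.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i, word in enumerate(words):
            for d in range(1, max_distance + 1):
                j = i + d
                if j >= len(words):
                    break
                index[(word, words[j])].append((doc_id, i, d))
    return index


def proximity_search_standard(index, a, b, max_distance=MAX_DISTANCE):
    """Baseline: scan the full posting lists of a and b and compare positions."""
    positions_b = defaultdict(set)
    for doc_id, pos in index.get(b, []):
        positions_b[doc_id].add(pos)
    result = set()
    for doc_id, pos in index.get(a, []):
        # p != pos avoids matching an occurrence with itself when a == b
        if any(p != pos and abs(p - pos) <= max_distance
               for p in positions_b.get(doc_id, ())):
            result.add(doc_id)
    return result


def proximity_search_pairs(pair_index, a, b):
    """Same query answered by reading only the (a, b) and (b, a) pair lists."""
    hits = pair_index.get((a, b), []) + pair_index.get((b, a), [])
    return {doc_id for doc_id, _, _ in hits}


if __name__ == "__main__":
    docs = [
        "to be or not to be",
        "the question is not to be answered",
        "be quiet and listen to the music",
    ]
    std = build_standard_index(docs)
    pairs = build_pair_index(docs)
    # Both approaches find the same documents for this query.
    print(proximity_search_standard(std, "to", "be"))  # {0, 1}
    print(proximity_search_pairs(pairs, "to", "be"))   # {0, 1}
```

In this sketch the pair index stores up to MAX_DISTANCE entries per word occurrence, so its size grows roughly linearly with MAX_DISTANCE. Under these assumptions, that illustrates why MaxDistance is treated as a tunable parameter rather than simply being set as large as possible.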
