Paper title
Topical Result Caching in Web Search Engines
Paper authors
Paper abstract
Search-result caching is employed in information retrieval systems to speed up query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache). It improves on the traditional SDC (Static-Dynamic Cache), which stores the results of popular queries in a static cache and manages a dynamic cache with a replacement policy to capture temporal variations in the query stream. Our proposed caching scheme adds a further layer for topic-based caching, in which entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to that topic. This makes it possible to adapt cache-space utilization to the temporal locality of the various topics, and it reduces cache misses caused by queries that are neither popular enough to be in the static portion nor requested within intervals short enough to stay in the dynamic portion. We simulate different configurations of STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC, with an increase of up to 3% in hit rate and a reduction of up to 36% in the gap between SDC and the theoretically optimal caching algorithm.
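The three-layer layout described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the replacement policy, the lookup order, the admission rule, or how queries are assigned to topics, so the sketch assumes LRU replacement in both the topic and dynamic layers, a static-first lookup, and a topic label supplied by the caller. The names `LRUCache`, `STDCache`, `lookup`, and `fetch` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class STDCache:
    """Static-Topic-Dynamic cache sketch: one static layer, one LRU partition
    per topic, and one shared dynamic LRU layer for everything else."""

    def __init__(self, static_results, topic_capacities, dynamic_capacity):
        self.static = dict(static_results)  # read-only results of popular queries
        self.topics = {t: LRUCache(c) for t, c in topic_capacities.items()}
        self.dynamic = LRUCache(dynamic_capacity)
        self.hits = 0
        self.requests = 0

    def lookup(self, query, topic, fetch):
        """Return the cached result for `query`, or call `fetch` on a miss.

        `topic` is a caller-supplied label (or None); how topics are detected
        is outside this sketch. `fetch` stands in for the back-end search and
        is assumed never to return None.
        """
        self.requests += 1
        if query in self.static:
            self.hits += 1
            return self.static[query]
        # Queries with a known topic use that topic's partition; all other
        # queries fall back to the shared dynamic layer.
        layer = self.topics.get(topic, self.dynamic)
        result = layer.get(query)
        if result is not None:
            self.hits += 1
            return result
        result = fetch(query)
        layer.put(query, result)
        return result

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```

With a one-entry `weather` partition and a one-entry dynamic layer, a repeated weather query survives a burst of unrelated queries that would have evicted it from a single shared dynamic cache, which is the effect the topic layer is meant to capture.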