Paper Title
A Simple yet Effective Framework for Active Learning to Rank
Paper Authors
Paper Abstract
While China has become the biggest online market in the world with around 1 billion internet users, Baidu runs the world's largest Chinese search engine, serving hundreds of millions of daily active users and responding to billions of queries per day. To handle the diverse query requests from users at web scale, Baidu has made tremendous efforts in understanding users' queries, retrieving relevant content from a pool of trillions of webpages, and ranking the most relevant webpages at the top of the results. Among the components used in Baidu search, learning to rank (LTR) plays a critical role, and we need to timely label an extremely large number of queries together with relevant webpages to train and update the online LTR models. To reduce the cost and time consumption of query/webpage labeling, in this work we study the problem of Active Learning to Rank (active LTR), which selects unlabeled queries for annotation and training. Specifically, we first investigate the criterion of Ranking Entropy (RE), which characterizes the entropy of relevant webpages under a query, produced by a sequence of online LTR models updated at different checkpoints using a Query-By-Committee (QBC) method. Then, we explore a new criterion, namely Prediction Variance (PV), which measures the variance of prediction results for all relevant webpages under a query. Our empirical studies find that RE may favor low-frequency queries from the pool for labeling, while PV prioritizes high-frequency queries. Finally, we combine these two complementary criteria into a sample selection strategy for active learning. Extensive experiments with comparisons to baseline algorithms show that the proposed approach can train LTR models that achieve higher Discounted Cumulative Gain (i.e., a relative improvement of ΔDCG4 = 1.38%) with the same budgeted labeling effort.
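To make the two criteria concrete, below is a minimal Python sketch of one plausible formulation: RE as the Shannon entropy of the committee's top-1 votes over a query's candidate webpages, and PV as the variance of predicted relevance scores across those webpages. The function names, the top-1 voting scheme, and the trade-off weight `alpha` are illustrative assumptions for exposition, not the paper's exact definitions.

```python
import numpy as np

def ranking_entropy(committee_scores: np.ndarray) -> float:
    """Ranking Entropy (RE) via Query-By-Committee (sketch).

    committee_scores: shape (n_models, n_webpages), relevance scores for
    one query from a sequence of LTR model checkpoints (the committee).
    Returns the Shannon entropy of the committee's votes over which
    webpage ranks first; high entropy means the checkpoints disagree.
    """
    top1 = committee_scores.argmax(axis=1)              # each member's top pick
    votes = np.bincount(top1, minlength=committee_scores.shape[1])
    p = votes / votes.sum()                             # vote distribution
    p = p[p > 0]                                        # avoid log(0)
    return float(-(p * np.log(p)).sum())

def prediction_variance(scores: np.ndarray) -> float:
    """Prediction Variance (PV) for one query (sketch).

    scores: shape (n_webpages,), predicted relevance scores of a single
    LTR model for all candidate webpages under the query.
    """
    return float(np.var(scores))

# Toy usage: pick the unlabeled query with the highest combined score.
# `alpha` is a hypothetical trade-off weight; averaging committee scores
# to get a single model's predictions is likewise an illustrative choice.
alpha = 0.5
rng = np.random.default_rng(0)
pool = {q: rng.normal(size=(5, 10)) for q in ["q1", "q2", "q3"]}  # 5 checkpoints, 10 pages
combined = {
    q: alpha * ranking_entropy(s) + (1 - alpha) * prediction_variance(s.mean(axis=0))
    for q, s in pool.items()
}
best_query = max(combined, key=combined.get)
```

Under this reading, RE and PV are naturally complementary: a rare (low-frequency) query tends to produce unstable rankings across checkpoints (high RE), while a popular query with many strong candidates tends to have widely spread scores (high PV), which matches the paper's observation that the two criteria favor different ends of the query-frequency spectrum.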