论文标题
学习在多项式logit选择下排名
Learning to Rank under Multinomial Logit Choice
论文作者
论文摘要
在网站设计中学习最佳的内容是一个重要的挑战。学习排名(LTR)框架将此问题建模为选择内容列表并观察用户决定单击的位置的顺序问题。 LTR上的大多数先前工作都假定用户孤立地考虑列表中的每个项目,并在每个列表中单击或不单击每个项目。我们向LTR框架介绍了多项式logit(MNL)选择模型,该模型捕获了考虑整个项目的有序列表,并在所有项目和无单击选项之间做出单个选择的用户的行为。在MNL模型下,用户有利于本质上更具吸引力的项目,或者将其放置在列表中的优选位置。我们提出上限置信度结合(UCB)算法,以最大程度地减少在两个设置中的遗憾 - 位置依赖参数已知且未知。我们提供理论分析,导致$ω(\ sqrt {jt})$下限问题,一个$ \ tilde {o}(\ sqrt {jt})$上限,因为已知的参数设置中的UCB算法对UCB算法感到遗憾,并在$ \ tilde设置中以及$ \ tilde $ \ tilde {遗憾的是,第一个在更具挑战性的未知位置参数设置中。我们的分析基于几何随机变量的紧密浓度结果,以及根据离散数据计算的最大似然估计量的新型功能不平等。
Learning the optimal ordering of content is an important challenge in website design. The learning to rank (LTR) framework models this problem as a sequential problem of selecting lists of content and observing where users decide to click. Most previous work on LTR assumes that the user considers each item in the list in isolation, and makes binary choices to click or not on each. We introduce a multinomial logit (MNL) choice model to the LTR framework, which captures the behaviour of users who consider the ordered list of items as a whole and make a single choice among all the items and a no-click option. Under the MNL model, the user favours items which are either inherently more attractive, or placed in a preferable position within the list. We propose upper confidence bound (UCB) algorithms to minimise regret in two settings - where the position dependent parameters are known, and unknown. We present theoretical analysis leading to an $Ω(\sqrt{JT})$ lower bound for the problem, an $\tilde{O}(\sqrt{JT})$ upper bound on regret of the UCB algorithm in the known-parameter setting, and an $\tilde{O}(K^2\sqrt{JT})$ upper bound on regret, the first, in the more challenging unknown-position-parameter setting. Our analyses are based on tight new concentration results for Geometric random variables, and novel functional inequalities for maximum likelihood estimators computed on discrete data.