Paper Title
Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
Paper Authors
Paper Abstract
We revisit the problem of online learning with sleeping experts/bandits: in each time step, only a subset of the actions is available for the algorithm to choose from (and learn about). The work of Kleinberg et al. (2010) showed that there exist no-regret algorithms that asymptotically perform no worse than the best ranking of actions. Unfortunately, achieving this regret bound appears computationally hard: Kanade and Steinke (2014) showed that achieving this no-regret performance is at least as hard as PAC-learning DNFs, a notoriously difficult problem. In the present work, we relax the original problem and study computationally efficient no-approximate-regret algorithms: such algorithms may exceed the optimal cost by a multiplicative constant in addition to the additive regret. We give an algorithm that provides a no-approximate-regret guarantee for the general sleeping experts/bandits problem. For several canonical special cases of the problem, we give algorithms with significantly better approximation ratios; these algorithms also illustrate different techniques for achieving no-approximate-regret guarantees.
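For concreteness, the no-approximate-regret guarantee described above can be formalized as follows (a sketch in our own notation, not necessarily the paper's exact definition). Let $A_t$ denote the set of actions available at time $t$, let $c_t(a) \in [0,1]$ be the adversarially chosen cost of action $a$, and for a ranking $\sigma$ of all actions let $\sigma(A_t)$ denote the highest-ranked action in $A_t$. An algorithm playing actions $a_1, \dots, a_T$ achieves no $\alpha$-approximate regret if

\[
\mathbb{E}\left[\sum_{t=1}^{T} c_t(a_t)\right] \;\le\; \alpha \cdot \min_{\sigma} \sum_{t=1}^{T} c_t\bigl(\sigma(A_t)\bigr) \;+\; o(T),
\]

where the multiplicative constant $\alpha \ge 1$ is the approximation ratio and the $o(T)$ term is the additive regret; setting $\alpha = 1$ recovers the exact no-regret benchmark against the best ranking, which is the computationally hard case.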