Paper Title
Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
Paper Authors
Paper Abstract
We revisit the problem of online learning with sleeping experts/bandits: in each time step, only a subset of the actions is available for the algorithm to choose from (and learn about). The work of Kleinberg et al. (2010) showed that there exist no-regret algorithms that asymptotically perform no worse than the best ranking of actions. Unfortunately, achieving this regret bound appears computationally hard: Kanade and Steinke (2014) showed that achieving this no-regret performance is at least as hard as PAC-learning DNFs, a notoriously difficult problem. In the present work, we relax the original problem and study computationally efficient no-approximate-regret algorithms: such algorithms may exceed the optimal cost by a multiplicative constant in addition to the additive regret. We give an algorithm that provides a no-approximate-regret guarantee for the general sleeping experts/bandits problem. For several canonical special cases of the problem, we give algorithms with significantly better approximation ratios; these algorithms also illustrate different techniques for achieving no-approximate-regret guarantees.
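For concreteness, the no-approximate-regret guarantee described above can be formalized as follows (a sketch in our own notation, not necessarily the paper's exact definition). Let $A_t$ denote the set of actions available at time $t$, let $c_t(a) \in [0,1]$ be the adversarially chosen cost of action $a$, and for a ranking $\sigma$ of all actions let $\sigma(A_t)$ denote the highest-ranked action in $A_t$. An algorithm playing actions $a_1, \dots, a_T$ achieves no $\alpha$-approximate regret if

\[
\mathbb{E}\left[\sum_{t=1}^{T} c_t(a_t)\right] \;\le\; \alpha \cdot \min_{\sigma} \sum_{t=1}^{T} c_t\bigl(\sigma(A_t)\bigr) \;+\; o(T),
\]

where the multiplicative constant $\alpha \ge 1$ is the approximation ratio and the $o(T)$ term is the additive regret; setting $\alpha = 1$ recovers the exact no-regret benchmark against the best ranking, which is the computationally hard case.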