Corralling Stochastic Bandit Algorithms

Paper Title

Corralling Stochastic Bandit Algorithms

Authors

Raman Arora, Teodor V. Marinov, Mehryar Mohri

Abstract

We study the problem of corralling stochastic bandit algorithms, that is, combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algorithm. We give two general algorithms for this setting, which we show benefit from favorable regret guarantees. We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and depends on the gap between the highest reward and other rewards.
