结合随机匪徒的上限置信界

论文标题

结合随机匪徒的上限置信界

Upper Confidence Bounds for Combining Stochastic Bandits

论文作者

Cutkosky, Ashok, Das, Abhimanyu, Purohit, Manish

论文摘要

我们提供了一种结合随机匪徒算法的简单方法。我们的方法是基于“元UCB”程序，该程序将$ n $个体匪徒算法的每一种都视为武器，以高级$ n $ n $ armed bandit问题，我们通过经典的UCB算法的变体解决了该问题。我们的最后遗憾仅取决于基本算法的遗憾，并在事后遗憾。这种方法为对抗性匪徒提供了一种简单，直观的替代策略，而无需Corral对基础算法施加的稳定条件。我们的结果匹配多种设置中的下限，我们在误指定的线性匪徒和模型选择问题上提供了算法的经验验证。

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a variant of the classic UCB algorithm. Our final regret depends only on the regret of the base algorithm with the best regret in hindsight. This approach provides an easy and intuitive alternative strategy to the CORRAL algorithm for adversarial bandits, without requiring the stability conditions imposed by CORRAL on the base algorithms. Our results match lower bounds in several settings, and we provide empirical validation of our algorithm on misspecified linear bandit and model selection problems.

下载PDF全文

下载文献需遵守相关版权规定

论文标题