Paper Title
Improved Algorithms for Bandit with Graph Feedback via Regret Decomposition
Paper Authors
Paper Abstract
The problem of bandit with graph feedback generalizes both the multi-armed bandit (MAB) problem and the learning with expert advice problem by encoding in a directed graph how the loss vector can be observed in each round of the game. The minimax regret is closely related to the structure of the feedback graph, but their connection is far from fully understood. We propose a new algorithmic framework for the problem based on a partition of the feedback graph. Our analysis reveals the interplay between various parts of the graph by decomposing the regret into the sum of the regret caused by small parts and the regret caused by their interaction. As a result, our algorithm can be viewed as an interpolation and generalization of the optimal algorithms for MAB and learning with expert advice. Our framework unifies previous algorithms for both strongly observable graphs and weakly observable graphs, resulting in improved and optimal regret bounds on a wide range of graph families, including graphs of bounded degree and strongly observable graphs with a few corrupted arms.
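To make the feedback model concrete: each round the learner plays one arm and observes the losses of all out-neighbors of that arm in the feedback graph (self-loops correspond to observing one's own loss, as in standard MAB; a complete graph recovers the full-information expert setting). The sketch below is a generic exponential-weights learner with importance-weighted loss estimates in the spirit of EXP3-SET, written as an illustration of the feedback model only; it is not the paper's partition-based algorithm, and the function name and interface are made up for this example.

```python
import math
import random

def exp3_graph_feedback(n_arms, neighbors, losses, eta=0.1, rng=None):
    """Exponential-weights learner under graph feedback (illustrative sketch).

    neighbors[i] is the set of arms whose losses are observed when arm i
    is played (the out-neighborhood of i in the feedback graph); we assume
    i in neighbors[i], i.e. every vertex has a self-loop, so the graph is
    strongly observable. losses is a list of per-round loss vectors in [0, 1].
    Returns the learner's cumulative loss.
    """
    rng = rng or random.Random(0)
    w = [1.0] * n_arms  # exponential weights
    total_loss = 0.0
    for loss in losses:
        s = sum(w)
        p = [wi / s for wi in w]
        arm = rng.choices(range(n_arms), weights=p)[0]
        total_loss += loss[arm]
        # Arm j is observed iff the played arm is an in-neighbor of j,
        # which happens with probability q_j = sum of p over in-neighbors.
        # Dividing by q_j gives an unbiased estimate of loss[j].
        for j in range(n_arms):
            if j in neighbors[arm]:
                q_j = sum(p[i] for i in range(n_arms) if j in neighbors[i])
                w[j] *= math.exp(-eta * loss[j] / q_j)
    return total_loss
```

With the complete graph as `neighbors`, every loss is observed each round and the update reduces to the Hedge algorithm; with only self-loops, each round yields exactly one observation and the update reduces to EXP3. Intermediate graphs interpolate between the two, which is the interplay the abstract's regret decomposition quantifies.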