Paper Title

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

Paper Authors

Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

Paper Abstract

We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015], which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any $\text{poly}(K)$ dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without self-loops in a strongly observable graph.
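
As background for the estimator discussion above, here is an illustrative sketch under assumed notation (not the paper's exact construction): the implicit-exploration (IX) estimator of [Neu, 2015] is the canonical optimistically biased loss estimator. Writing $W_t(i)$ for the probability that the loss of node $i$ is observed at round $t$ and $\gamma > 0$ for a bias parameter, it takes the form $\widehat{\ell}_t(i) = \ell_t(i)\,\mathbb{1}\{i \text{ is observed at round } t\}/(W_t(i) + \gamma)$, which gives $\mathbb{E}_t[\widehat{\ell}_t(i)] = \frac{W_t(i)}{W_t(i)+\gamma}\,\ell_t(i) \le \ell_t(i)$: losses are systematically underestimated, which keeps upward deviations of the estimates controlled with high probability. A pessimistic estimator reverses the direction of the bias so that $\mathbb{E}_t[\widehat{\ell}_t(i)] \ge \ell_t(i)$; per the abstract, this reversed bias is the appropriate choice for nodes without self-loops in a strongly observable graph.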
