边缘学习：随机反馈图的在线学习

论文标题

边缘学习：随机反馈图的在线学习

Learning on the Edge: Online Learning with Stochastic Feedback Graphs

论文作者

Esposito, Emmanuel, Fusco, Federico, van der Hoeven, Dirk, Cesa-Bianchi, Nicolò

论文摘要

反馈图的框架是用强盗或完整信息反馈对顺序决策进行的概括。在这项工作中，我们研究了一个延伸，其中有向反馈图是随机的，遵循类似于经典的Erdős-rényi模型的分布。具体而言，在每个回合中，图中的每个边缘都可以实现或未对每个边缘都有明显的概率。我们证明了几乎最佳的遗憾$ \ min \ big \ big {\ min _ {\ varepsilon} \ sqrt {（α_\ varepsilon/\ varepsilon/\ varepsilon）t}，\ \，\，\，\ min _ {\ varepsilon} {\ varepsilon} { t^{2/3} \ bigr \} $（忽略对数因素），其中$α_ {\ varepsilon} $和$δ_ {\ varepsilon} $是图形理论数量，是基于与随机反馈图$ \ nathcal aTep at edep at gress in gress cobilitions cobientials complose的图形理论数量。我们的结果在没有关于$ \ Mathcal {g} $的任何初步知识的情况下保持，要求学习者只观察到所选择的行动的未实现的境内。当允许学习者观察整个图的实现（但只有在所选动作的范围内的损失）时，我们得出了一种更有效的算法，这些算法具有依赖于独立性的加权版本和弱统治数量的依赖性，这些算法在某些特殊情况下显示出改善界限。

The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order $\min\bigl\{\min_{\varepsilon} \sqrt{(α_\varepsilon/\varepsilon) T},\, \min_{\varepsilon} (δ_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$ (ignoring logarithmic factors), where $α_{\varepsilon}$ and $δ_{\varepsilon}$ are graph-theoretic quantities measured on the support of the stochastic feedback graph $\mathcal{G}$ with edge probabilities thresholded at $\varepsilon$. Our result, which holds without any preliminary knowledge about $\mathcal{G}$, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.

下载PDF全文

下载文献需遵守相关版权规定

论文标题