论文标题

多代理匪徒的闲话插入 - 渗透算法

The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits

论文作者

Chawla, Ronshee, Sankararaman, Abishek, Ganesh, Ayalvadi, Shakkottai, Sanjay

论文摘要

我们考虑一个由$ N $代理组成的分散的多代理多武装强盗(MAB)设置,该设置解决了相同的MAB实例,以最大程度地减少个体的累积遗憾。在我们的模型中,代理通过在任意连接的图表上通过成对八卦样式通信交换消息来合作。我们开发了两种新颖的算法,每个算法都只能从所有手臂的子集发挥作用。代理商使用通信媒介仅推荐ARM-ID(不是样品),因此更新了它们玩的武器集。我们确定,如果代理通过任何连接的成对八卦机制传达$ω(\ log(t))$ times,那么与无协作的情况相比,每个代理商的遗憾是$ n $少的订单的一个因素。此外,我们表明沟通约束仅对我们算法的遗憾产生二阶效果。然后,我们分析了遗憾的第二个阶段,以在遗憾的交流方面得出界限。最后,我们从经验上评估了我们的算法,并得出结论,这些见解是基本的,而不是我们界限的文物。我们还显示了一个下限,即使在没有任何通信约束的情况下,我们算法获得的遗憾缩放也无法改善。因此,我们的结果表明,即使是代理商之间的合作水平最小也会大大减少所有代理商的遗憾。

We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents, solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph. We develop two novel algorithms, where each agent only plays from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $Ω(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaborations. Furthermore, we show that the communication constraints only have a second order effect on the regret of our algorithm. We then analyze this second order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which gives that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源