Paper title

Coordination without communication: optimal regret in two players multi-armed bandits

Authors

Sébastien Bubeck, Thomas Budzinski

Abstract

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.
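
To make the setting in the abstract concrete, here is a minimal sketch of a two-player, three-armed bandit round structure and its pseudo-regret. It is not the strategy proposed in the paper. Several details are assumptions made only for illustration: rewards are used instead of losses, a collision (both players pulling the same arm) is assumed to pay nothing, the benchmark is the pair of best arms played without collision, and the uniformly random baseline and all function names (`expected_reward`, `pseudo_regret`, `simulate`) are hypothetical.

```python
import random


def expected_reward(means, arm_a, arm_b):
    """Expected joint reward of one round.

    Assumption: a collision (both players on the same arm) pays nothing.
    """
    if arm_a == arm_b:
        return 0.0
    return means[arm_a] + means[arm_b]


def pseudo_regret(means, plays):
    """Cumulative gap to always playing the two best arms without colliding."""
    best = sum(sorted(means, reverse=True)[:2])
    return sum(best - expected_reward(means, a, b) for a, b in plays)


def simulate(means, horizon, seed=0):
    """Naive baseline: each player picks an arm uniformly at random.

    The two players draw independently, so they never exchange information.
    Returns the pseudo-regret and the number of collisions.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    plays = [(rng.randrange(n_arms), rng.randrange(n_arms)) for _ in range(horizon)]
    collisions = sum(a == b for a, b in plays)
    return pseudo_regret(means, plays), collisions


if __name__ == "__main__":
    regret, collisions = simulate([0.9, 0.5, 0.1], horizon=1000)
    print(f"pseudo-regret: {regret:.1f}, collisions: {collisions}")
```

Under these assumptions the random baseline collides in roughly a third of the rounds and its regret grows linearly in $T$, which is the behaviour a collision-free strategy with $O(\sqrt{T \log(T)})$ regret is meant to avoid.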
