On the Emergence of Cooperation in the Repeated Prisoner's Dilemma

Paper Title

On the Emergence of Cooperation in the Repeated Prisoner's Dilemma

Author

Schaefer, Maximilian

Abstract

Using simulations between pairs of $ε$-greedy q-learners with one-period memory, this article demonstrates that the potential function of the stochastic replicator dynamics (Foster and Young, 1990) makes it possible to predict the emergence of error-proof cooperative strategies from the underlying parameters of the repeated prisoner's dilemma. The observed cooperation rates between q-learners are related to the ratio between the kinetic energy exerted by the polar attractors of the replicator dynamics under the grim trigger strategy. The frontier separating the parameter space conducive to cooperation from the parameter space dominated by defection can be found by setting the kinetic energy ratio equal to a critical value, which is a function of the discount factor, $f(δ) = δ/(1-δ)$, multiplied by a correction term to account for the effect of the algorithms' exploration probability. The gradient at the frontier increases with the distance between the game parameters and the hyperplane that characterizes the incentive compatibility constraint for cooperation under grim trigger. Building on literature from the neurosciences, which suggests that reinforcement learning is useful for understanding human behavior in risky environments, the article further explores the extent to which the frontier derived for q-learners also explains the emergence of cooperation between humans. Using metadata from laboratory experiments that analyze human choices in the infinitely repeated prisoner's dilemma, the cooperation rates between humans are compared to those observed between q-learners under similar conditions. The correlation coefficients between the cooperation rates observed for humans and those observed for q-learners are consistently above $0.8$. The frontier derived from the simulations between q-learners is also found to predict the emergence of cooperation between humans.
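
The simulation setup named in the abstract (two $ε$-greedy q-learners with one-period memory playing the repeated prisoner's dilemma) can be illustrated with a minimal sketch. This is not the author's code. The payoff values, the learning rate `alpha`, the zero-initialized Q-tables, the initial state, and all function names are assumptions made for illustration. Only $f(δ) = δ/(1-δ)$ is taken from the abstract; `grim_trigger_ic` uses the standard textbook incentive constraint for grim trigger, $δ/(1-δ) \ge (T-R)/(R-P)$, and the correction term for the exploration probability is not specified in the abstract, so it is left out.

```python
import random

# Illustrative payoffs with T > R > P > S (assumed values, not from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
# (own action, opponent action) -> own payoff; 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): R, (0, 1): S, (1, 0): T, (1, 1): P}


def critical_value(delta):
    """f(delta) = delta / (1 - delta), the function given in the abstract."""
    return delta / (1.0 - delta)


def grim_trigger_ic(delta):
    """Standard grim-trigger constraint: delta/(1-delta) >= (T-R)/(R-P)."""
    return critical_value(delta) >= (T - R) / (R - P)


def simulate(delta=0.9, alpha=0.1, epsilon=0.05, periods=20000, seed=0):
    """Return the share of cooperative actions played by two q-learners."""
    rng = random.Random(seed)
    # One-period memory: the state is the pair of actions played last period.
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    # One Q-table per player: q[player][state][action], initialized to zero.
    q = [{s: [0.0, 0.0] for s in states} for _ in range(2)]
    state = (0, 0)  # assumed starting state
    cooperative_actions = 0
    for _ in range(periods):
        actions = []
        for i in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))  # explore
            else:
                values = q[i][state]
                actions.append(0 if values[0] >= values[1] else 1)  # exploit
        next_state = (actions[0], actions[1])
        for i in range(2):
            own, other = actions[i], actions[1 - i]
            reward = PAYOFF[(own, other)]
            # Standard Q-learning update with discount factor delta.
            target = reward + delta * max(q[i][next_state])
            q[i][state][own] += alpha * (target - q[i][state][own])
        cooperative_actions += actions.count(0)
        state = next_state
    return cooperative_actions / (2 * periods)
```

Calling `simulate` over a grid of payoffs and discount factors and comparing the resulting cooperation rates with `grim_trigger_ic` would give a rough picture of where cooperation emerges; the frontier reported in the article additionally depends on the kinetic energy ratio and the exploration correction, which this sketch does not model.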
