Paper Title
SIPOMDPLite-Net: Lightweight, Self-Interested Learning and Planning in POSGs with Sparse Interactions
Paper Authors
Paper Abstract
This work introduces sIPOMDPLite-net, a deep neural network (DNN) architecture for decentralized, self-interested agent control in partially observable stochastic games (POSGs) with sparse interactions between agents. The network learns to plan in contexts modeled by the interactive partially observable Markov decision process (I-POMDP) Lite framework and uses hierarchical value iteration networks to simulate the solution of nested MDPs, which I-POMDP Lite attributes to the other agent to model its behavior and predict its intention. We train sIPOMDPLite-net with expert demonstrations on small two-agent Tiger-grid tasks, for which it accurately learns the underlying I-POMDP Lite model and near-optimal policy, and the policy continues to perform well on larger grids and real-world maps. As such, sIPOMDPLite-net shows good transfer capabilities and offers a lighter learning and planning approach for individual, self-interested agents in multiagent settings.
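The abstract describes value iteration networks as the planning module that simulates nested-MDP solutions. The following is a minimal, illustrative sketch of the general value-iteration-network idea on a grid task, not the paper's actual architecture or code; the function names, tensor shapes, and 3x3 local-transition assumption are all hypothetical choices made for this example. A hierarchical arrangement, as the abstract suggests, would first run such a module on the other agent's nested MDP to obtain its predicted policy and then feed that prediction into the subject agent's own planning module.

```python
# Illustrative sketch only (assumed names/shapes), not sIPOMDPLite-net's implementation:
# one value-iteration-network-style planning layer that performs Bellman backups on an
# H x W grid by repeated local-expectation (sliding-window) and max operations.
import numpy as np

def local_expectation(V, kernel):
    """'Same'-padded 3x3 sliding-window expectation of V under a local transition kernel."""
    H, W = V.shape
    Vp = np.pad(V, 1)
    out = np.zeros_like(V)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(Vp[i:i + 3, j:j + 3] * kernel)
    return out

def value_iteration_layer(reward, trans_kernels, gamma=0.95, n_iter=30):
    """Approximate the grid-MDP value function with n_iter Bellman backups.

    reward:        (A, H, W) immediate reward per action (assumed layout).
    trans_kernels: (A, 3, 3) local transition kernels, one per action, each summing to 1.
    Returns V of shape (H, W) and Q of shape (A, H, W).
    """
    A, H, W = reward.shape
    V = np.zeros((H, W))
    for _ in range(n_iter):
        # Expected next-state value under each action, then a Bellman backup and max over actions.
        EV = np.stack([local_expectation(V, trans_kernels[a]) for a in range(A)])
        Q = reward + gamma * EV
        V = Q.max(axis=0)
    return V, Q

if __name__ == "__main__":
    # Toy usage: 4 move actions on a 6x6 grid with a single goal cell.
    A, H, W = 4, 6, 6
    reward = np.full((A, H, W), -0.1)
    reward[:, 5, 5] = 1.0
    kernels = np.tile(np.full((3, 3), 1.0 / 9.0), (A, 1, 1))  # placeholder dynamics
    V, Q = value_iteration_layer(reward, kernels)
    print(V.round(2))
```

In an actual network, the sliding-window expectation would be implemented as a learnable convolution and the backup unrolled for a fixed number of steps, so the transition and reward models can be trained end-to-end from the expert demonstrations mentioned in the abstract.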