Understanding reinforcement learned crowds

Paper title

Understanding reinforcement learned crowds

Authors

Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani

Abstract

Simulating trajectories of virtual crowds is a commonly encountered task in Computer Graphics. Several recent works have applied Reinforcement Learning methods to animate virtual agents; however, they often make different design choices when it comes to the fundamental simulation setup. Each of these choices comes with a reasonable justification for its use, so it is not obvious what their real impact is or how they affect the results. In this work, we analyze some of these arbitrary choices in terms of their impact on the learning performance, as well as the quality of the resulting simulation measured in terms of energy efficiency. We perform a theoretical analysis of the properties of the reward function design, and empirically evaluate the impact of using certain observation and action spaces on a variety of scenarios, with the reward function and energy usage as metrics. We show that directly using the neighboring agents' information as observation generally outperforms the more widely used raycasting. Similarly, using nonholonomic controls with egocentric observations tends to produce more efficient behaviors than holonomic controls with absolute observations. Each of these choices has a significant and potentially nontrivial impact on the results, so researchers should be mindful about choosing and reporting them in their work.
