MDPs with Unawareness in Robotics

Paper title

MDPs with Unawareness in Robotics

Authors

Nan Rong, Joseph Y. Halpern, Ashutosh Saxena

Abstract

We formalize decision-making problems in robotics and automated control using continuous MDPs and actions that take place over continuous time intervals. We then approximate the continuous MDP using finer and finer discretizations. Doing this results in a family of systems, each of which has an extremely large action space, although only a few actions are "interesting". We can view the decision maker as being unaware of which actions are "interesting". We can model this using MDPUs, MDPs with unawareness, where the action space is much smaller. As we show, MDPUs can be used as a general framework for learning tasks in robotic problems. We prove results on the difficulty of learning a near-optimal policy in an MDPU for a continuous task. We apply these ideas to the problem of having a humanoid robot learn on its own how to walk.
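
To make the "extremely large action space" concrete, here is a minimal sketch of how the number of discretized actions grows as the discretization is refined. It is not the authors' construction: the joint count, the control range [-1, 1], the duration range [0.1, 1.0], and the function names `discretize`, `action_space`, and `action_space_size` are all assumptions chosen for illustration.

```python
from itertools import product


def discretize(low, high, levels):
    """Return `levels` evenly spaced values covering [low, high]."""
    if levels < 2:
        raise ValueError("need at least 2 levels")
    step = (high - low) / (levels - 1)
    return [low + i * step for i in range(levels)]


def action_space(num_joints, levels, durations):
    """Enumerate every discretized action.

    An action here is (one control value per joint, a duration).
    The ranges are hypothetical, picked only for illustration.
    """
    settings = discretize(-1.0, 1.0, levels)
    times = discretize(0.1, 1.0, durations)
    return [
        (joint_values, t)
        for joint_values in product(settings, repeat=num_joints)
        for t in times
    ]


def action_space_size(num_joints, levels, durations):
    """Count the actions without enumerating them."""
    return levels ** num_joints * durations


if __name__ == "__main__":
    # Refining the grid multiplies the number of actions rapidly.
    for levels in (2, 4, 8):
        size = action_space_size(num_joints=6, levels=levels, durations=levels)
        print(levels, size)
```

Under these assumptions, the count grows as `levels ** num_joints * durations`, so even a modest refinement leaves a learner with far more actions than it could try exhaustively. This is the setting in which the abstract treats the decision maker as unaware of the few "interesting" actions.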
