Paper Title
Inducing game rules from varying quality game play
Paper Authors
Paper Abstract
General Game Playing (GGP) is a framework in which an artificial intelligence program is required to play a variety of games successfully. It acts as a test bed for AI and a motivator of research. The AI is given a random game description at runtime, which it then plays. The framework includes repositories of game rules. The Inductive General Game Playing (IGGP) problem challenges machine learning systems to learn these GGP game rules by watching the game being played. In other words, IGGP is the problem of inducing general game rules from specific game observations. Inductive Logic Programming (ILP) has been shown to be a promising approach to this problem, though it has been demonstrated to remain a hard problem for ILP systems. Existing work on IGGP has always assumed that the game player being observed makes random moves. This is not representative of how a human learns to play a game. With random gameplay, situations that would normally be encountered when humans play are not present. To address this limitation, we analyse the effect of using intelligent versus random gameplay traces, as well as the effect of varying the number of traces in the training set. We use Sancho, the 2014 GGP competition winner, to generate intelligent game traces for a large number of games. We then use the ILP systems Metagol, Aleph, and ILASP to induce game rules from the traces. We train and test the systems on combinations of intelligent and random data, including a mixture of both. We also vary the volume of training data. Our results show that, whilst some games were learned more effectively in some of the experiments than in others, no overall trend was statistically significant. The implications of this work are that varying the quality of training data as described in this paper has strong effects on the accuracy of the learned game rules; however, one solution does not work for all games.
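The experimental design described above varies two things: the quality mixture of the training data (intelligent Sancho traces versus random-player traces) and its volume. As a minimal sketch of how such training sets might be assembled, the following Python example mixes two trace pools in a given ratio; the trace names, pool sizes, mixture ratios, and trace counts here are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical trace pools: in the paper's setup, intelligent traces come
# from Sancho and random traces from a uniformly random player.
intelligent_traces = [f"sancho_trace_{i}" for i in range(100)]
random_traces = [f"random_trace_{i}" for i in range(100)]

def build_training_set(n_traces, intelligent_ratio, seed=0):
    """Sample n_traces gameplay traces, mixing intelligent and random
    traces in the given proportion (0.0 = all random, 1.0 = all intelligent)."""
    rng = random.Random(seed)
    n_intelligent = round(n_traces * intelligent_ratio)
    n_random = n_traces - n_intelligent
    sample = (rng.sample(intelligent_traces, n_intelligent)
              + rng.sample(random_traces, n_random))
    rng.shuffle(sample)
    return sample

# Vary both the mixture and the volume of training data, as the
# experiments do; these particular grid values are illustrative.
for ratio in (0.0, 0.5, 1.0):
    for n in (4, 8, 16):
        train = build_training_set(n, ratio)
        # ... pass `train` to an ILP system (Metagol, Aleph, or ILASP)
        # to induce the game rules, then evaluate on held-out traces.
```

Each sampled training set would then be handed to one of the ILP systems named in the abstract, with accuracy measured against held-out game observations.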