Paper Title

Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games

Authors

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

Abstract

Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise two probabilistic finite-state agents interacting in a shared continuous-state environment. Each agent observes the environment using a neural perception mechanism, which converts inputs such as images into symbolic percepts, and makes decisions symbolically. We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards under piecewise-constant restrictions on the components of this class of models. To compute values and synthesise strategies, we present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs. These require a finite decomposition of the environment induced by the neural perception mechanisms of the agents and rely on finite abstract representations of value functions and strategies closed under VI or PI. First, we introduce a Borel measurable piecewise-constant (B-PWC) representation of value functions, extend minimax backups to this representation and propose a value iteration algorithm called B-PWC VI. Second, we introduce two novel representations for the value functions and strategies, constant-piecewise-linear (CON-PWL) and constant-piecewise-constant (CON-PWC) respectively, and propose Minimax-action-free PI, which extends a recent PI method based on alternating player choices from finite state spaces to Borel state spaces and does not require normal-form games to be solved.
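The abstract does not spell out B-PWC VI itself, so the following is only a rough illustration of the minimax backup it builds on. It is a minimal sketch of standard (Shapley-style) discounted minimax value iteration over a finite abstraction, not the authors' algorithm. It assumes, hypothetically, that the perception-induced decomposition has already produced a finite set of regions on which rewards and transition probabilities are constant, so the value function is one constant per region (piecewise constant). It also assumes each player has exactly two actions, so every region's one-shot matrix game can be solved in closed form. The names `solve_2x2`, `minimax_vi`, `reward` and `trans` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def solve_2x2(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game; the row player maximises."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best payoff row player can guarantee with a pure action
    minimax = min(max(a, c), max(b, d))  # lowest cap column player can enforce with a pure action
    if minimax - maximin <= 1e-12:
        return maximin  # pure saddle point: no mixing needed
    # No saddle point: both players mix, and the value has this closed form.
    return (a * d - b * c) / (a + d - b - c)


def minimax_vi(
    regions: Sequence[str],
    reward: Dict[str, Matrix],
    trans: Dict[str, List[List[Dict[str, float]]]],
    gamma: float,
    eps: float = 1e-8,
    max_iter: int = 10_000,
) -> Dict[str, float]:
    """Discounted minimax value iteration with one constant value per region.

    reward[r][i][j] is the reward when player 1 picks i and player 2 picks j
    in region r; trans[r][i][j] maps each successor region to its probability.
    (Hypothetical finite abstraction, not the paper's B-PWC representation.)
    """
    v = {r: 0.0 for r in regions}
    for _ in range(max_iter):
        new_v = {}
        for r in regions:
            # Minimax backup: one-shot matrix game of immediate reward plus
            # discounted expected value of the successor regions.
            q = [
                [
                    reward[r][i][j]
                    + gamma * sum(p * v[s] for s, p in trans[r][i][j].items())
                    for j in range(2)
                ]
                for i in range(2)
            ]
            new_v[r] = solve_2x2(q)
        # Stop once successive value functions are within eps in sup-norm.
        if max(abs(new_v[r] - v[r]) for r in regions) < eps:
            return new_v
        v = new_v
    return v
```

The regions stand in for the cells of the perception-induced decomposition. Restricting to 2x2 games keeps the sketch dependency-free. With general action sets, each backup would instead solve a small linear program per region. The paper's Minimax-action-free PI is designed to avoid exactly this kind of normal-form game solving.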