Paper Title


Pseudo Random Number Generation through Reinforcement Learning and Recurrent Neural Networks

Authors

Luca Pasqualini, Maurizio Parton

Abstract


A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating the properties of random numbers. These numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate PRNG quality by checking statistical properties of the generated sequences. These sequences are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time step is the last sequence of bits appended to such state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time steps, by tasking the LSTM memory with the extraction of significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture largely improves the results of the fully observable feedforward RL approach introduced in previous work.
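To make the setup concrete, the sketch below illustrates the generation loop the abstract describes: an LSTM-based policy observes only the last few bits of the sequence (the partial observation), while its cell state accumulates features of the hidden remainder of the MDP state. This is a minimal, untrained toy with random placeholder weights and invented names (`TinyLSTMPolicy`, `obs_len`, etc.), not the authors' architecture or training procedure:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMPolicy:
    """Toy LSTM cell emitting one bit per step.

    Weights are random placeholders, not trained parameters; in the paper's
    setting they would be optimized with RL against a statistical reward.
    """
    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        z = obs_dim + hidden_dim
        # One stacked weight matrix for the input/forget/output gates
        # and the candidate memory update.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, z))
        self.b = np.zeros(4 * hidden_dim)
        self.w_out = rng.normal(0.0, 0.1, hidden_dim)  # bit-probability head
        self.h = np.zeros(hidden_dim)
        self.c = np.zeros(hidden_dim)
        self.hidden_dim = hidden_dim
        self.rng = rng

    def step(self, obs):
        z = np.concatenate([obs, self.h])
        gates = self.W @ z + self.b
        H = self.hidden_dim
        i = sigmoid(gates[:H])          # input gate
        f = sigmoid(gates[H:2 * H])     # forget gate
        o = sigmoid(gates[2 * H:3 * H]) # output gate
        g = np.tanh(gates[3 * H:])      # candidate update
        # The cell state summarizes past observations, standing in for
        # the hidden portion of the MDP state.
        self.c = f * self.c + i * g
        self.h = o * np.tanh(self.c)
        p = sigmoid(self.w_out @ self.h)  # probability of emitting bit 1
        return int(self.rng.random() < p)

def generate(policy, n_bits, obs_len=8):
    """Roll out the policy: each step sees only the last obs_len bits."""
    bits = []
    for _ in range(n_bits):
        obs = np.array(bits[-obs_len:], dtype=float)
        obs = np.pad(obs, (obs_len - len(obs), 0))  # left-pad early steps
        bits.append(policy.step(obs))
    return bits

bits = generate(TinyLSTMPolicy(obs_dim=8, hidden_dim=16), 64)
```

In the paper's framing, a reward derived from statistical test results would drive the policy toward sequences with longer periods and better randomness properties; this sketch only shows the observation/memory structure of the rollout.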
