Paper Title
Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking
Paper Authors
Paper Abstract
Deep reinforcement learning has been one of the fastest-growing fields of machine learning in recent years, and numerous libraries have been open-sourced to support research. However, most codebases have a steep learning curve or limited flexibility that does not satisfy the need for fast prototyping in fundamental research. This paper introduces Tonic, a Python library allowing researchers to quickly implement new ideas and measure their importance by providing: 1) general-purpose configurable modules; 2) several baseline agents (A2C, TRPO, PPO, MPO, DDPG, D4PG, TD3, and SAC) built with these modules; 3) support for TensorFlow 2 and PyTorch; 4) support for continuous-control environments from OpenAI Gym, DeepMind Control Suite, and PyBullet; 5) scripts to run experiments in a reproducible way, plot results, and play with trained agents; and 6) a benchmark of the provided agents on 70 continuous-control tasks. Evaluation is performed under fair conditions with identical seeds and identical training and testing loops, while sharing general improvements such as non-terminal timeouts and observation normalization. Finally, to demonstrate how Tonic simplifies experimentation, a novel agent called TD4 is implemented and evaluated.
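
To illustrate how the pieces described above fit together, here is a minimal Python sketch of a reproducible Tonic experiment. The tonic.train, tonic.plot, and tonic.play entry points correspond to the scripts mentioned in the abstract; the specific flag names, the BipedalWalker-v3 task, and the log-path layout are assumptions drawn from the library's public repository, not guarantees.

import subprocess

# Train a PPO baseline agent (PyTorch backend) on an OpenAI Gym
# continuous-control task with a fixed seed for reproducibility.
# Flag names below are assumptions based on Tonic's public repository.
subprocess.run([
    "python3", "-m", "tonic.train",
    "--header", "import tonic.torch",
    "--agent", "tonic.torch.agents.PPO()",
    "--environment", "tonic.environments.Gym('BipedalWalker-v3')",
    "--seed", "0",
], check=True)

# Plot the logged training curves for this task.
subprocess.run(
    ["python3", "-m", "tonic.plot", "--path", "BipedalWalker-v3"],
    check=True)

# Replay the trained agent; the environment/agent/seed log layout
# used in this path is a hypothetical example.
subprocess.run(
    ["python3", "-m", "tonic.play", "--path", "BipedalWalker-v3/PPO/0"],
    check=True)

Driving the command-line scripts from subprocess keeps the example self-contained; in practice the same commands can be run directly in a shell, which is how reproducible experiments, plotting, and playback are typically launched.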