Dynamic Dispatching for Large-Scale Heterogeneous Fleet via Multi-agent Deep Reinforcement Learning

Paper Title

Dynamic Dispatching for Large-Scale Heterogeneous Fleet via Multi-agent Deep Reinforcement Learning

Authors

Chi Zhang, Philip Odonkor, Shuai Zheng, Hamed Khorasgani, Susumu Serita, Chetan Gupta

Abstract

Dynamic dispatching is one of the core problems in operational optimization for traditional industries such as mining: it concerns how to allocate the right resources to the right place at the right time. Conventionally, the industry relies on heuristics or even human intuition, which are often short-sighted and yield sub-optimal solutions. Leveraging the power of AI and the Internet of Things (IoT), data-driven automation is reshaping this area. However, mining faces its own challenges, such as large-scale, heterogeneous truck fleets running in a highly dynamic environment, so it can barely adopt methods developed in other domains (e.g., ride-sharing). In this paper, we propose a novel Deep Reinforcement Learning approach to solve the dynamic dispatching problem in mining. We first develop an event-based mining simulator with parameters calibrated in real mines. We then propose an experience-sharing Deep Q Network with a novel abstract state/action representation that learns from the memories of heterogeneous agents together and realizes learning in a centralized way. We demonstrate that the proposed method significantly outperforms the most widely adopted approaches in the industry by 5.56% in terms of productivity. As a general framework for dynamic resource allocation, the proposed approach has great potential in a broader range of industries (e.g., manufacturing, logistics) that operate large-scale heterogeneous equipment in highly dynamic environments.
