Paper Title

Energy Minimization in UAV-Aided Networks: Actor-Critic Learning for Constrained Scheduling Optimization

Authors

Yaxiong Yuan, Lei Lei, Thang Xuan Vu, Symeon Chatzinotas, Sumei Sun, Bjorn Ottersten

Abstract

In unmanned aerial vehicle (UAV) applications, the UAV's limited energy supply and storage have triggered the development of intelligent energy-conserving scheduling solutions. In this paper, we investigate energy minimization for UAV-aided communication networks by jointly optimizing data-transmission scheduling and UAV hovering time. The formulated problem is combinatorial and non-convex with bilinear constraints. To tackle the problem, we first provide an optimal relax-and-approximate solution and develop a near-optimal algorithm. Both proposed solutions serve as offline performance benchmarks but might not be suitable for online operation. To this end, we develop a solution based on deep reinforcement learning (DRL). Conventional RL/DRL, e.g., deep Q-learning, however, is limited in dealing with two main issues in constrained combinatorial optimization: an exponentially growing action space and infeasible actions. The novelty of our solution lies in handling these two issues. To address the former, we propose an actor-critic-based deep stochastic online scheduling (AC-DSOS) algorithm and develop a set of approaches to confine the action space. For the latter, we design a tailored reward function to guarantee solution feasibility. Numerical results show that, with computation time of the same order of magnitude, AC-DSOS is able to provide feasible solutions and saves 29.94% energy compared with a conventional deep actor-critic method. Compared to the developed near-optimal algorithm, AC-DSOS consumes around 10% more energy but reduces the computational time from minute-level to millisecond-level.
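The abstract's two key ideas — confining the combinatorial action space to a small set of candidate scheduling subsets, and steering an actor-critic learner with a reward that penalizes infeasible choices — can be illustrated with a toy sketch. Everything below (the user count, per-user energy costs, data amounts, demand threshold, and feasibility rule) is a hypothetical bandit-style instance invented for illustration, not the paper's actual model or the AC-DSOS algorithm itself.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Hypothetical toy instance (not from the paper): 3 ground users; a UAV
# slot may schedule at most 2 of them and must deliver >= 2.0 data units.
ENERGY = np.array([3.0, 1.0, 2.0])   # transmit-energy cost per scheduled user
DATA   = np.array([1.0, 1.5, 1.0])   # data delivered if the user is scheduled
DEMAND = 2.0
PENALTY = -10.0                      # tailored reward: infeasible actions hurt

# Confine the action space: enumerate only subsets of size <= 2,
# rather than all 2^N user combinations.
ACTIONS = [frozenset(c) for k in (1, 2)
           for c in combinations(range(len(ENERGY)), k)]

def reward(action):
    """Negative energy if the slot is feasible, a large penalty otherwise."""
    idx = list(action)
    if DATA[idx].sum() < DEMAND:
        return PENALTY
    return -ENERGY[idx].sum()

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(len(ACTIONS))   # actor: preferences over the confined actions
v = 0.0                          # critic: scalar baseline (bandit-style value)
LR_ACTOR, LR_CRITIC = 0.2, 0.1

for _ in range(5000):
    pi = softmax(theta)
    a = rng.choice(len(ACTIONS), p=pi)
    adv = reward(ACTIONS[a]) - v             # advantage against the critic
    onehot = np.eye(len(ACTIONS))[a]
    theta += LR_ACTOR * adv * (onehot - pi)  # policy-gradient (softmax) update
    v += LR_CRITIC * adv                     # critic tracks average reward

best = ACTIONS[int(np.argmax(theta))]
```

With these toy numbers the only energy-minimal feasible subset is users {1, 2} (energy 3.0), and the learned policy concentrates there: infeasible singletons are driven down by the penalty, and among feasible pairs the advantage signal favors the cheapest one. The same masking-plus-penalty pattern is what lets a DRL scheduler sidestep both the exponential action space and infeasible actions.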
