A Double Q-Learning Approach for Navigation of Aerial Vehicles with Connectivity Constraint

Paper Title

A Double Q-Learning Approach for Navigation of Aerial Vehicles with Connectivity Constraint

Authors

Behzad Khamidehi, Elvino S. Sousa

Abstract

This paper studies the trajectory optimization problem for an aerial vehicle whose mission is to fly between a given pair of initial and final locations. The objective is to minimize the travel time of the aerial vehicle while ensuring that the communication connectivity constraint required for its safe operation is satisfied. We consider two different criteria for the connectivity constraint, which lead to two different scenarios. In the first scenario, we assume that the maximum continuous time duration for which the aerial vehicle is out of the coverage of the ground base stations (GBSs) is limited to a given threshold. In the second scenario, we instead assume that the total time for which the aerial vehicle is not covered by the GBSs is restricted. Based on these two constraints, we formulate two trajectory optimization problems. To solve these non-convex problems, we use an approach based on the double Q-learning method, a model-free reinforcement learning technique that, unlike existing algorithms, does not need perfect knowledge of the environment. Moreover, in contrast to the well-known Q-learning technique, our double Q-learning algorithm does not suffer from the over-estimation issue. Simulation results show that, although our algorithm does not require prior information about the environment, it works well and achieves near-optimal performance.
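To illustrate why double Q-learning avoids the over-estimation mentioned in the abstract, the following is a minimal sketch of one generic tabular double Q-learning update. It is not the authors' implementation: the function name `double_q_update`, the list-of-lists table layout, and the parameters `alpha`, `gamma`, and `rng` are assumptions made for illustration, and the paper's state, action, and reward design for the connectivity constraints is not reproduced here. The key point is that one table selects the greedy next action while the other table evaluates it.

```python
import random


def double_q_update(qa, qb, state, action, reward, next_state,
                    alpha=0.1, gamma=0.95, rng=random):
    """One tabular double Q-learning step (hypothetical helper).

    qa and qb are two independent Q-tables indexed as table[state][action].
    Terminal-state handling is omitted for brevity.
    """
    # Randomly choose which table is updated; the other one only evaluates.
    if rng.random() < 0.5:
        select, evaluate = qa, qb
    else:
        select, evaluate = qb, qa

    # Greedy next action according to the table being updated.
    n_actions = len(select[next_state])
    best = max(range(n_actions), key=lambda a: select[next_state][a])

    # The value of that action is taken from the other table, which
    # decouples action selection from action evaluation.
    target = reward + gamma * evaluate[next_state][best]
    select[state][action] += alpha * (target - select[state][action])
```

In plain Q-learning the same table both picks and scores the next action, so noise in the estimates is biased upward by the max operator; splitting the two roles across tables is what removes that bias in this sketch.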
