Paper Title
Minimizing the AoI in Resource-Constrained Multi-Source Relaying Systems: Dynamic and Learning-based Scheduling
Paper Authors
Paper Abstract
We consider a multi-source relaying system in which independent sources randomly generate status update packets that are sent to the destination, with the aid of a relay, over unreliable links. We develop transmission scheduling policies that minimize the weighted-sum average age of information (AoI) subject to transmission capacity and long-run average resource constraints. We formulate a stochastic control optimization problem and solve it using two approaches: a constrained Markov decision process (CMDP) formulation and a drift-plus-penalty method. The CMDP problem is solved by transforming it into an unconstrained MDP via Lagrangian relaxation. We theoretically analyze the structure of optimal policies for this MDP and then propose a structure-aware algorithm that returns a practical near-optimal policy. Using the drift-plus-penalty method, we devise a near-optimal, low-complexity policy that makes scheduling decisions dynamically. We also develop a model-free deep reinforcement learning policy that employs Lyapunov optimization theory and a dueling double deep Q-network. We analyze the complexities of the proposed policies. Simulation results assess the performance of our policies and validate the theoretical results, showing up to 91% performance improvement over a baseline policy.
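To make the optimization pipeline in the abstract concrete, the following is a minimal sketch of the two standard constructions it names; the notation (source weights w_k, instantaneous age A_k(t), per-slot resource usage c(t), resource budget \Gamma_{\max}, trade-off parameter V) is illustrative and not taken from the paper. The CMDP and its Lagrangian relaxation read

\min_{\pi}\ \limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T}\sum_{k} w_k A_k(t)\Big] \quad\text{s.t.}\quad \limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T} c(t)\Big]\le \Gamma_{\max},

\mathcal{L}(\pi,\lambda)=\limsup_{T\to\infty}\frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T}\Big(\sum_{k} w_k A_k(t)+\lambda\,c(t)\Big)\Big]-\lambda\,\Gamma_{\max},

so that, for a fixed multiplier \lambda\ge 0, the inner problem is an unconstrained average-cost MDP, and \lambda is then tuned (e.g., by bisection) until the resource constraint is met. The drift-plus-penalty method instead maintains a virtual queue Q(t) that tracks accumulated constraint violation,

Q(t+1)=\max\{Q(t)+c(t)-\Gamma_{\max},\,0\},

and in each slot selects the action minimizing V\sum_k w_k\,\mathbb{E}[A_k(t+1)\mid \text{action}] + Q(t)\,c(t), where a larger V weights age reduction more heavily against resource expenditure.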
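The model-free policy described in the abstract combines Lyapunov optimization with a dueling double deep Q-network. Below is a minimal PyTorch sketch of those two DQN ingredients only; the state/action encoding, layer sizes, and function names are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    # Assumed illustrative encoding: state = per-source ages plus the
    # Lyapunov virtual queue; each action schedules one source or the relay.
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # state value V(s)
        self.advantage = nn.Linear(hidden, num_actions)  # advantages A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Dueling combination: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)
        return v + a - a.mean(dim=-1, keepdim=True)

def double_dqn_target(online, target, r, s_next, gamma, done):
    # Double DQN: the online net picks the argmax action,
    # the target net evaluates it, decoupling selection from evaluation.
    with torch.no_grad():
        best = online(s_next).argmax(dim=-1, keepdim=True)
        q_next = target(s_next).gather(-1, best).squeeze(-1)
        return r + gamma * (1.0 - done) * q_next

Given the abstract's pairing of Lyapunov optimization with deep RL, one natural (assumed) per-slot reward is the negative drift-plus-penalty term, -(V * weighted age + Q(t) * c(t)), so that maximizing return also drives the long-run average resource constraint toward satisfaction.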