STDPG: A Spatio-Temporal Deterministic Policy Gradient Agent for Dynamic Routing in SDN

Paper Title

STDPG: A Spatio-Temporal Deterministic Policy Gradient Agent for Dynamic Routing in SDN

Authors

Chen, Juan; Xiao, Zhiwen; Xing, Huanlai; Dai, Penglin; Luo, Shouxi; Iqbal, Muhammad Azhar

Abstract

Dynamic routing in software-defined networking (SDN) can be viewed as a centralized decision-making problem. Most existing deep reinforcement learning (DRL) agents can address it, thanks to the deep neural network (DNN) they incorporate. However, a fully connected feed-forward neural network (FFNN) is usually adopted, which ignores the spatial correlation and temporal variation of traffic flows. This drawback usually leads to significantly high computational complexity due to the large number of training parameters. To overcome this problem, we propose a novel model-free framework for dynamic routing in SDN, referred to as the spatio-temporal deterministic policy gradient (STDPG) agent. Both the actor and critic networks are based on an identical DNN structure, in which a combination of a convolutional neural network (CNN) and a long short-term memory network (LSTM) with a temporal attention mechanism, CNN-LSTM-TAM, is devised. By efficiently exploiting spatial and temporal features, CNN-LSTM-TAM helps the STDPG agent learn better from the experience transitions. Furthermore, we employ the prioritized experience replay (PER) method to accelerate the convergence of model training. The experimental results show that STDPG can automatically adapt to the current network environment and achieve robust convergence. Compared with a number of state-of-the-art DRL agents, STDPG achieves better routing solutions in terms of the average end-to-end delay.
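
The abstract names prioritized experience replay (PER) but does not say which variant the paper uses. As a rough illustration only, the sketch below assumes the common proportional scheme, in which a transition is sampled with probability proportional to its priority raised to the power alpha, and importance-sampling weights correct the resulting bias. The class name, method names, and default values of alpha, beta, and eps are hypothetical and are not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional PER, list-based for clarity (sampling is O(n)).

    Assumption: this is a generic illustration, not the STDPG authors' code.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so that each
        # one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights w_i = (N * P(i))^(-beta),
        # normalised by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, a transition's priority becomes |TD error| + eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In an actor-critic loop, the critic's temporal-difference errors for a sampled batch would be passed to update_priorities, so transitions with large errors are replayed more often.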
