论文标题
学习算法以最大程度地减少队列长度遗憾
Learning Algorithms for Minimizing Queue Length Regret
论文作者
论文摘要
我们考虑了一个由单个发射器/接收器对和它们可以交流的$ N $频道组成的系统。数据包随机到达发射器队列并等待成功发送给接收器。发射器一次可以尝试一次在一个通道上进行帧传输,如果一个帧在队列中,则每个帧都包含一个数据包。对于每个通道,未知的概率都是成功的。发射机的目标是快速确定最佳渠道,以最大程度地减少队列中的数据包数量超过$ t $ time插槽。为了分析系统性能,我们引入了队列长度遗憾,这是学习政策的总队列长度与知道费率的控制器之间的预期差异。设计传输策略的一种方法是应用文献中解决密切相关的随机多武器强盗问题的算法。这些政策将着重于随着时间的推移最大化成功的帧传输数量。但是,我们表明这些方法具有$ω(\ log {t})$队列长度遗憾。另一方面,我们证明存在一组基于队列长度的策略,这些策略可以获得订单最佳$ o(1)$ queue Length Realise。我们使用理论分析来设计启发式方法,这些方法在模拟中表现出色。
We consider a system consisting of a single transmitter/receiver pair and $N$ channels over which they may communicate. Packets randomly arrive to the transmitter's queue and wait to be successfully sent to the receiver. The transmitter may attempt a frame transmission on one channel at a time, where each frame includes a packet if one is in the queue. For each channel, an attempted transmission is successful with an unknown probability. The transmitter's objective is to quickly identify the best channel to minimize the number of packets in the queue over $T$ time slots. To analyze system performance, we introduce queue length regret, which is the expected difference between the total queue length of a learning policy and a controller that knows the rates, a priori. One approach to designing a transmission policy would be to apply algorithms from the literature that solve the closely-related stochastic multi-armed bandit problem. These policies would focus on maximizing the number of successful frame transmissions over time. However, we show that these methods have $Ω(\log{T})$ queue length regret. On the other hand, we show that there exists a set of queue-length based policies that can obtain order optimal $O(1)$ queue length regret. We use our theoretical analysis to devise heuristic methods that are shown to perform well in simulation.