Paper Title

Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling

Authors

Charles E. Thornton, R. Michael Buehrer, Anthony F. Martone

Abstract

This paper describes a sequential, or online, learning scheme for adaptive radar transmissions that facilitate spectrum sharing with a non-cooperative cellular network. First, the interference channel between the radar and a spatially distant cellular network is modeled. Then, a linear Contextual Bandit (CB) learning framework is applied to drive the radar's behavior. The fundamental trade-off between exploration and exploitation is balanced by a proposed Thompson Sampling (TS) algorithm, a pseudo-Bayesian approach which selects waveform parameters based on the posterior probability that a specific waveform is optimal, given discounted channel information as context. It is shown that the contextual TS approach converges more rapidly to behavior that minimizes mutual interference and maximizes spectrum utilization than comparable contextual bandit algorithms. Additionally, we show that the TS learning scheme results in a favorable SINR distribution compared to other online learning algorithms. Finally, the proposed TS algorithm is compared to a deep reinforcement learning model. We show that the TS algorithm maintains competitive performance with a more complex Deep Q-Network (DQN).
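To make the selection rule in the abstract concrete, here is a minimal sketch of linear contextual Thompson sampling in plain Python. It is not the authors' implementation. The two-dimensional context, the per-arm reward weights in `true_theta`, the identity prior, the `scale` factor, and the `simulate` toy channel are all illustrative assumptions; in the paper the context is discounted channel information and the arms are waveform parameters.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L.T == a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_upper_t(low, b):
    """Solve low.T @ x = b by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


class LinearThompsonSampler:
    """One Bayesian linear-regression model per arm (hypothetical waveform index)."""

    def __init__(self, n_arms, dim, scale=0.5, seed=0):
        self.rng = random.Random(seed)
        self.scale = scale  # posterior inflation; a tuning assumption
        # Per-arm precision matrix B (identity prior) and response vector f.
        self.B = [[[float(i == j) for j in range(dim)] for i in range(dim)]
                  for _ in range(n_arms)]
        self.f = [[0.0] * dim for _ in range(n_arms)]

    def sample_theta(self, arm):
        """Draw weights from N(B^-1 f, scale^2 * B^-1) for one arm."""
        low = cholesky(self.B[arm])
        mean = solve_upper_t(low, solve_lower(low, self.f[arm]))
        z = [self.rng.gauss(0.0, 1.0) for _ in mean]
        noise = solve_upper_t(low, z)  # has covariance B^-1
        return [m + self.scale * e for m, e in zip(mean, noise)]

    def select(self, context):
        """Pick the arm whose sampled weights score the context highest."""
        scores = [sum(t * c for t, c in zip(self.sample_theta(a), context))
                  for a in range(len(self.B))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, context, reward):
        """Rank-one posterior update for the arm that was played."""
        for i, ci in enumerate(context):
            self.f[arm][i] += reward * ci
            for j, cj in enumerate(context):
                self.B[arm][i][j] += ci * cj


def simulate(steps=2000, seed=1):
    """Toy environment (assumed): returns the fraction of optimal picks."""
    rng = random.Random(seed)
    true_theta = [[1.0, 0.0], [0.0, 1.0]]  # made-up reward weights per arm
    agent = LinearThompsonSampler(n_arms=2, dim=2, seed=seed)
    best_picks = 0
    for _ in range(steps):
        context = [rng.random(), rng.random()]
        arm = agent.select(context)
        mean_reward = sum(t * c for t, c in zip(true_theta[arm], context))
        agent.update(arm, context, mean_reward + rng.gauss(0.0, 0.1))
        best_picks += arm == (0 if context[0] >= context[1] else 1)
    return best_picks / steps
```

Sampling weights from the posterior, rather than always acting on the posterior mean, is what balances exploration and exploitation: an arm with few observations has a wide posterior and is occasionally drawn high enough to be tried. Calling `simulate()` runs the toy loop and reports how often the sampled choice matched the best arm for the context.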

Paper Title