Anomaly Detection Under Controlled Sensing Using Actor-Critic Reinforcement Learning

Paper Title

Anomaly Detection Under Controlled Sensing Using Actor-Critic Reinforcement Learning

Authors

Geethu Joseph, M. Cenk Gursoy, Pramod K. Varshney

Abstract

We consider the problem of detecting anomalies among a given set of processes using their noisy binary sensor measurements. The noiseless sensor measurement corresponding to a normal process is 0, and the measurement is 1 if the process is anomalous. The decision-making algorithm is assumed to have no knowledge of the number of anomalous processes. The algorithm is allowed to choose a subset of the sensors at each time instant until the confidence level on the decision exceeds the desired value. Our objective is to design a sequential sensor selection policy that dynamically determines which processes to observe at each time and when to terminate the detection algorithm. The selection policy is designed such that the anomalous processes are detected with the desired confidence level while incurring minimum cost, which comprises the delay in detection and the cost of sensing. We cast this problem as a sequential hypothesis testing problem within the framework of Markov decision processes, and solve it using an actor-critic deep reinforcement learning algorithm. This deep neural network-based algorithm offers a low-complexity solution with good detection accuracy. We also study the effect of statistical dependence between the processes on the algorithm performance. Through numerical experiments, we show that our algorithm is able to adapt to any unknown statistical dependence pattern of the processes.
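
The sequential testing loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes the processes are statistically independent, that each sensor reading is flipped with a known probability `flip_prob`, and it replaces the learned actor-critic policy with a simple heuristic that observes one process at a time, always the most uncertain one. The names `update_belief`, `confident`, `detect`, `flip_prob`, `level`, and `prior` are hypothetical and chosen only for illustration.

```python
import random


def update_belief(belief, observation, flip_prob):
    """Bayes update of P(process is anomalous) after one noisy binary reading.

    Assumes the noiseless reading is 1 for an anomalous process and 0 for a
    normal one, and that noise flips the reading with probability flip_prob.
    """
    like_anom = 1.0 - flip_prob if observation == 1 else flip_prob
    like_norm = flip_prob if observation == 1 else 1.0 - flip_prob
    numer = like_anom * belief
    return numer / (numer + like_norm * (1.0 - belief))


def confident(beliefs, level):
    """True when every process is classified with at least `level` confidence."""
    return all(max(b, 1.0 - b) >= level for b in beliefs)


def detect(true_states, flip_prob=0.2, level=0.99, prior=0.5,
           max_steps=10_000, seed=0):
    """Observe one process per step until the stopping rule is met.

    Assumption: independent processes and 0 <= flip_prob < 0.5.
    The selection rule is a heuristic stand-in, not a learned policy.
    """
    rng = random.Random(seed)
    beliefs = [prior] * len(true_states)
    steps = 0
    while not confident(beliefs, level) and steps < max_steps:
        # Pick the process whose posterior is closest to 0.5 (most uncertain).
        k = min(range(len(beliefs)), key=lambda i: abs(beliefs[i] - 0.5))
        # Simulate the noisy sensor: flip the true state with prob. flip_prob.
        reading = true_states[k] ^ (rng.random() < flip_prob)
        beliefs[k] = update_belief(beliefs[k], reading, flip_prob)
        steps += 1
    decisions = [int(b >= 0.5) for b in beliefs]
    return decisions, beliefs, steps
```

Calling `detect([1, 0, 0])` returns the declared states, the final posteriors, and the number of observations used. In this sketch the step count stands in for the combined delay and sensing cost mentioned in the abstract; a learned policy would instead trade these off and could exploit dependence between processes.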
