Logarithmic Regret for Adversarial Online Control

Paper title

Logarithmic Regret for Adversarial Online Control

Authors

Dylan J. Foster, Max Simchowitz

Abstract

We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances. Existing regret bounds for this setting scale as $\sqrt{T}$ unless strong stochastic assumptions are imposed on the disturbance process. We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences, provided the state and control costs are given by known quadratic functions. Our algorithm and analysis use a characterization for the optimal offline control law to reduce the online control problem to (delayed) online learning with approximate advantage functions. Compared to previous techniques, our approach does not need to control movement costs for the iterates, leading to logarithmic regret.
