部分可观测时空混沌系统的无模型预测

论文标题

部分可观测时空混沌系统的无模型预测

No-regret Learning in Repeated First-Price Auctions with Budget Constraints

论文作者

Ai, Rui, Wang, Chang, Li, Chenchen, Zhang, Jinshan, Huang, Wenhan, Deng, Xiaotie

论文摘要

最近，在线广告市场展示了从二价拍卖到第一价格拍卖的逐渐转变。尽管有一系列有关在线拍卖中在线招标策略的作品，但仍然开放了如何处理问题中的预算约束。在本文中，我们为有预算的买家启动了这项研究，以在重复的第一价格拍卖中学习在线招标策略。我们提出了一种基于RL的竞标算法，以针对固定竞争下的最佳非期望策略。我们的算法获得$ \ widetilde o（\ sqrt t）$ - 遗憾的是，如果投标都在每个回合结束时都显示出来。由于买方在每轮之后仅看到获胜的出价，我们的修改算法获得了$ \ widetilde o（t^{\ frac {7} {12}}}）$ - 通过从生存分析中开发的技术后悔。我们的分析扩展到更通用的场景，在该场景中，买方具有任何有限的瞬时效用功能，并以同一顺序感到遗憾。

Recently the online advertising market has exhibited a gradual shift from second-price auctions to first-price auctions. Although there has been a line of works concerning online bidding strategies in first-price auctions, it still remains open how to handle budget constraints in the problem. In the present paper, we initiate the study for a buyer with budgets to learn online bidding strategies in repeated first-price auctions. We propose an RL-based bidding algorithm against the optimal non-anticipating strategy under stationary competition. Our algorithm obtains $\widetilde O(\sqrt T)$-regret if the bids are all revealed at the end of each round. With the restriction that the buyer only sees the winning bid after each round, our modified algorithm obtains $\widetilde O(T^{\frac{7}{12}})$-regret by techniques developed from survival analysis. Our analysis extends to the more general scenario where the buyer has any bounded instantaneous utility function with regrets of the same order.

下载PDF全文

下载文献需遵守相关版权规定

论文标题