Paper Title
An Efficient Combinatorial Optimization Model Using Learning-to-Rank Distillation
Paper Authors
Paper Abstract
Recently, deep reinforcement learning (RL) has proven its feasibility in solving combinatorial optimization problems (COPs). Learning-to-rank techniques have been studied in the field of information retrieval. Although several COPs can be formulated as the prioritization of input items, as is common in information retrieval, it has not been fully explored how learning-to-rank techniques can be incorporated into deep RL for COPs. In this paper, we present a learning-to-rank distillation-based COP framework, in which a high-performance ranking policy obtained by RL for a COP is distilled into a non-iterative, simple model, thereby achieving a low-latency COP solver. Specifically, we employ approximated ranking distillation to make a score-based ranking model learnable via gradient descent. Furthermore, we use efficient sequence sampling to improve inference performance within a limited delay. With this framework, we demonstrate that a distilled model not only achieves performance comparable to its respective high-performance RL policy, but also provides several-times faster inference. We evaluate the framework on several COPs, such as priority-based task scheduling and multidimensional knapsack, demonstrating its benefits in terms of both inference latency and solution quality.
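To make the abstract's pipeline concrete, below is a minimal, illustrative sketch of the general idea: a teacher ranking over input items (stand-in for the RL policy's output) is distilled into a simple non-iterative score model using a differentiable listwise surrogate loss, and candidate item orderings are then sampled from the student's scores at inference time. The MLP scorer, the ListNet-style loss, and the Gumbel/Plackett-Luce sampling used here are assumptions made for illustration only; the paper's actual "approximated ranking distillation" and "efficient sequence sampling" may differ in detail.

```python
# Illustrative sketch only (not the paper's implementation), using PyTorch.
import torch
import torch.nn as nn


class ScoreModel(nn.Module):
    """Non-iterative student: maps each item's features to a scalar score."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, items: torch.Tensor) -> torch.Tensor:
        # items: (batch, n_items, feat_dim) -> scores: (batch, n_items)
        return self.mlp(items).squeeze(-1)


def listwise_distillation_loss(student_scores, teacher_ranks, temperature=1.0):
    """ListNet-style surrogate: match the student's top-one probabilities to a
    soft target distribution derived from the teacher's ranks (rank 0 = best)."""
    target = torch.softmax(-teacher_ranks.float() / temperature, dim=-1)
    log_probs = torch.log_softmax(student_scores, dim=-1)
    return -(target * log_probs).sum(dim=-1).mean()


def sample_sequences(scores, n_samples=8):
    """Plackett-Luce-style sequence sampling: perturb the scores with Gumbel
    noise and sort, yielding several candidate orderings so a cheap downstream
    check can keep the best feasible one."""
    gumbel = -torch.log(-torch.log(torch.rand(n_samples, *scores.shape)))
    return torch.argsort(scores.unsqueeze(0) + gumbel, dim=-1, descending=True)


if __name__ == "__main__":
    batch, n_items, feat_dim = 32, 20, 8
    model = ScoreModel(feat_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    items = torch.randn(batch, n_items, feat_dim)
    # Stand-in for the RL teacher: random scores converted to per-item ranks
    # (in practice these would come from the trained ranking policy).
    teacher_scores = torch.randn(batch, n_items)
    teacher_ranks = torch.argsort(
        torch.argsort(teacher_scores, dim=-1, descending=True), dim=-1
    )

    for _ in range(100):
        loss = listwise_distillation_loss(model(items), teacher_ranks)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Low-latency inference: one forward pass plus a few sampled orderings.
    candidate_orders = sample_sequences(model(items)[0])  # (8, n_items)
```

In such a setup, inference cost is a single forward pass of the scorer plus a handful of sorts, which is what makes the distilled model attractive as a low-latency COP solver compared with an iterative, sequential RL decoder.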