Paper title
Optimal training of integer-valued neural networks with mixed integer programming
Paper authors
Paper abstract
Recent work has shown potential in using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However, the intriguing approach of training NNs with MIP solvers is under-explored. State-of-the-art methods for training NNs are typically gradient-based and require significant data, computation on GPUs, and extensive hyper-parameter tuning. In contrast, training with MIP solvers requires neither GPUs nor heavy hyper-parameter tuning, but currently cannot handle more than small amounts of data. This article builds on recent advances that train binarized NNs using MIP solvers. We go beyond current work by formulating new MIP models that improve training efficiency and that can train the important class of integer-valued neural networks (INNs). We provide two novel methods to further the potential significance of using MIP to train NNs. The first method optimizes the number of neurons in the NN while training, reducing the need to fix the network architecture before training. The second method addresses the amount of training data that MIP can feasibly handle: we provide a batch training method that dramatically increases the amount of data MIP solvers can use to train. This is a promising step towards training NNs with MIP models on far more data than was previously possible. Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP, in terms of accuracy, training time, and amount of data. Our methodology excels at training NNs when minimal training data is available and at training with minimal memory requirements -- which is potentially valuable for deploying to low-memory devices.
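To make the core idea concrete, the sketch below shows, in miniature, what "training with a MIP solver" means. It is not the paper's formulation: it trains a single integer-weight perceptron rather than a multi-layer INN, and the use of PuLP with its bundled CBC solver is purely an illustrative assumption. The weights are bounded integer decision variables, and the solver minimizes the total violation of a unit classification margin over the training set.

```python
# Illustrative sketch only (not the authors' model): train a single
# integer-weight perceptron on a toy dataset by solving a MIP with
# PuLP's bundled CBC solver.
import pulp

# Toy, linearly separable data: (features, label in {-1, +1}).
data = [((1, 2), +1), ((2, 1), +1), ((-1, -2), -1), ((-2, -1), -1)]
P = 3  # weight bound: each weight lies in {-P, ..., P}

model = pulp.LpProblem("inn_training_sketch", pulp.LpMinimize)

# Integer weights and bias are the decision variables.
w = [pulp.LpVariable(f"w{j}", lowBound=-P, upBound=P, cat="Integer")
     for j in range(2)]
b = pulp.LpVariable("b", lowBound=-P, upBound=P, cat="Integer")

# One nonnegative slack per example, measuring how badly it violates
# the margin constraint y * (w . x + b) >= 1.
xi = [pulp.LpVariable(f"xi{i}", lowBound=0) for i in range(len(data))]
for i, ((x1, x2), y) in enumerate(data):
    model += y * (w[0] * x1 + w[1] * x2 + b) >= 1 - xi[i]

# Objective: minimize the total margin violation (the training loss).
model += pulp.lpSum(xi)

model.solve(pulp.PULP_CBC_CMD(msg=False))
print("weights:", [int(v.value()) for v in w], "bias:", int(b.value()))
```

A full INN formulation would additionally need to encode the integer-valued hidden-layer activations (typically via indicator or big-M constraints), which is where the paper's efficiency-improving models come in; the batch-training method then lets the solver work through far more data than a single monolithic solve of this kind could, presumably by solving over subsets of the data rather than the whole set at once.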