Paper Title
Tight Hardness Results for Training Depth-2 ReLU Networks
Paper Authors
Paper Abstract
We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of $k$ ReLUs minimizing the squared error (for $k>1$) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error $ε$. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest $κ$-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in $1/ε^2$. Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks in both the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on $ε$.
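For concreteness, the training objective described in the abstract can be written as the following empirical square-loss minimization (this is a standard formulation consistent with the abstract; the symbols $a_j$, $w_j$, and the sample pairs $(x_i, y_i)$ are our notation, not taken from the paper):
$$\min_{a_1,\dots,a_k \in \mathbb{R},\; w_1,\dots,w_k \in \mathbb{R}^d} \;\; \frac{1}{m}\sum_{i=1}^{m}\Big(\sum_{j=1}^{k} a_j \max\big(0, \langle w_j, x_i\rangle\big) - y_i\Big)^2,$$
where $(x_1,y_1),\dots,(x_m,y_m)$ is the given training set. The case $k=1$ corresponds to fitting a single ReLU, which the abstract states is already NP-hard, and the realizable setting is the case where the labels satisfy $y_i = \sum_j a^*_j \max(0,\langle w^*_j, x_i\rangle)$ for some unknown depth-2 ReLU network.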