Paper Title
Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
Paper Authors
Paper Abstract
In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning based algorithms seems to be crucial in order to improve our understanding and to make their implementation more effective and efficient. In this article we provide a mathematically rigorous full error analysis of deep learning based empirical risk minimisation with quadratic loss function in the probabilistically strong sense, where the underlying deep neural networks are trained using stochastic gradient descent with random initialisation. The convergence speed we obtain is presumably far from optimal and suffers under the curse of dimensionality. To the best of our knowledge, we establish, however, the first full error analysis in the scientific literature for a deep learning based algorithm in the probabilistically strong sense and, moreover, the first full error analysis in the scientific literature for a deep learning based algorithm where stochastic gradient descent with random initialisation is the employed optimisation method.
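To make the analysed setting concrete, below is a minimal sketch, not the paper's exact algorithm or constants, of the training scheme the abstract describes: empirical risk minimisation with quadratic loss for a fully connected ReLU network, trained by stochastic gradient descent from independent random initialisations. The target function f, the network width, the step size, the batch size, and the number of random restarts are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Unknown target function to be learned (illustrative choice).
    return np.sin(np.pi * x)

# Training data (X_i, Y_i), i = 1, ..., n, with Y_i = f(X_i).
n = 256
X = rng.uniform(-1.0, 1.0, size=(n, 1))
Y = f(X)

def init_params(width):
    # Random initialisation of the network parameters.
    W1 = rng.normal(0.0, 1.0, size=(1, width))
    b1 = np.zeros(width)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))
    b2 = np.zeros(1)
    return [W1, b1, W2, b2]

def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.maximum(x @ W1 + b1, 0.0)        # ReLU hidden layer
    return h @ W2 + b2, h

def sgd(params, steps=5000, batch=32, lr=0.05):
    # Plain stochastic gradient descent on the quadratic empirical risk.
    W1, b1, W2, b2 = params
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        x, y = X[idx], Y[idx]
        out, h = forward([W1, b1, W2, b2], x)
        # Gradient of the mini-batch quadratic loss (1/m) * sum |out - y|^2.
        g_out = 2.0 * (out - y) / batch
        gW2 = h.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (h > 0)       # backprop through the ReLU
        gW1 = x.T @ g_h
        gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return [W1, b1, W2, b2]

def empirical_risk(params):
    # Empirical risk with quadratic loss over the full training sample.
    out, _ = forward(params, X)
    return float(np.mean((out - Y) ** 2))

# Train from several independent random initialisations and keep the
# realisation with the smallest empirical risk.
best = min((sgd(init_params(width=64)) for _ in range(5)), key=empirical_risk)
print("empirical risk of selected network:", empirical_risk(best))

Keeping the best of several independent SGD runs is one simple way to exploit the randomness of the initialisation; the abstract's error analysis concerns exactly such randomly initialised SGD training, so the sketch is meant only to fix ideas, not to reproduce the paper's guarantees.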