Paper Title
On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent
Paper Authors
Paper Abstract
Estimation of a multivariate regression function from independent and identically distributed data is considered. An estimate is defined which fits a deep neural network, consisting of a large number of fully connected neural networks computed in parallel, to the data via gradient descent. The estimate is over-parametrized in the sense that the number of its parameters is much larger than the sample size. It is shown that, given a suitable random initialization of the network, a suitably small stepsize for the gradient descent, and a number of gradient descent steps slightly larger than the reciprocal of this stepsize, the estimate is universally consistent in the sense that its expected L2 error converges to zero for all distributions of the data where the response variable is square integrable.
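To make the structure of the estimate concrete, the following is a minimal sketch, not the paper's exact construction: an average of K small fully connected networks computed in parallel, over-parametrized relative to the sample size n, trained by plain gradient descent with a small stepsize and slightly more than 1/stepsize steps. All hyperparameter values (n, K, width, stepsize) and the sigmoid activation are illustrative assumptions.

```python
# Sketch of an over-parametrized averaged-network estimate trained by
# gradient descent. Hyperparameters are illustrative, not from the paper.
import math
import torch

torch.manual_seed(0)

n, d = 50, 2                                   # sample size and input dimension
X = torch.rand(n, d)                           # i.i.d. covariates (assumed design)
Y = torch.sin(2 * math.pi * X[:, 0:1]) + 0.1 * torch.randn(n, 1)

K, width = 200, 8                              # K parallel nets -> ~21,000 params >> n

def make_net():
    # one small fully connected network with a smooth (sigmoid) activation
    return torch.nn.Sequential(
        torch.nn.Linear(d, width), torch.nn.Sigmoid(),
        torch.nn.Linear(width, width), torch.nn.Sigmoid(),
        torch.nn.Linear(width, 1),
    )

nets = torch.nn.ModuleList([make_net() for _ in range(K)])  # random initialization

def estimate(x):
    # parallel networks combined by averaging their outputs
    return sum(net(x) for net in nets) / K

stepsize = 1e-2
steps = int(1.1 / stepsize)                    # slightly more than 1/stepsize steps
opt = torch.optim.SGD(nets.parameters(), lr=stepsize)
for _ in range(steps):
    opt.zero_grad()
    loss = torch.mean((estimate(X) - Y) ** 2)  # empirical L2 risk
    loss.backward()
    opt.step()

print(f"final empirical L2 risk: {loss.item():.4f}")
```

The averaging of many independently initialized networks and the coupling of the step count to the reciprocal of the stepsize mirror the hyperparameter regime described in the abstract; the consistency result itself of course depends on the paper's precise choices, which this sketch does not reproduce.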